Construction Documents are the backbone of the construction industry. They are the architect’s main product and an indispensable tool used throughout the entire lifecycle of the project. But nobody enjoys reading the CDs. They are constantly scrutinized, rarely celebrated, and never publicized.
Go to any architect’s website or social media, and what do you see? Blueprints? No, it’s photos of finished buildings. Sure, that is what we market, but it’s not what we actually sell.
We sell very expensive PDFs.
But in all honesty, these are exceptional PDFs. The drawings and specs are created after years of work and collaboration, with combined centuries of expertise sprinkled throughout. A technical work of art inside a legal contract.
Construction Documents are intricate and complex instruction manuals for how to build a building. And they are difficult and slow to read.
The drawings densely pack technical information into images that explain the design intent. The specifications are an endless encyclopedia of requirements covering every material and product on the job. Perfect bedtime reading.
CDs are at the core of the construction industry and where I believe AI will have the greatest impact. AI is incredibly efficient at reading and understanding vast amounts of data, and CDs are exactly that. A vast amount of unstructured information stuffed into two PDFs.
But remember these are very expensive, and more importantly, one-of-a-kind proprietary PDFs. The security of the documents and client’s data is a main priority for all architects. So how do we leverage the efficiencies of AI to read these documents faster, without giving the AI companies our expensive proprietary PDFs?
That is the problem I’ve been trying to solve in my free time this past year. Thanks to the explosion of new AI tools, techniques and tutorials, I have found a way to create an AI chatbot for CDs without giving the AI access to the CDs.
In this article I will explain the steps and techniques I took to build the chatbot. I’ll show what I’ve learned, how the chatbot works, and what steps the code is executing on the backend. This is not a coding tutorial and I won’t get too technical on the AI concepts. Instead I will walk through what needs to be set up beforehand and what happens each time a question is asked.
I hope this article will give you a clear idea of how chatbots for specific information work, explain how the questions are answered, and show it is not as complicated as it may seem.
This is a personal study unrelated to my professional job. Everything posted in this article is my personal opinion and does not necessarily represent the views of any other person or party.
AI Chatbot Demo
To kick things off, let’s take a look at a demo video showcasing the AI chatbot in action:
The AI chatbot answers the question and the PDF viewer changes to the sheet that is most relevant to the question.
This chatbot can only answer questions about these construction documents. And it achieves this without having any access to the actual drawings.
The Process
To give you a better understanding of how the AI chatbot works, let’s break down the process diagram into its key components:
- Indexing – Pull the information out of the drawings so the AI can understand it
- Retrieval-Augmented Generation (RAG) AI – Technique to make large language models (LLM) answer questions only about specific information
Let’s dive into each of these components in detail, explaining how they work and how they contribute to the overall functionality of the chatbot.
Indexing
Manually pulling the information out of the CDs so AI can understand it and access it.
Converting Construction Documents to Text Files
The AI chatbot I built does not understand the drawings, or even see them. The programming knowledge required to train an AI to understand them is over my head. Instead, I manually converted the CDs into a format the AI can easily process: I simply explained each drawing in a text file. Using my knowledge of CDs, I can quickly explain drawings in written form.
I just looked at a detail then wrote a structured paragraph explaining it. Then I did that for every view on the sheet. Then every sheet in the set.
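To make this concrete, here is a hypothetical example of what one of these structured paragraphs might look like. The sheet number, detail name, and materials below are invented for illustration, not taken from the actual project:

```
Sheet A-501, Detail 3: Typical Parapet Detail.
This detail shows the roof edge condition at the parapet. The roof
membrane turns up the inside face of the parapet and terminates under
a prefinished metal coping. The coping slopes back toward the roof and
laps the exterior wall air barrier. Blocking at the top of the parapet
is pressure-treated wood, anchored to the CMU backup wall.
```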
Nothing special or automated here, because I didn’t know how. Better programmers or better LLMs can make this step more automated (like OpenAI’s recent multimodal model, GPT-4o).
This is a critical step because it transforms unstructured data (drawings) into structured text that can be further processed by the AI. And it’s extra critical because the AI will only have access to what is written in the text, and will only answer questions whose answers it can find in these text files. So it’s imperative to explain the drawings extensively and correctly.
Creating Embeddings
Once we have text files, next we need to create embeddings for each file. Embeddings are numerical representations of words and phrases. Simply put, each one is a list of 1,536 numbers (for the model I used) that together capture the meaning of the text in a way the AI can compare. This is the magic of the LLMs.
I used OpenAI’s text-embedding-ada-002 model to create the embeddings for each text file. This step is fairly easy, just an API call, but I believe I can improve how I write the text files to create easier-to-search embeddings. A task for another day…
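As a rough sketch, the embedding step looks something like the code below. I’m assuming the current OpenAI Python SDK (v1+) with an API key in the environment; the `sheet_descriptions` folder name is illustrative. The cosine-similarity helper shows how two embeddings are later compared during retrieval:

```python
import math
from pathlib import Path

def embed_text(text: str) -> list[float]:
    """Return the 1,536-dimension ada-002 embedding for one text file."""
    from openai import OpenAI  # OpenAI Python SDK v1+
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.embeddings.create(
        model="text-embedding-ada-002", input=text
    )
    return response.data[0].embedding

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """How embeddings are compared: 1.0 means identical direction (meaning)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# One embedding per sheet-description text file
embeddings = {
    path.stem: embed_text(path.read_text())
    for path in Path("sheet_descriptions").glob("*.txt")
}
```

Texts with similar meaning produce vectors with a similarity score close to 1.0, which is what makes the later search step possible.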
Upserting to a Vector Database
After creating embeddings for each text file, we need to store them in a vector database. A vector database is a specialized database designed for embeddings and RAG. The process of upserting involves inserting new embeddings into the database while updating existing ones. This ensures that our database remains up-to-date with the latest information from the construction documents.
The vector database I used is called Pinecone. This is a paid service designed specifically for AI chatbots and RAG. Using a paid service greatly simplified the work for me and also provides a high level of security. This service stores all the text files explaining the CDs, so keeping it secure is important.
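A minimal sketch of the upsert step, assuming the Pinecone Python SDK (v3+). The index name, record ids, and metadata fields here are my assumptions, not the original project’s code:

```python
def build_record(sheet_id: str, embedding: list[float], description: str) -> dict:
    """Package one sheet's embedding in the record shape Pinecone expects."""
    return {
        "id": sheet_id,            # e.g. "A-101-floor-plan" (illustrative)
        "values": embedding,       # the 1,536 floats from ada-002
        "metadata": {"sheet": sheet_id, "text": description},
    }

def upsert_sheets(api_key: str, records: list[dict]) -> None:
    """Upsert inserts new vectors and overwrites existing ids,
    so a revised sheet simply replaces its old entry."""
    from pinecone import Pinecone  # Pinecone Python SDK v3+
    pc = Pinecone(api_key=api_key)
    index = pc.Index("construction-docs")  # index name is illustrative
    index.upsert(vectors=records)
```

Because upsert is keyed on the record id, re-running it for a revised sheet replaces the stale description instead of duplicating it.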
Indexing Summary
The three steps detailed above are the preliminary backend work required to build the chatbot. All of this needs to be done before the chatbot can function. It can be updated and changed to produce better AI responses and to cover changes in the CDs, like revisions.
The LLM will have access to all the text files created, but not the drawings themselves. Is this separation enough to ensure that the drawings and client data are not shared with the AI? That’s a pressing question our industry will need to answer.
Now that all the pre-work is done, let’s look at what actually happens each time a user asks the chatbot a question.
Retrieval-Augmented Generation (RAG)
Overview of RAG
Retrieval-Augmented Generation (RAG) is a powerful technique that combines the strengths of retrieval-based and generation-based models. In the context of this AI chatbot, RAG enhances the chatbot’s ability to provide accurate and relevant answers by leveraging both retrieved documents and generative language models.
Workflow For Each Question
Here’s how the RAG process works in the AI chatbot:
- Question Parsing: The chatbot receives a question from the user and then parses it to understand the intent.
- Retrieval: The chatbot retrieves relevant documents from the vector database based on the question. Basically a fancy Ctrl+F.
- Combining Context: The chatbot takes the relevant text files and combines them into a single document.
- LLM Answer Generation: Using the retrieved documents as context, the chatbot generates a precise and accurate answer using a Large Language Model (LLM).
This workflow ensures that the chatbot provides responses that are both contextually relevant and accurate. It also greatly reduces the risk of hallucinated answers. This creates a better user experience than using ChatGPT by itself, and is significantly easier than searching the documents manually.
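The four steps above can be sketched in code roughly as follows. The model names match what I used; the index name, prompt wording, and helper functions are my assumptions rather than the production code:

```python
import os

def build_prompt(context: str, question: str) -> str:
    """Step 3: combine the retrieved text files into one grounded prompt."""
    return (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

def answer_question(question: str) -> str:
    from openai import OpenAI      # OpenAI Python SDK v1+
    from pinecone import Pinecone  # Pinecone Python SDK v3+

    client = OpenAI()
    index = Pinecone(api_key=os.environ["PINECONE_API_KEY"]).Index("construction-docs")

    # Steps 1-2: embed the question, then nearest-neighbor search (the fancy Ctrl+F)
    q_vec = client.embeddings.create(
        model="text-embedding-ada-002", input=question
    ).data[0].embedding
    matches = index.query(vector=q_vec, top_k=5, include_metadata=True).matches

    # Step 3: merge the best sheet descriptions into a single context block
    context = "\n\n".join(m.metadata["text"] for m in matches)

    # Step 4: have the LLM answer only from that context
    completion = client.completions.create(
        model="gpt-3.5-turbo-instruct",
        prompt=build_prompt(context, question),
        max_tokens=300,
    )
    return completion.choices[0].text.strip()
```

The key design choice is in `build_prompt`: the LLM is explicitly told to answer only from the supplied context and to admit when the answer isn’t there.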
RAG Example
Now, let’s look at an example of how RAG works in practice. Imagine a user asks the chatbot, “What is the wall paint color in the kitchen?” Here’s how the RAG process handles this query:
- Question Parsing: The AI parses the text to understand its structure and intent. This involves identifying key elements such as the subject (“paint color”), the context (“kitchen”) and the type of question being asked.
- Retrieval: The chatbot searches the vector database for documents related to the kitchen’s paint color and finds the 5 most relevant files. These might include the finish schedule, the kitchen interior elevations, and the kitchen finish plan.
- Combining context: The top 5 results retrieved in the previous step are combined into a single page.
- LLM Answer Generation: The original user question is sent to an LLM along with the combined page of context. The gpt-3.5-turbo-instruct LLM is prompted to answer the question based only on that context. If it can’t find the answer there, it must respond that it doesn’t know the answer.
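Putting it together, the final prompt sent to the LLM for this example might look roughly like this. The sheet contents, sheet numbers, and the P-1 color are invented for illustration:

```
Answer the question using only the context below. If the answer is not
in the context, say you don't know.

Context:
Sheet A-601 Finish Schedule: Kitchen 104 walls are painted P-1,
eggshell finish, color "Simply White."

Sheet A-401 Interior Elevations: Kitchen elevations show P-1 paint on
all wall surfaces above the tile backsplash.

Question: What is the wall paint color in the kitchen?
Answer:
```

The LLM then reads the context like a person skimming the relevant sheets and returns the P-1 answer in plain language.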
Conclusion
In this article, I’ve attempted to peel back the cover on the development of an AI chatbot for construction documents. I’ve covered the key technologies involved: LLMs, RAG, and vector databases. These advanced techniques were created by people much smarter than me, but the tools are fairly accessible to non-coders (don’t forget that ChatGPT can help you understand and learn anything).
Utilizing AI with CDs will create many efficiencies and insights, but there is a serious concern with sharing proprietary data. Does the separation of drawings and AI explained in this article provide enough security? That’s a question I’m still working through and will continue to iterate on.
I hope this article has taught you how these AI chatbots work and shown you that the steps required are not as complex as they seem. If you have any questions or feedback, feel free to leave a comment.