By Oscar Frith-Macdonald, 13 May 2026
In our previous articles we have looked at what AI is, how semantic search works, how AI features like embeddings are showing up in FileMaker, how to generate a response from a model, and most recently the "Perform SQL Query by Natural Language" script step. If you've been following along, you may have noticed we're building towards something practical…
A way to literally talk to your data!
We want to build a chatbot that we can have a natural conversation with about a specific client or project. Not a generic AI that knows a bit about everything, but something that knows our data: meeting notes, project history, correspondence, and anything else we know about that client. Something we can ask questions of and get useful answers back.
The technology that makes this possible is called RAG, Retrieval-Augmented Generation. In this article we're going to look at what RAG actually is, how it works, what we're trying to build with it, and the three main ways we see that you could approach it.
RAG is the core idea behind any system that lets you chat with your own data.
To understand why it matters, it helps to think about the problem it solves. Language models are trained on enormous amounts of text, but that training has a cutoff date, and more importantly it doesn't include your data. If you ask a language model about a meeting you had with a client last Tuesday, it has no built-in knowledge of that at all... and nor should it!
This is where AI systems can start to get into trouble. Newer models are better at admitting when they don't have enough information, but they can still guess, fill in gaps, or sound more certain than they should. Without access to the real client data, that's exactly the kind of situation where you can end up with a plausible-sounding answer that isn't actually true.
One obvious way around this is to just include all your data in the conversation: dump all your client docs into the prompt and ask the model to answer based on them. That gives the conversation context.
We do not recommend sending your raw data to an AI...
(Or any other API type service for that matter — your accounting system might be the exception.)
If data privacy matters, that usually means using a paid AI model rather than a free one. Even then, it is worth checking the settings carefully to make sure your data is not being retained or used for training. That can improve things, but it still does not give you full control. If you want the highest level of security and ownership over your client data, a local setup is really the only option.
RAG is the smarter solution. Instead of throwing everything at the model, you first retrieve only the most relevant pieces of information, then add just those to the prompt before asking the model to respond. It's more targeted, more efficient, and helps ensure the answers generated are based on real data. That's what reduces the risk of hallucinations.
The "space" where your embedded documents live is what we are calling a RAG space. OpenAI calls theirs Vector Stores; other providers use different terminology, but the concept is the same.
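RAG works in two stages: retrieval, which finds the most relevant documents, and generation, where the language model answers the question using them.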
The retrieval stage is essentially a semantic search. If you have read our Diving Into Semantic Search article, a lot of this will be familiar. If not, we encourage you to read it for more detail.
The first step is to embed all of your documents using an embedding model to get a vector for each one. These vectors are what is actually stored in your RAG space. You could think of the vectors as a sort of index based on meaning.
When a user asks a question, that question is embedded using the same model, producing its own vector. The system then compares the question's vector against every document vector in your RAG space using cosine similarity. The documents with the highest cosine similarity scores are the ones retrieved and passed to the language model.
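To make the comparison step concrete, here is a minimal sketch in Python of ranking documents by cosine similarity. The three-number vectors, the document names, and the `retrieve` helper are all made up for illustration; real embedding models return vectors with hundreds or thousands of dimensions, and this is not how any particular provider implements it.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)


def retrieve(question_vector, rag_space, top_k=2):
    """Return the top_k (score, doc_id) pairs, highest score first."""
    scored = [
        (cosine_similarity(question_vector, vector), doc_id)
        for doc_id, vector in rag_space.items()
    ]
    scored.sort(reverse=True)
    return scored[:top_k]


# Toy "RAG space": hypothetical document names mapped to invented vectors.
rag_space = {
    "meeting_notes": [0.9, 0.1, 0.0],
    "invoice": [0.0, 0.2, 0.9],
    "project_plan": [0.7, 0.6, 0.1],
}

# Pretend this is the embedded question.
question_vector = [1.0, 0.2, 0.0]

top = retrieve(question_vector, rag_space)
for score, doc_id in top:
    print(f"{doc_id}: {score:.3f}")
```

With these toy numbers the meeting notes score highest and the project plan second, so those two would be passed to the language model and the invoice would be left out.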
It's also worth noting that retrieval quality is heavily influenced by how you chunk and structure your documents before embedding them. Smaller, focused chunks tend to produce more precise results, because each vector then represents a single topic rather than an average of everything in a long document.
Large monolithic documents can dilute the semantic signal and make it harder to find the right content. We touched on that in our semantic search article, and the same principles apply here.
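As a rough illustration of chunking, the sketch below splits a document into fixed-size groups of words before embedding. The `chunk_text` function, the word-based sizes, and the overlap between neighbouring chunks are our assumptions for the example; splitting on paragraphs or headings is an equally reasonable choice, and the right sizes depend on your documents.

```python
def chunk_text(text, max_words=40, overlap=10):
    """Split text into word-based chunks that share `overlap` words with the next chunk."""
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this chunk already reached the end of the text
    return chunks


# A stand-in document of 100 numbered words.
document = " ".join(f"w{i}" for i in range(100))
chunks = chunk_text(document)
print(len(chunks), "chunks")
```

Each chunk would then be embedded separately, so a question only has to match the part of the document that actually covers it. The overlap is there so that a sentence falling on a boundary still appears whole in at least one chunk.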
Now that we understand what RAG is, let's talk about what we are actually trying to build.
The goal is a chatbot that knows about a specific client: their meeting notes, project history, correspondence, and anything else relevant. This is where a lot of the earlier pieces start to come together: embeddings, semantic search, and generating responses from models all in one workflow. The model never needs to be trained on any of this data. Instead, each time a user asks a question, the most relevant documents are fetched from that client's RAG space and placed in front of the model in real time.
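The "placed in front of the model" part is simply prompt assembly. Here is a minimal sketch of what that could look like, assuming retrieval has already returned a few chunks of text. The `build_prompt` function, the wording of the instruction, and the sample chunks are invented for illustration, and the call to the model itself is left out.

```python
def build_prompt(question, retrieved_chunks):
    """Combine retrieved text and the user's question into one prompt string."""
    context = "\n\n".join(
        f"[{number}] {chunk}"
        for number, chunk in enumerate(retrieved_chunks, start=1)
    )
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Hypothetical chunks that a retrieval step might have returned.
retrieved_chunks = [
    "Meeting 4 March: client asked to move the launch to June.",
    "Email 6 March: client confirmed the revised budget.",
]

prompt = build_prompt("When does the client want to launch?", retrieved_chunks)
print(prompt)
```

The model only ever sees the handful of chunks in the prompt, which is why nothing needs to be trained on the client's data.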
The key design decision is that we want a separate RAG space for each client. This way when a user opens a chat for Client A, the system retrieves from Client A's space only. This keeps responses focused and avoids the risk of information from one client bleeding into answers about another.
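One way to picture the separation is a lookup keyed by client, where a chat only ever receives its own client's vectors. This is a sketch of the idea only: the `ClientRagSpaces` class, the client names, and the vectors are hypothetical, and a hosted service would typically give you a separate store per client rather than an in-memory dictionary.

```python
class ClientRagSpaces:
    """Keeps one independent set of document vectors per client."""

    def __init__(self):
        self._spaces = {}  # client_id -> {doc_id: vector}

    def add(self, client_id, doc_id, vector):
        """Store a document vector in the given client's space."""
        self._spaces.setdefault(client_id, {})[doc_id] = vector

    def space_for(self, client_id):
        """Return a copy of one client's space; other clients are never included."""
        return dict(self._spaces.get(client_id, {}))


spaces = ClientRagSpaces()
spaces.add("client_a", "kickoff_notes", [0.9, 0.1])
spaces.add("client_a", "status_email", [0.4, 0.6])
spaces.add("client_b", "contract_summary", [0.2, 0.8])

# A chat opened for Client A searches this space and nothing else.
client_a_space = spaces.space_for("client_a")
print(sorted(client_a_space))
```

Because retrieval starts from `space_for("client_a")`, a document belonging to Client B can never be ranked, let alone passed to the model.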
So the question becomes: what's the best system to build this on? We will be looking at three different options, and as we investigate each one we will post an accompanying article.