While LLMs dominate the headlines of today's tech news, anyone who has tried to use them to build a data-centric solution has encountered the inevitable challenge of hallucinations.
Retrieval Augmented Generation, more commonly known as RAG, is widely considered today’s best solution to this problem, but it’s less common to hear about how RAG is actually achieved in practice, the details of the different methods, and their pros and cons.
At the highest level, RAG is a method of improving the accuracy of LLMs by grounding their answers in data. Instead of asking an LLM questions directly, we first retrieve data from a trusted data source, then ask the LLM to generate an answer based exclusively on the data that was returned.
The reality is that RAG is a broad term, and while there are many ways to implement it, not all are equal. Here we’ll discuss three of the most common approaches.
Please note that these are practical but still relatively high-level approaches and optimisations can be implemented in each case.
Need more context before diving in? Read about RAG and why we need it in our previous article:
Two Truths and a Lie with ChatGPT: The critical difference between RAG and an LLM alone
The last thing to mention before discussing the different implementations of RAG is the end goal—the reason why.
The primary reasons for using LLMs include their accessibility and creativity. They allow users to ask potentially complex questions of systems with ease, lowering the barrier to entry while raising the value of what they can extract.
As we discussed in our last article, some of this value is lost when using a traditional RAG approach as it forces the user down a narrow set of predefined paths, negating the magic-like flexibility of LLMs which is often the reason to use them in the first place. Thankfully, that’s not where the story ends.
Rules-based AI, or Semantic Reasoning, is a cousin of the LLM from a rather different branch of the AI family tree. Rather than statistics, rules-based AI is grounded in logic. In many ways this makes it the opposite of, and the perfect companion to, generative AI. Rules-based AI is used to enrich a database by automatically inferring new information based on rules written by domain experts. Rules can be as simple or as complex as the use case requires, enabling everything from simple ontological reasoning to complex analysis with functionality including negation, aggregation, recursion, and both incremental addition and retraction. By combining knowledge and context with data, it creates a system that can be queried directly for complex insights and answers, all while being explainable and guaranteeing accurate results.
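To make this concrete, here is a minimal sketch of what such a rule might look like, assuming a hypothetical family-relationship dataset; the IRIs are purely illustrative.

```python
# A minimal sketch of a rules-based enrichment step. The rule below uses
# RDFox's Datalog syntax: the head (left of ':-') is inferred whenever the
# body (right of ':-') matches existing facts. All IRIs are hypothetical.

# "If ?x is a parent of ?y and ?y is a parent of ?z,
#  then ?x is a grandparent of ?z."
GRANDPARENT_RULE = """
[?x, <https://example.com/grandparentOf>, ?z] :-
    [?x, <https://example.com/parentOf>, ?y],
    [?y, <https://example.com/parentOf>, ?z] .
"""

# Once loaded into RDFox (for example via its shell or REST API), the inferred
# grandparentOf facts become directly queryable alongside the explicit data,
# and incremental reasoning retracts them if the underlying facts are removed.
```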
With a solution that combines RAG and rules-based AI, we can create a system that maintains both the benefits offered by LLMs and the accuracy required for production systems.
For the following implementations we’ll be using examples created with RDFox, the rules-based AI and Knowledge Graph Database.
Each of these techniques starts with entity extraction and alignment. First, we extract the keywords from the input (basic NLP can suffice) and match them to our closed set of known terms—that is, the entities in our knowledge base that best represent the extracted entities. This step is typically achieved with a vector database (we used ChromaDB) and some form of similarity analysis (a simple example being cosine similarity). Once we have resolved these entities, we’re ready to begin.
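As an illustration, here is a minimal sketch of the alignment step using ChromaDB; the entity labels, IRIs, and collection name are placeholders for whatever your knowledge base contains.

```python
# A minimal sketch of entity alignment with ChromaDB. The entity labels and
# IRIs indexed below are hypothetical placeholders.
import chromadb

client = chromadb.Client()
collection = client.create_collection("known_entities")

# Index the labels of the entities in our knowledge base (this uses Chroma's
# default embedding function; a custom one can be supplied instead).
collection.add(
    ids=["https://example.com/Company_A", "https://example.com/Product_X"],
    documents=["Company A", "Product X"],
)

def align_entities(keywords: list[str], top_k: int = 1) -> list[str]:
    """Map keywords extracted from the user's question to the closest known entity IRIs."""
    results = collection.query(query_texts=keywords, n_results=top_k)
    # results["ids"] is a list of lists, one inner list per keyword.
    return [ids[0] for ids in results["ids"] if ids]

# The keywords would come from a basic NLP step over the question,
# e.g. "What does Company A sell?" -> ["Company A"].
print(align_entities(["Company A"]))
```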
As well as starting the same, all these techniques also share the same final step. Once data has been retrieved from the database, here the graph database RDFox, the LLM is asked to answer the user’s initial question using only the data that has been provided, thus avoiding hallucination and ensuring a truthful, accurate answer according to the knowledge base. In each case below, we have used OpenAI’s LLMs and LangChain to manage their usage.
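As a rough sketch of that shared final step (the model choice and prompt wording are just placeholders), the grounding instruction might look like this:

```python
# A minimal sketch of the shared final step: asking the LLM to answer using
# only the retrieved data. Model name and prompt wording are illustrative.
from langchain_core.messages import SystemMessage, HumanMessage
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)

def answer_from_data(question: str, retrieved_data: str) -> str:
    """Generate an answer grounded exclusively in the retrieved data."""
    messages = [
        SystemMessage(content=(
            "Answer the user's question using ONLY the data provided below. "
            "If the data does not contain the answer, say that you don't know.\n\n"
            f"DATA:\n{retrieved_data}"
        )),
        HumanMessage(content=question),
    ]
    return llm.invoke(messages).content
```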
So, now just to fill in the middle bit...
The first and simplest method is query generation—RDFox uses SPARQL but the same could be done for SQL or any other query language.
Rather than asking a question directly to an LLM, we instead ask it to generate a query in our desired language, using the resolved terms, that would give it the information to best answer the user’s question.
For the LLM to generate a meaningful query, we must also provide it with the relevant part of our schema—what our data looks like—so that it can use the exact terms our database contains.
With rules-based AI, we can enrich the data with advanced concepts and valuable insights that can be queried for directly, granting the LLM access to deeper knowledge while aligning answers with the knowledge base.
We can then run this query against our database and serve the results back to the LLM, asking it to use them to answer the user’s initial question.
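Put together, the flow might look something like the sketch below. The schema snippet, the SPARQL endpoint URL, and the prompt wording are all assumptions made for illustration.

```python
# A minimal sketch of the query-generation approach. The schema snippet,
# endpoint URL, and prompt are assumptions; adapt them to your own deployment.
import requests
from langchain_core.messages import SystemMessage, HumanMessage
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4", temperature=0)

# The relevant slice of the schema, however you choose to serialise it.
SCHEMA_SNIPPET = (
    "<https://example.com/Company> has properties "
    "<https://example.com/sells> and <https://example.com/locatedIn>."
)

# Assumed local SPARQL endpoint (for RDFox, the REST endpoint of a datastore).
SPARQL_ENDPOINT = "http://localhost:12110/datastores/default/sparql"

def generate_and_run_query(question: str, resolved_entities: list[str]) -> dict:
    """Ask the LLM to write a SPARQL query, then run it against the database."""
    prompt = [
        SystemMessage(content=(
            "Write a single SPARQL SELECT query that retrieves the data needed "
            f"to answer the user's question.\nSchema:\n{SCHEMA_SNIPPET}\n"
            "Resolved entities:\n" + "\n".join(resolved_entities) +
            "\nReturn only the query, with no explanation."
        )),
        HumanMessage(content=question),
    ]
    sparql = llm.invoke(prompt).content
    response = requests.post(
        SPARQL_ENDPOINT,
        data={"query": sparql},
        headers={"Accept": "application/sparql-results+json"},
    )
    # The JSON results are then handed to the answering step
    # (e.g. the answer_from_data() sketch above).
    return response.json()
```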
There are several ways to improve this method such as providing the LLM with examples of hand-written queries or using an LLM specifically trained to generate queries of a certain type.
However, in practice, we found this method to be flawed. Either we used a fast LLM (such as GPT-3.5 Turbo), for which this level of freedom proved to be too much, frequently leading to hallucinations and nonsensical queries; or we used a more powerful LLM (such as GPT-4), which managed to reliably write SPARQL but took tens of seconds to complete—a time far too long for any real user to wait.
To address the challenges encountered above, we can instead implement function calling. The trade-off this time is a reduction in flexibility and scope but, with some careful thinking, this can be minimised.
Instead of asking it to generate a query, we ask the LLM to choose a predefined function from a set of several, each of which generates a specialised query.
These functions programmatically generate a particular query, allowing the LLM to add entities in a structured manner where it deems them appropriate. With proper planning, even a relatively small set of functions can cover a significant proportion of useful user-questions.
As before, the query is run against the knowledge base and the results used to generate a natural language answer.
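As a sketch of how this can be wired up with OpenAI's tool-calling interface (the two functions, their parameters, and the IRIs are hypothetical examples):

```python
# A minimal sketch of the function-calling approach. The LLM only chooses a
# predefined function and its arguments; our code builds the actual query.
import json
from openai import OpenAI

client = OpenAI()

TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "products_sold_by",
            "description": "Find all products sold by a given company.",
            "parameters": {
                "type": "object",
                "properties": {"company_iri": {"type": "string"}},
                "required": ["company_iri"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "companies_located_in",
            "description": "Find all companies located in a given place.",
            "parameters": {
                "type": "object",
                "properties": {"place_iri": {"type": "string"}},
                "required": ["place_iri"],
            },
        },
    },
]

def build_query(name: str, args: dict) -> str:
    """Each function programmatically builds one specialised SPARQL query."""
    if name == "products_sold_by":
        return ("SELECT ?product WHERE { <" + args["company_iri"] +
                "> <https://example.com/sells> ?product }")
    return ("SELECT ?company WHERE { ?company <https://example.com/locatedIn> <" +
            args["place_iri"] + "> }")

def choose_and_build(question: str, resolved_entities: list[str]) -> str:
    """Let the LLM pick a predefined function; we generate the query ourselves."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system",
             "content": "Resolved entities: " + ", ".join(resolved_entities)},
            {"role": "user", "content": question},
        ],
        tools=TOOLS,
    )
    # In production, handle the case where the model chooses no tool at all.
    call = response.choices[0].message.tool_calls[0]
    return build_query(call.function.name, json.loads(call.function.arguments))
```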
Aside from being faster, this approach drastically reduces the opportunity for hallucinations, and we have found it to be a restricted but functional middle ground.
Whereas everything else we have discussed here can be achieved with any database, the final method requires the use of a knowledge graph.
A knowledge graph database stores data as nodes and edges in an interconnected network. The data is grouped into node-edge-node configurations called triples, or facts, as each represents a single unit of information in the database.
Here you can see several facts that share a common node. The nodes and edges that surround the central node are said to be in its vicinity, and together they form a graph cluster (also known as a fragment or shard). In this case, the cluster includes exactly the nodes that are one edge away from the central node, but this can be expanded depending on the shape of the data and the requirements.
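As a toy illustration (with made-up data), a one-hop cluster around a central node is just a handful of triples that all touch that node:

```python
# A toy one-hop graph cluster around a central node, written as
# (subject, predicate, object) triples. All names and values are made up.
cluster = [
    ("ex:Company_A", "ex:sells",     "ex:Product_X"),
    ("ex:Company_A", "ex:locatedIn", "ex:Oxford"),
    ("ex:Company_A", "ex:employs",   "ex:Alice"),
]
# Every triple shares the central node ex:Company_A; together they form the
# cluster (vicinity) that will later be handed to the LLM as context.
```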
This time, the LLM isn’t asked to generate a query or to choose a function. Instead, we first execute a standard database query that returns the graph cluster for each of the resolved entities. The LLM is then presented with this raw data and instructed to use it as context to answer the user’s initial question.
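A minimal sketch of this retrieval step is shown below; the endpoint URL and the shape of the cluster query are assumptions (only outgoing edges are fetched here, but incoming edges can be added with a UNION).

```python
# A minimal sketch of cluster-based retrieval: a standard SPARQL query pulls
# the one-hop neighbourhood of each resolved entity, and the raw facts are
# handed to the LLM as context. The endpoint URL is an assumption.
import requests

SPARQL_ENDPOINT = "http://localhost:12110/datastores/default/sparql"

# Outgoing edges only; incoming edges (?s ?p <entity>) can be added via UNION.
CLUSTER_QUERY = "SELECT ?p ?o WHERE {{ <{entity}> ?p ?o }}"

def fetch_clusters(entity_iris: list[str]) -> str:
    """Return the one-hop facts around each resolved entity as plain text."""
    lines = []
    for iri in entity_iris:
        response = requests.post(
            SPARQL_ENDPOINT,
            data={"query": CLUSTER_QUERY.format(entity=iri)},
            headers={"Accept": "application/sparql-results+json"},
        )
        for row in response.json()["results"]["bindings"]:
            lines.append(f"{iri} {row['p']['value']} {row['o']['value']}")
    return "\n".join(lines)

# The resulting text block is passed straight to the answering step
# (e.g. the answer_from_data() sketch above) as grounding context.
```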
This method removes all restrictions on the creativity of the LLM without relying on the difficult task of pure query generation, and it ensures a much higher level of accuracy as the answers are all rooted in relevant data.
However, there is still a potential challenge. If the data is very dense, that is, if the clusters contain large amounts of data for any given node, then the results returned by the query may overwhelm the LLM. This is known as ‘context poisoning’ and can result in crucial information being ignored or irrelevant information misguiding the answer. In extreme cases the clusters can be so large that they exceed the LLM’s context window, so critical information may never be considered at all.
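One simple mitigation is to cap how much cluster data reaches the LLM, for example by token count. A rough sketch (the encoding name and token budget are assumptions) might be:

```python
# A simple mitigation sketch: cap the cluster context by token count before it
# reaches the LLM. The encoding name and budget are assumptions.
import tiktoken

def trim_context(facts: list[str], max_tokens: int = 3000) -> str:
    """Keep facts until the token budget is spent; drop the rest."""
    enc = tiktoken.get_encoding("cl100k_base")
    kept, used = [], 0
    for fact in facts:
        cost = len(enc.encode(fact))
        if used + cost > max_tokens:
            break  # more selective strategies (ranking facts by relevance) also help
        kept.append(fact)
        used += cost
    return "\n".join(kept)
```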
Overall, this is a very promising technique under the right conditions.
Now that you have the foundations and a few tools at your disposal, the first step in building a RAG application is to understand your data, your problem, and your goal. With those things clear you can do two things: choose the RAG approach that best suits your needs and enrich your data with rules-based AI so that your system can answer the most valuable questions. By combining the two technologies, it’s now possible to create powerful, practical applications that offer the cutting-edge features of AI and stand up to real-world use.
Have a project in mind? Book a demo with us today or try RDFox for free!
Looking to learn more about LLMs? Read our article about ChatGPT's Snow White Problem or watch this video from Prof. Ian Horrocks about how LLMs work and how to make them more useful for business:
The team behind Oxford Semantic Technologies started working on RDFox in 2011 at the Computer Science Department of the University of Oxford with the conviction that flexible and high-performance reasoning was a possibility for data-intensive applications without jeopardising the correctness of the results. RDFox is the first market-ready knowledge graph designed from the ground up with reasoning in mind. Oxford Semantic Technologies is a spin-out of the University of Oxford and is backed by leading investors including Samsung Venture Investment Corporation (SVIC), Oxford Sciences Enterprises (OSE) and Oxford University Innovation (OUI).