Techniques for Automated Source Citation Verification for RAG

Over the last year, retrieval-augmented generation (RAG) has emerged as a popular LLM-based architecture to address one of the most common challenges of using LLMs in knowledge-based processes: the dreaded hallucinations.

How does RAG address hallucinations?

A RAG workflow conceptually consists of two components — (1) a retrieval step that starts with selecting and filtering source materials provided by the user, and (2) an LLM inference in which the source materials are packaged with the query, with the instruction to the LLM to use the source materials to answer the question. This two-step process is especially effective in enterprise use cases where there is a need to draw on private, contextual knowledge, such as a set of contracts, financial analyses, or customer support data.

Evidence Verification

There is a crucial third step in a RAG workflow, one that has been gaining increasing attention recently: what happens after the LLM transaction. Specifically, with potentially a large volume of LLM transactions, how can you systematically verify that each LLM response corresponds to specific sources, with citations? Ultimately, this is the real pay-off of a RAG system, but it requires an integrated pipeline that captures the various inputs leading up to the LLM prompt, and then integrates all of the LLM inference data to enable post-processing, source verification, ongoing review and audit, and a cycle of continuous improvement.

In this blog, we are going to demonstrate a series of powerful techniques to provide source verification and automated source citations as a third, post-processing step in a RAG workflow. We will use llmware, a leading open source framework for developing evidence-based LLM applications, to build a RAG workflow that performs a basic contract analysis and then verifies the accuracy of the LLM response against the source passages from the contracts.

LLMware provides several out-of-the-box tools to verify sources and evidence in a RAG workflow with simple, intuitive methods in the Prompt class for the following:

  • evidence_check_numbers — reviews a set of prompt response objects and validates whether numbers in the llm_response are verified in the source materials that were provided;

  • evidence_check_sources — reviews the llm_response and the entirety of the evidence to “pin-point” the most likely snippets of text, their document and page number that constitute a “source” bibliography for the LLM response;

  • evidence_comparison_stats — provides a rapid token comparison to highlight the overall match of the response with tokens in the evidence, as well as confirmed and unconfirmed tokens;

  • classify_not_found_response — provides three distinct functions to assess whether the llm_response likely can be classified as a “not found” response such that it can be properly dispositioned (including discarded) in the workflow; and

  • save_state — in many ways, this is the most powerful device, as the prompt state captures all of the facets of the llm transaction through the entire pipeline, and potentially a series of transactions, and saves them to a nicely packaged jsonl dictionary for offline analysis, or for easy insertion into a document datastore. The state can also be used to quickly generate a fine-tuning dataset.

Getting Started

To get started, you will need to install llmware, which can be handled with a standard pip install:
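pip3 install llmware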

Once you have installed llmware, we would recommend that you use the Setup() command to pull down a set of hundreds of useful sample documents that are packaged in the llmware public repo:
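import os
from llmware.setup import Setup

#   pulls down the packaged sample documents, including the "Agreements" folder used below
#   (the exact local path returned will depend on your llmware configuration)
sample_files_path = Setup().load_sample_files()
contracts_path = os.path.join(sample_files_path, "Agreements")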


For this demonstration, we are going to use a few sample executive employment agreements from the “Agreements” folder, but please feel free to use your own documents if you prefer.

Once you have your sample contract documents, we are ready to go with building a RAG workflow.

Let’s look at each of the components of the code, and then put all of the pieces together.

Step 1 — Create a Prompt object — in llmware, Prompt is the main class that handles the end-to-end prompting interaction, and all of the steps of the RAG process can be handled inside a specific prompt. The Prompt will handle loading the model, packaging and filtering the source materials, applying prompt instructions, and the post processing lifecycle.

In the first step, all we are going to do is instantiate a prompt object, and attach our selected model.

In this demo, we will use “gpt-4”, and you can either pass your api_key directly, or load it into an os.environ variable:
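from llmware.prompts import Prompt
import os

#   sketch of step 1 - the api key is assumed to be loaded into this environment variable
openai_api_key = os.environ.get("USER_MANAGED_OPENAI_API_KEY")

#   instantiate the Prompt object and attach the selected model
prompter = Prompt().load_model("gpt-4", api_key=openai_api_key)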

(By the way, we would encourage you to experiment with other models, including Huggingface based open source LLM models, as outlined in a separate tutorial. One of the powers of llmware is the ability to quickly swap-in another model by simply changing the model string name, and passing in its api key.)

Step 2 — Create Source materials — this is the critical step in the RAG workflow, where a lot of the magic happens. In one deceptively simple line of code, we point the “add_source_document” method at our contracts folder and a specific contract. Behind the scenes, the method figures out the document type, extracts the content by parsing the document-specific format (e.g., PDF or Word document), chunks the text, applies an optional filter, and then batches and packages the result as a “context” that is ready for inference, so we never have to get into the data manipulation ourselves. Llmware makes this step a breeze with a simple one-liner:
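#   sketch: parses the document, chunks the text, filters on "base salary" and packages a prompt-ready source
#   (argument pattern shown is illustrative: folder path, contract file name, and an optional filter query)
source = prompter.add_source_document(contracts_path, contract, query="base salary")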

In this case, we will parse the contract in the selected folder, regardless of the file type, then we will further filter based on a topic of “base salary” and package as a source with metadata to be included in the prompt to the LLM. (We would encourage you to take a look at the output source object, which includes a lot of great metadata and a view of the packaging of the context.)

Step 3 — Run inference — this is the main processing step in which the source materials prepared in the previous step are loaded into a prompt, with the query, the prompt instruction and the temperature setting to call the LLM and get an LLM response.
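query = "What is the executive's base salary?"

#   sketch of the inference call - the packaged sources, query, prompt instruction and temperature
#   are passed to the model (prompt_name "just_the_facts" is an illustrative prompt instruction)
responses = prompter.prompt_with_source(query, prompt_name="just_the_facts", temperature=0.0)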

The output of an inference is a standard responses dictionary, which may include one or more llm responses: depending upon the size of the input context, the Prompt will automatically chunk the context into batches and run several inferences, if needed.

Step 4 — Source Checks — this is the last main step, which is the post processing of the responses dictionary, after the prompt is completed. As mentioned above, there are several source and automated evidence checking tools provided:
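#   sketch: each check is applied to the list of response dictionaries returned by the inference step
ev_numbers = prompter.evidence_check_numbers(responses)
ev_sources = prompter.evidence_check_sources(responses)
ev_stats = prompter.evidence_comparison_stats(responses)
not_found = prompter.classify_not_found_response(responses)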


We will take a look at the output of each of these methods once we run the code.

Putting it all together, here is the complete code that we will run (which also can be found in its entirety in the llmware repo):

from llmware.prompts import Prompt
from llmware.configs import LLMWareConfig
from llmware.setup import Setup
import os

#   assumes your OpenAI API key has been loaded into this environment variable
openai_api_key = os.environ.get("USER_MANAGED_OPENAI_API_KEY")
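
#   note: the script below is a sketch reconstructing the steps described above -
#   the query wording, folder name and prompt_name are illustrative and can be adapted

#   pull the sample documents and point at the "Agreements" folder
sample_files_path = Setup().load_sample_files()
contracts_path = os.path.join(sample_files_path, "Agreements")

query = "What is the executive's base salary?"

#   Step 1 - create the Prompt object and attach the selected model
prompter = Prompt().load_model("gpt-4", api_key=openai_api_key)

for i, contract in enumerate(os.listdir(contracts_path)):

    #   skip hidden files (e.g., .DS_Store)
    if contract.startswith("."):
        continue

    print("\nAnalyzing Contract - ", i + 1, contract)

    #   Step 2 - parse, chunk, filter on "base salary" and package as a source
    source = prompter.add_source_document(contracts_path, contract, query="base salary")

    #   Step 3 - run the inference against the packaged source
    responses = prompter.prompt_with_source(query, prompt_name="just_the_facts", temperature=0.0)

    #   Step 4 - post-processing fact and source checks
    ev_numbers = prompter.evidence_check_numbers(responses)
    ev_sources = prompter.evidence_check_sources(responses)
    ev_stats = prompter.evidence_comparison_stats(responses)
    not_found = prompter.classify_not_found_response(responses)

    for r in responses:
        print("Question: ", query)
        print("LLM Response: ", r["llm_response"])

    #   print the fact-check outputs (representative output shown below)
    print("Numbers: ", ev_numbers)
    print("Sources: ", ev_sources)
    print("Stats: ", ev_stats)
    print("Not Found Check: ", not_found)

    #   clear the attached sources so the next contract starts fresh
    prompter.clear_source_materials()

#   save the full prompt state (responses enriched with the fact checks) for offline review
prompter.save_state()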


Run the script, and in the console you will quickly see a series of outputs corresponding to the analysis of each contract document, with representative output for each document similar to the following:

Analyzing Contract -  1 Nyx EXECUTIVE EMPLOYMENT AGREEMENT.docx

Question:  What is the executive's base salary?
LLM Response:  $200,000.

Numbers:  [{'fact': '$200,000,', 'status': 'Confirmed', 'text': ' ... pay Executive a base salary at the annual rate of $200,000, payable semimonthly in accordance with Employer's normal payroll practices.  ... ', 'page_num': '3', 'source': 'Nyx EXECUTIVE EMPLOYMENT AGREEMENT.docx'}]
Sources:  [{'text': 'pay Executive a base salary at the annual rate of $200000 payable semimonthly in accordance with Employer's normal payroll practices ', 'match_score': 1.0, 'source': 'Nyx EXECUTIVE EMPLOYMENT AGREEMENT.docx', 'page_num': '4'}]
Stats:  {'percent_display': '100.0%', 'confirmed_words': ['200000'], 'unconfirmed_words': [], 'verified_token_match_ratio': 1.0, 'key_point_list': [{'key_point': '$200,000.', 'entry': 0, 'verified_match': 1.0}]}
Not Found Check:  {'parse_llm_response': False, 'evidence_match': False, 'not_found_classification': False}

Let’s explain each of the fact checks in more detail.

Numbers – this check reviews the LLM response, identifies any numbers in the response, and then looks for a matching numerical value in the source materials. If a match is found, a dictionary of confirmed facts is returned that shows the fact, its status, and a snippet of text with the source and page number that provides the confirmation. Basic regex handling (e.g., removing ‘$’, commas, etc.) as well as float value comparison (e.g., 12.00 is treated the same as 12) is provided to enhance the robustness of the check.

Sources – this method reviews all of the sources provided in the context and provides a statistically-derived review, based on the density of matching tokens, to identify the most likely specific sources for the llm output, including the source document and page number.

Stats – this is a really helpful quick check and one of the most reliable simple ways to identify a potentially problematic response – it shows the percent match by tokens (excluding stop words and basic formatting items), as well as showing a list of key confirmed and unconfirmed tokens.

Not Found Classification – in our experience, this is one of the most important, and least understood, checks in RAG processing. In almost any RAG automation, there will be many occurrences in which a particular passage is included in a context but is not sufficient to answer the target question or analysis. These are often the highest risk for errors, as the model tries to be “helpful” by incorporating information from the context to create an answer, even though the context does not specifically apply. Models can also be verbose in responding to these questions, with lengthy explanations of why they cannot provide an answer, when the most useful output for a workflow is a simple classification of “not found” so that a particular inference transaction can be excluded from further analysis. In this case, “False” indicates the double negative, i.e., that the transaction is not a “not found” transaction.
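In an automated pipeline, this classification can be used to disposition transactions directly. Here is a minimal sketch, assuming the check returns one classification dictionary per llm response, in the same order (matching the console output shown above):

#   route "not found" responses out of the main analysis
not_found_checks = prompter.classify_not_found_response(responses)

confirmed_responses = []
for response, nf in zip(responses, not_found_checks):
    if nf.get("not_found_classification"):
        #   treat as "not found" - exclude from further analysis or route for human review
        continue
    confirmed_responses.append(response)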

Prompt State – finally, all of the llm response objects, enriched with the fact check outputs, are saved in the prompt state history at the link at the bottom of the console output. We would encourage you to review this .jsonl file, which gives a complete view of all of the metadata for a particular LLM transaction, which can be used for several purposes:

  • Analytics on overall accuracy and rates of errors / hallucinations;

  • Comparisons of different models;

  • Audit and compliance activities; and

  • Continuous improvement to identify common issues.

The combination of these five mechanisms – numbers, sources, stats, not-found, and prompt state – provides a powerful toolkit for just about any RAG workflow to quickly identify potential errors and risk exposures.

We would encourage you to check out the llmware-ai github repo for this and other useful code samples.

For more information about BLING, please check us out on Huggingface: https://www.huggingface.co/llmware/.

For more information about llmware, please check out our main github repo: https://www.github.com/llmware-ai/llmware/.

Please also check out video tutorials at: youtube.com/@llmware

Article by Darren Oberst, CEO and Founder

Published on Aug 31, 2023