Introduction to RAG Systems
Retrieval Augmented Generation (RAG) systems have gained significant popularity for building Generative AI assistants on top of custom enterprise data. They offer a practical alternative to the costly fine-tuning of Large Language Models (LLMs). One of their key strengths is the ease of integrating custom data, which grounds the LLM in that data and yields more context-aware answers to user questions.
The Drawbacks of Traditional RAG Systems
Despite their advantages, traditional RAG systems are not without issues. They can underperform and, in some cases, provide incorrect answers. Some of the limitations include a lack of real-time data access, the potential for retrieving irrelevant documents, and the system’s performance being limited by the data in the vector database. Additionally, the LLM used in the system may be prone to hallucinations or unable to answer certain questions.
The Corrective RAG System Concept
The corrective RAG system, inspired by the paper “Corrective Retrieval Augmented Generation” by Yan et al., aims to address these limitations. The core idea is to first retrieve document chunks from the vector database as in a standard RAG system. Then, an LLM is used to assess the relevance of each retrieved document chunk to the input question. If all chunks are relevant, the system proceeds with normal response generation. However, if some are not relevant, the input query is rephrased, a web search is conducted to obtain new information, and this is then sent to the LLM for response generation.
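The relevance check at the heart of this idea can be sketched in plain Python. Here the LLM grader is simulated with a simple word-overlap heuristic; `grade_chunk` and its threshold are illustrative assumptions standing in for a real LLM call, not the method from the paper.

```python
def grade_chunk(question: str, chunk: str) -> str:
    """Stand-in for an LLM grader: returns 'yes' if the chunk
    shares enough words with the question, 'no' otherwise."""
    q_words = set(question.lower().split())
    c_words = set(chunk.lower().split())
    return "yes" if len(q_words & c_words) >= 2 else "no"


def needs_correction(question: str, chunks: list[str]) -> bool:
    """True if any retrieved chunk is graded irrelevant, which is
    what triggers query rewriting and a web search in corrective RAG."""
    return any(grade_chunk(question, c) == "no" for c in chunks)
```

In the real system the grader would be an LLM prompted to emit a binary relevance verdict per chunk; only the routing decision shown here carries over.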
The Rise of AI Agents
AI Agents have seen a surge in popularity, especially in 2024. These systems enable the creation of Generative AI that can reason, analyze, interact, and act autonomously. Agentic AI systems are designed to be highly autonomous, handling complex workflows with minimal human intervention. Various frameworks, such as CrewAI, LangChain, LangGraph, and AutoGen, can be used to build these systems. For the implementation of the Agentic RAG system in this guide, LangGraph, which is built on top of LangChain, is utilized. LangGraph facilitates the creation of cyclical graphs for AI agents powered by LLMs.
Agentic Corrective RAG System Workflow
The Agentic RAG system has a two-flow workflow. The first is the regular RAG system workflow, where a user question is received, and context documents are retrieved from the vector database. An additional step is introduced where an LLM checks the relevance of the retrieved documents. If all are relevant, an LLM generates a response. The second flow is triggered when at least one retrieved document is irrelevant. In this case, an LLM is used to rephrase the query, a web search is performed using the rephrased query, and the new information along with the original query is sent to an LLM for response generation.
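The two flows above can be expressed as a single function. Every helper here (`retrieve`, `grade`, `rephrase`, `web_search`, `generate`) is a placeholder passed in as a callable; in the actual system these would be wired to the vector database, an LLM, and the web search API, so treat this as a control-flow sketch only.

```python
def corrective_rag(question, retrieve, grade, rephrase, web_search, generate):
    """Flow 1: all chunks relevant -> answer from retrieved context.
    Flow 2: any chunk irrelevant -> rephrase, web search, then answer."""
    docs = retrieve(question)
    relevant = [d for d in docs if grade(question, d) == "yes"]
    if docs and len(relevant) == len(docs):
        # Standard RAG path: all retrieved context passed to the LLM.
        return generate(question, relevant)
    # Corrective path: rewrite the query, search the web, and answer
    # using the original question plus the augmented context.
    new_query = rephrase(question)
    web_docs = web_search(new_query)
    return generate(question, relevant + web_docs)
```

Note that the original question, not the rephrased one, is sent for final response generation, matching the workflow described above.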
Detailed Architecture of the Agentic Corrective RAG System
The system starts with a user query that goes to the vector database (Chroma in this implementation). Retrieved context documents are then sent to an LLM acting as a document grader. Based on the grader’s output, the system either follows the standard RAG flow if all documents are relevant or rephrases the query and conducts a web search (using the Tavily Web Search API) if any documents are irrelevant or no documents are retrieved. Finally, the query and relevant context documents, including web-retrieved information, are used for response generation.
Hands-on Implementation with LangGraph
The implementation process begins with installing necessary dependencies such as langchain, langgraph, and others. API keys for OpenAI and Tavily are entered securely, and environment variables are set up. A vector database is built using a subset of Wikipedia data, with the help of OpenAI embedding models. Various components such as a vector database retriever, a query retrieval grader, a QA RAG chain, a query rephraser, and a web search tool are created. Key functions for the Agentic RAG system, including retrieve, grade_documents, rewrite_query, web_search, generate_answer, and decide_to_generate, are defined. These functions are then used to build an agent graph with LangGraph, which is compiled and tested on different user queries.
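The shape of the resulting agent graph can be mimicked without LangGraph itself: nodes are functions over a shared state dict, and `decide_to_generate` acts as the conditional edge. The node names follow the functions listed above, but the mini state machine below is an illustrative stand-in with placeholder bodies, not LangGraph's API or the guide's actual node implementations.

```python
def retrieve(state):
    # Placeholder: would query the Chroma vector database retriever.
    state["documents"] = ["doc about " + state["question"]]
    return "grade_documents"

def grade_documents(state):
    # Placeholder grader: records whether every document passed.
    state["all_relevant"] = bool(state["documents"])
    return "decide_to_generate"

def decide_to_generate(state):
    # Conditional edge: standard flow vs. corrective flow.
    return "generate_answer" if state["all_relevant"] else "rewrite_query"

def rewrite_query(state):
    state["question"] += " (rephrased)"
    return "web_search"

def web_search(state):
    # Placeholder: would call the Tavily Web Search API.
    state["documents"].append("web result")
    return "generate_answer"

def generate_answer(state):
    state["answer"] = f"Answer using {len(state['documents'])} docs"
    return None  # terminal node

NODES = {f.__name__: f for f in (
    retrieve, grade_documents, decide_to_generate,
    rewrite_query, web_search, generate_answer,
)}

def run_agent(question):
    """Walk the graph from 'retrieve' until a terminal node."""
    state, node = {"question": question}, "retrieve"
    while node is not None:
        node = NODES[node](state)
    return state
```

In LangGraph proper, this wiring is done with `StateGraph`, `add_node`, `add_edge`, and `add_conditional_edges`, and the compiled graph carries a typed state rather than a plain dict.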
Conclusion
This guide has provided a comprehensive look at the challenges in traditional RAG systems, the role of AI Agents, and how an Agentic RAG system can overcome some of these challenges. Through a detailed architecture, workflow, and hands-on implementation, the potential of Agentic RAG systems has been demonstrated. Users are encouraged to explore further and improve the system, for example, by adding more hallucination checks.