Understanding RAG: A Comprehensive Review of “Retrieval-Augmented Generation for Large Language Models: A Survey” by Tongji University
Introduction
Large Language Models (LLMs) have shown remarkable capabilities, but they still face challenges in practical applications, such as hallucination, slow knowledge updates, and a lack of transparency in answers. Retrieval-Augmented Generation (RAG) refers to retrieving relevant information from an external knowledge base before using an LLM to answer a question. RAG has been shown to significantly improve answer accuracy and reduce model hallucination, especially for knowledge-intensive tasks. By citing sources, users can verify the accuracy of answers, which increases trust in the model's output. RAG also facilitates knowledge updates and the introduction of domain-specific knowledge. By effectively combining the parameterized knowledge of LLMs with non-parameterized external knowledge bases, RAG has become one of the most important methods for deploying large language models.
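To make this retrieve-then-generate flow concrete, here is a minimal Python sketch. The toy knowledge base, the keyword-overlap retriever, and the `call_llm` placeholder are illustrative assumptions introduced for this review, not components described in the paper.

```python
# A minimal sketch of the retrieve-then-generate loop described above.
# The knowledge base, the scoring function, and call_llm() are stand-ins,
# not the paper's implementation.

KNOWLEDGE_BASE = [
    "RAG retrieves documents from an external corpus before generation.",
    "Hallucination refers to an LLM stating facts not supported by evidence.",
    "Dense Passage Retrieval encodes queries and passages into vectors.",
]

def retrieve(query: str, k: int = 2) -> list[str]:
    """Rank documents by naive keyword overlap with the query (toy retriever)."""
    q_terms = set(query.lower().split())
    scored = sorted(
        KNOWLEDGE_BASE,
        key=lambda doc: len(q_terms & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:k]

def call_llm(prompt: str) -> str:
    """Placeholder for any LLM completion API."""
    return f"<answer conditioned on: {prompt[:60]}...>"

def rag_answer(question: str) -> str:
    # Retrieve supporting passages, assemble a grounded prompt, then generate.
    context = "\n".join(retrieve(question))
    prompt = (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return call_llm(prompt)

print(rag_answer("What is Dense Passage Retrieval?"))
```

In a real system the keyword scorer would be replaced by vector-similarity search and `call_llm` by an actual model API, but the control flow stays the same: retrieve, assemble the prompt, generate.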
https://arxiv.org/abs/2312.10997
Key insight
GPTsterser's Viewpoint: RAG is as important to large models as weapons are to an army. This article provides a detailed introduction to the cutting-edge knowledge of RAG. It's like going to war with an advanced armory at our disposal, making us invincible in all directions!
Large Language Models (LLMs) outperform any previous models in the field of Natural Language Processing (NLP). The GPT family of models [Brown et al., 2020, OpenAI, 2023], the LLaMA family of models [Touvron et al., 2023], Gemini [Google, 2023], and other large language models have demonstrated language understanding and knowledge mastery that surpass human levels on multiple evaluation benchmarks [Wang et al., 2019, Hendrycks et al., 2020, Srivastava et al., 2022].
However, large language models also exhibit notable drawbacks. They often fabricate facts [Zhang et al., 2023b] and lack knowledge when dealing with domain-specific or highly specialized queries [Kandpal et al., 2023]. For example, LLMs may be unable to provide accurate answers when the required information lies beyond the scope of the model's training data or when up-to-date data is needed. This limitation poses a challenge in deploying generative AI in real-world production environments, as blindly using a black-box LLM may not be sufficient.
Traditionally, neural networks have been adapted to specific domains or proprietary information by fine-tuning the model so that the knowledge becomes parameterized. While this technique yields significant results, it is computationally intensive, costly, and requires specialized technical expertise, making it less adaptable. Parametric and non-parametric knowledge play different roles. Parametric knowledge, acquired by training LLMs and stored in the neural network weights, represents the model's understanding and generalization of the training data and forms the basis for generated responses. Non-parametric knowledge, on the other hand, resides in external knowledge sources, such as vector databases; it is not directly encoded into the model but serves as updatable supplementary information. Non-parametric knowledge enables LLMs to access and utilize up-to-date or domain-specific information, increasing the accuracy and relevance of responses.
Purely parametric language models store world knowledge acquired from a large corpus in the parameters of the model. However, such models have limitations. First, it is difficult to retain all the knowledge in the training corpus, especially less common and more specific knowledge. Second, because model parameters cannot be dynamically updated, parametric knowledge tends to become outdated over time. Finally, expanding the parameters leads to increased computational overhead for training and inference. To address the limitations of purely parametric models, language models can adopt a semi-parametric approach that integrates a non-parametric corpus database with the parametric model. This approach is known as Retrieval-Augmented Generation (RAG).
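The semi-parametric split can be illustrated with a small sketch: the model weights stay frozen, while the non-parametric store can be extended or corrected at any time without retraining. The `embed` function and the example document below are stand-ins assumed for illustration; a real system would use a trained text encoder and a vector database.

```python
# Sketch of the semi-parametric idea: knowledge is added to an external store
# without any gradient update to the (frozen) language model.
import hashlib
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Toy embedding seeded from a hash of the text; replace with a real encoder."""
    seed = int(hashlib.md5(text.encode()).hexdigest(), 16) % (2**32)
    rng = np.random.default_rng(seed)
    v = rng.normal(size=dim)
    return v / np.linalg.norm(v)

class ExternalKnowledgeStore:
    """Non-parametric memory: updating it requires no retraining of the LLM."""
    def __init__(self):
        self.docs, self.vecs = [], []

    def add(self, doc: str) -> None:
        self.docs.append(doc)
        self.vecs.append(embed(doc))

    def search(self, query: str, k: int = 1) -> list[str]:
        sims = np.stack(self.vecs) @ embed(query)      # cosine similarity (unit vectors)
        return [self.docs[i] for i in np.argsort(-sims)[:k]]

store = ExternalKnowledgeStore()
store.add("The 2023 report was published in October.")  # knowledge added after "training"
print(store.search("When was the 2023 report published?"))
```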
The term Retrieval-Augmented Generation (RAG) was originally introduced by [Lewis et al., 2020]. It combines a pre-trained retriever and a pre-trained seq2seq model (the generator) with end-to-end fine-tuning to capture knowledge in a more interpretable and modular way. Before the advent of large models, RAG focused primarily on direct optimization of end-to-end models. The common practice on the retrieval side was dense retrieval, such as vector-based Dense Passage Retrieval (DPR) [Karpukhin et al., 2020], together with training smaller models on the generation side. Because the overall parameter sizes were small, the retriever and generator were usually trained or fine-tuned end-to-end in a synchronized manner [Izacard et al., 2022].
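As a rough illustration of DPR-style dense retrieval, the sketch below uses the publicly released Hugging Face DPR checkpoints (an assumption of this example, not code from the survey): the question and the passages are encoded by two separate encoders, and relevance is scored by the inner product of their embeddings.

```python
# DPR-style dual-encoder retrieval: encode query and passages separately,
# then score passages by inner product with the query embedding.
import torch
from transformers import (
    DPRContextEncoder, DPRContextEncoderTokenizer,
    DPRQuestionEncoder, DPRQuestionEncoderTokenizer,
)

q_name = "facebook/dpr-question_encoder-single-nq-base"
c_name = "facebook/dpr-ctx_encoder-single-nq-base"
q_tok, q_enc = DPRQuestionEncoderTokenizer.from_pretrained(q_name), DPRQuestionEncoder.from_pretrained(q_name)
c_tok, c_enc = DPRContextEncoderTokenizer.from_pretrained(c_name), DPRContextEncoder.from_pretrained(c_name)

passages = [
    "RAG was introduced by Lewis et al. in 2020.",
    "DPR encodes questions and passages with two separate BERT encoders.",
]

with torch.no_grad():
    p_emb = c_enc(**c_tok(passages, padding=True, return_tensors="pt")).pooler_output
    q_emb = q_enc(**q_tok("Who introduced RAG?", return_tensors="pt")).pooler_output

scores = q_emb @ p_emb.T                       # inner-product relevance scores
best = passages[scores.argmax().item()]        # top passage would feed the generator
print(best)
```

In the original end-to-end setting, the retriever and generator are fine-tuned jointly on the downstream task rather than used off the shelf as in this sketch.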
Since the emergence of LLMs such as ChatGPT, generative language models have become mainstream, demonstrating impressive performance on a variety of language tasks [Bai et al.]. However, LLMs still face hallucination [Yao et al., 2023, Bang et al., 2023], knowledge-updating, and data-related problems. These issues affect the reliability of LLMs, which perform poorly in certain demanding task scenarios, especially knowledge-intensive tasks that require access to large amounts of knowledge, such as open-domain question answering [Chen and Yih, 2020, Reddy et al., 2019, Kwiatkowski et al., 2019] and commonsense reasoning [Clark et al., 2019, Bisk et al.].
This paper systematically reviews and analyzes the current research approaches and future development paths of RAG, categorizing them into three main paradigms: Naive RAG, Advanced RAG, and Modular RAG. It then provides a comprehensive summary of the three core components: retrieval, enhancement, and generation, highlighting the directions of improvement and the current technical characteristics of RAG. In the section on enhancement methods, current work is organized into three areas: the enhancement stages of RAG, enhancement data sources, and the enhancement process. In addition, the paper summarizes the evaluation system, applicable scenarios, and other elements related to RAG. Through this paper, readers gain a more comprehensive and systematic understanding of large language models and retrieval-augmented generation. They become familiar with the evolution path and key techniques of knowledge-retrieval enhancement, are able to identify the advantages and disadvantages of different techniques, recognize the applicable scenarios, and explore typical cases of current practical applications. It is worth noting that, in previous work, Feng et al. [2023b] systematically reviewed the methods, applications, and future trends of combining large models with knowledge, focusing mainly on knowledge editing and retrieval-enhancement methods. Zhu et al. [2023] introduced recent advances in retrieval augmentation for large language models, with a special focus on retrieval systems. Meanwhile, Asai et al. [2023a] focuses on the "what", "when", and "how" questions, analyzing and explaining the key processes of retrieval-based language modeling. In contrast, this paper aims to provide a systematic overview of the whole process of retrieval-augmented generation (RAG), with a special focus on research that enhances large language models through knowledge retrieval.
The development of RAG algorithms and models is illustrated in Figure 1. On the timeline, most RAG-related research appeared after 2020, with the release of ChatGPT in December 2022 marking an important turning point. Since the release of ChatGPT, research in natural language processing has entered the era of large models. Naive RAG techniques quickly gained attention, leading to a rapid increase in the number of related studies. In terms of enhancement strategies, research on enhancement in the pre-training and supervised fine-tuning stages has been conducted since the concept of RAG was introduced, whereas most research on enhancement in the inference stage has emerged in the era of LLMs. This is mainly due to the high training cost of high-performance large models: researchers attempt to incorporate external knowledge into model generation in a cost-effective manner by adding RAG modules at the inference stage. Regarding the use of augmented data, early RAG work focused on unstructured data, especially in the context of open-domain question answering. Subsequently, the range of knowledge sources for retrieval was extended, and using high-quality data as a knowledge source effectively addressed issues such as the internalization of erroneous knowledge and hallucination in large models. This includes structured knowledge, with knowledge graphs as a representative example. More recently, self-retrieval has attracted increasing attention; it involves mining the knowledge of LLMs themselves to enhance their performance.
Conclusion
In conclusion, RAG has emerged as a promising approach to address the limitations of LLMs, particularly in terms of hallucination and slow knowledge updates. By effectively combining the strengths of generative models and the flexibility of retrieval modules, RAG provides a viable solution to the inherent knowledge incompleteness and insufficiency issues of purely parameterized models.
Frequently Asked Questions
- What is Retrieval-Augmented Generation (RAG)?
RAG is a method that retrieves relevant information from an external knowledge base before using Large Language Models (LLMs) to answer questions.
- What are the main components of RAG?
The main components of RAG are the retriever, the generator, and the enhancement methods.
- What are the paradigms of RAG?
The paradigms of RAG are Naive RAG, Advanced RAG, and Modular RAG.
- What are the limitations of LLMs that RAG addresses?
RAG addresses LLM limitations such as hallucination, slow knowledge updates, and lack of transparency in answers.
- How does RAG improve the accuracy of LLMs?
RAG improves the accuracy of LLMs by retrieving relevant information from an external knowledge base before the model answers, which reduces hallucination and facilitates knowledge updates.