LongRAG is a robust, dual-perspective retrieval-augmented generation (RAG) framework designed for long-context question answering (LCQA).
What is LongRAG?
LongRAG is a RAG framework for LCQA developed by Tsinghua University, the Chinese Academy of Sciences (CAS), and Zhipu AI. It looks at long documents from two perspectives at once: it builds an understanding of the global context and it identifies the specific factual details needed to answer the question.
Main Features of LongRAG
- Dual-Perspective Information Processing: Combines global information and factual details for comprehensive answers.
- Hybrid Retriever: Retrieves the chunks most relevant to the question from a large corpus.
- LLM-Enhanced Information Extractor: Maps retrieved snippets back to the passages they came from, restoring the surrounding context before global information is extracted.
- CoT-Guided Filter: Uses chain-of-thought (CoT) reasoning to judge which retrieved snippets are relevant to the question and drops the rest.
- LLM-Enhanced Generator: Generates final answers by combining global and detailed information.
- Automated Fine-Tuning Data Construction: Builds the fine-tuning dataset automatically, and the resulting data is used to improve model performance.
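To make the data flow between these components concrete, here is a minimal Python sketch. It is not LongRAG's actual code: the function names, the prompts, and the word-overlap retriever are all assumptions chosen for illustration, and `llm` stands for any hypothetical callable that takes a prompt and returns text.

```python
import re
from typing import Callable, List

# Hypothetical type: any function that takes a prompt and returns model text.
LLM = Callable[[str], str]


def words(text: str) -> set:
    """Lowercased word set, ignoring punctuation."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(question: str, chunks: List[str], top_k: int = 2) -> List[str]:
    """Toy stand-in for the hybrid retriever: rank chunks by word overlap."""
    q = words(question)
    ranked = sorted(chunks, key=lambda c: len(q & words(c)), reverse=True)
    return ranked[:top_k]


def extract_global_info(question: str, retrieved: List[str], llm: LLM) -> str:
    """Global perspective: ask the model for background drawn from the context."""
    context = "\n".join(retrieved)
    return llm(f"Summarize the background needed to answer: {question}\n\n{context}")


def filter_details(question: str, retrieved: List[str], llm: LLM) -> List[str]:
    """Detail perspective: keep only the chunks the model judges relevant."""
    kept = []
    for chunk in retrieved:
        verdict = llm(
            "Decide whether the passage is relevant to the question. "
            f"Answer yes or no.\nQuestion: {question}\n\nPassage: {chunk}"
        )
        if verdict.strip().lower().startswith("yes"):
            kept.append(chunk)
    return kept


def answer(question: str, chunks: List[str], llm: LLM, top_k: int = 2) -> str:
    """Combine global background and filtered details into one final prompt."""
    retrieved = retrieve(question, chunks, top_k)
    global_info = extract_global_info(question, retrieved, llm)
    details = filter_details(question, retrieved, llm)
    prompt = (
        f"Question: {question}\n"
        f"Background: {global_info}\n"
        "Details:\n" + "\n".join(details)
    )
    return llm(prompt)
```

The point of the sketch is the shape of the pipeline: one retrieval step feeds two separate views of the evidence, and the generator sees both. A real system would replace the word-overlap ranking with a proper retriever and pass an actual model client as `llm`.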
Technical Principles
- Retrieval-Augmented Generation (RAG): Retrieves external knowledge to assist in answer generation.
- Global and Detailed Information Integration: Balances local factual details with global context.
- Mapping Strategy: Restores contextual information by mapping snippets to the original text.
- Chain of Thought (CoT): Guides the model to focus on relevant knowledge.
- Filtering Strategy: Retains key factual details while filtering out irrelevant information.
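The mapping and filtering ideas above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the project's implementation: the chunk and paragraph identifiers, the prompt wording, the True/False reply format, and the `llm` callable are all hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical type: any function that takes a prompt and returns model text.
LLM = Callable[[str], str]


def map_to_paragraphs(
    chunk_ids: List[str],
    chunk_to_paragraph: Dict[str, str],
    paragraphs: Dict[str, str],
) -> List[str]:
    """Mapping strategy: replace each retrieved chunk with its source paragraph.

    Several chunks may come from the same paragraph, so duplicates are
    dropped while the retrieval order is preserved.
    """
    seen = set()
    restored = []
    for cid in chunk_ids:
        pid = chunk_to_paragraph[cid]
        if pid not in seen:
            seen.add(pid)
            restored.append(paragraphs[pid])
    return restored


def cot_filter(question: str, chunks: List[str], llm: LLM) -> List[str]:
    """Filtering strategy: get a reasoning clue first, then judge each chunk."""
    context = "\n".join(chunks)
    clue = llm(
        "Think step by step about what is needed to answer the question.\n"
        f"Question: {question}\n\nContext: {context}"
    )
    kept = []
    for chunk in chunks:
        verdict = llm(
            "Using the reasoning clue, reply True if the chunk helps answer "
            "the question, otherwise False.\n"
            f"Question: {question}\nClue: {clue}\n\nChunk: {chunk}"
        )
        if verdict.strip().lower().startswith("true"):
            kept.append(chunk)
    return kept
```

Mapping trades precision for context: a short chunk that matched the query is swapped for the longer paragraph around it, which is what gives the model global information. The filter works in the opposite direction, keeping the short chunks but discarding those the reasoning clue does not support.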
Application Scenarios
- Customer Service and Support: Answers lengthy customer queries accurately.
- Medical Consultation: Processes patient records and medical literature for complex questions.
- Legal Consultation: Analyzes legal documents and cases for in-depth advice.
- Education and Research: Assists in understanding academic articles and research reports.
- Corporate Decision Support: Provides insights from market research and corporate reports.
Getting Started