What is Light-R1?
Light-R1 is an open-source AI model developed by 360 Smart Brain that focuses on long chain-of-thought reasoning in mathematics. It is based on Qwen2.5-32B-Instruct and trained on roughly 70,000 mathematical examples using a two-stage curriculum-learning approach (SFT + DPO). Light-R1 outperforms DeepSeek-R1-Distill-Qwen-32B, scoring 76.6 on the AIME24 benchmark versus DeepSeek's 72.6. The model is cost-efficient: training requires only 12 H800 machines running for 6 hours, at a cost of approximately $1,000. Light-R1 is fully open-source, including the model weights, dataset, training framework, and evaluation code, making it a valuable resource for the open-source community and a reference for low-cost training of specialized models.
Main Features of Light-R1
- Efficient Mathematical Problem Solving: Quickly and accurately solves complex mathematical problems across algebra, geometry, probability, and more.
- Enhanced Reasoning Ability: Strong logical reasoning that supports long chain-of-thought problem solving.
- Generalization Ability: Although trained on mathematics, it generalizes to other areas such as logical reasoning and language understanding.
- Low-Cost Training and Deployment: Achieves high performance at very low cost, making it practical for users or enterprises with limited resources to deploy and apply quickly.
Technical Principles of Light-R1
- Base Model and Starting Point: Light-R1 is built on Qwen2.5-32B-Instruct, a base model with no long chain-of-thought capability of its own, and is trained "from scratch" in that sense until it surpasses DeepSeek-R1-Distill-Qwen-32B.
- Curriculum Learning:
  - SFT (Supervised Fine-Tuning): Data is filtered by difficulty and fine-tuned in two stages: the first stage uses all 70,000 examples, while the second stage continues fine-tuning on only the 3,000 hardest problems (see the curriculum sketch after this list).
  - DPO (Direct Preference Optimization): Building on the SFT model, output quality is further optimized by sampling multiple responses per problem and constructing preference pairs (see the pair-construction sketch after this list).
- Data Processing and Deduplication: Training data is drawn from multiple open-source mathematical datasets (e.g., OpenR1-Math-220k, OpenThoughts-114k) and strictly deduplicated against evaluation benchmarks so that test-data leakage cannot inflate model performance (see the decontamination sketch after this list).
- Model Fusion: The final Light-R1-32B is obtained by merging three checkpoints: the stage-2 SFT model, the DPO model, and a second DPO variant, which further improves performance and stability (see the merging sketch after this list).
- Training Framework and Optimization: Training uses the 360-LLaMA-Factory framework, which supports sequence parallelism and efficient distributed training. With this optimized pipeline, the full training run completes in about 6 hours on 12 H800 machines.
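To make the two-stage curriculum concrete, here is a minimal sketch of difficulty-based data splitting, assuming difficulty is estimated from a model's pass rate on each problem. The `MathSample` structure, `pass_rate` field, and thresholds are illustrative assumptions, not taken from the Light-R1 release:

```python
from dataclasses import dataclass

@dataclass
class MathSample:
    problem: str
    solution: str
    pass_rate: float  # fraction of sampled model attempts that solved the problem

def split_curriculum(samples: list[MathSample],
                     stage1_max_pass: float = 0.9,
                     stage2_size: int = 3000) -> tuple[list[MathSample], list[MathSample]]:
    """Stage 1 keeps everything that is not trivially easy;
    stage 2 keeps only the hardest few thousand problems."""
    stage1 = [s for s in samples if s.pass_rate <= stage1_max_pass]
    hardest_first = sorted(stage1, key=lambda s: s.pass_rate)  # low pass rate = hard
    stage2 = hardest_first[:stage2_size]
    return stage1, stage2
```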
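The DPO pair construction can be sketched as follows: sample several responses per problem from the SFT model, then pair a verified-correct response (chosen) against an incorrect one (rejected). The `generate` and `is_correct` callables below are hypothetical hooks standing in for the actual inference and answer-checking pipeline:

```python
import random

def build_dpo_pairs(problems, generate, is_correct, n_samples=8):
    """For each problem, sample n responses and pair one correct
    answer (chosen) with one incorrect answer (rejected)."""
    pairs = []
    for problem in problems:
        correct, wrong = [], []
        for _ in range(n_samples):
            response = generate(problem)
            (correct if is_correct(problem, response) else wrong).append(response)
        if correct and wrong:  # skip problems that are all-pass or all-fail
            pairs.append({
                "prompt": problem,
                "chosen": random.choice(correct),
                "rejected": random.choice(wrong),
            })
    return pairs
```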
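For decontamination, one common approach (assumed here; the article does not specify Light-R1's exact matching rule) is to normalize problem text and drop any training item that collides with a benchmark problem:

```python
import re

def normalize(text: str) -> str:
    """Lowercase and collapse whitespace/punctuation so that
    near-identical problem statements compare equal."""
    return re.sub(r"[\W_]+", " ", text.lower()).strip()

def decontaminate(train_problems: list[str], benchmark_problems: list[str]) -> list[str]:
    """Drop any training problem whose normalized text matches a
    benchmark problem (e.g. AIME), preventing test-set leakage."""
    banned = {normalize(p) for p in benchmark_problems}
    return [p for p in train_problems if normalize(p) not in banned]
```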
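Model fusion can be as simple as element-wise averaging of checkpoint weights. Whether Light-R1 uses uniform averaging or a more elaborate merge recipe is not stated here, so the equal weights and checkpoint names below are assumptions:

```python
import torch

def merge_checkpoints(state_dicts, weights=None):
    """Merge several same-architecture checkpoints by (weighted)
    element-wise averaging of their parameters."""
    if weights is None:
        weights = [1.0 / len(state_dicts)] * len(state_dicts)  # uniform weights: an assumption
    merged = {}
    for name in state_dicts[0]:
        merged[name] = sum(w * sd[name].float() for w, sd in zip(weights, state_dicts))
    return merged

# Hypothetical usage; the checkpoint file names are placeholders:
# sft2 = torch.load("sft_stage2.pt"); dpo1 = torch.load("dpo_v1.pt"); dpo2 = torch.load("dpo_v2.pt")
# merged = merge_checkpoints([sft2, dpo1, dpo2])
```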
Project Address of Light-R1
- GitHub Repository: https://github.com/Qihoo360/Light-R1
- HuggingFace Model: https://huggingface.co/qihoo360/Light-R1-32B
Application Scenarios of Light-R1
- Education: Serves as a mathematical learning tool that helps students solve complex problems and provides step-by-step solutions and ideas, suitable for math competitions and daily study.
- Research and Academia: Assists mathematical research and cross-disciplinary problem solving, such as physical modeling and engineering optimization.
- Enterprise Applications: Supports data analysis, risk assessment, supply-chain optimization, and other complex problem-solving tasks.
- Software Integration: Can be integrated into intelligent assistants and mathematical software to enhance their reasoning and problem-solving capabilities.
- Open Source and Developers: Enables developers to customize and extend the model, promoting the growth of the open-source community.