Light-R1

by 360 Smart Brain
Light-R1 is an open-source AI model developed by 360 Smart Brain, focused on long-chain reasoning in mathematics. It is based on Qwen2.5-32B-Instruct and trained on 70,000 mathematical examples with a two-stage curriculum learning approach (SFT + DPO). Light-R1 outperforms DeepSeek-R1-Distill-Qwen-32B, scoring 76.6 on the AIME24 benchmark versus DeepSeek's 72.6. Training is cost-efficient, requiring only 12 H800 machines for 6 hours, at a cost of roughly $1,000. Light-R1 is fully open source, including the model, dataset, training framework, and evaluation code, making it a valuable resource for the open-source community and a reference for low-cost training of specialized models.


Main Features of Light-R1

  • Efficient Mathematical Problem Solving: Quickly and accurately solves complex mathematical problems, including algebra, geometry, and probability.
  • Enhanced Reasoning Ability: Strong logical reasoning capabilities that support long-chain, multi-step problem solving.
  • Generalization Ability: Demonstrates generalization ability in other fields such as logical reasoning and language understanding.
  • Low-Cost Training and Deployment: Achieves high performance with extremely low costs, suitable for users or enterprises with limited resources for rapid deployment and application.

Technical Principles of Light-R1

  • Base Model and Starting Point: Built on Qwen2.5-32B-Instruct, a base model without long-chain reasoning training, and improved through post-training until it surpasses DeepSeek-R1-Distill-Qwen-32B.
  • Curriculum Learning:
      • SFT (Supervised Fine-Tuning): Data is filtered by difficulty and fine-tuning proceeds in two stages: the first stage trains on 70,000 examples; the second continues on the 3,000 hardest of them.
      • DPO (Direct Preference Optimization): Building on the SFT model, output quality is further optimized by sampling multiple responses and constructing preference pairs.
  • Data Processing and Deduplication: Training data is drawn from multiple open-source mathematical datasets (e.g., OpenR1-Math-220k, OpenThoughts-114k), with strict deduplication against benchmark test sets so that data leakage cannot inflate evaluation results.
  • Model Fusion: The final Light-R1-32B is obtained by fusing the SFT stage-2 model, the DPO model, and a second DPO variant, further improving performance and stability.
  • Training Framework and Optimization: Training uses the 360-LLaMA-Factory framework, which supports sequence parallelism and efficient distributed training. With this optimized pipeline, Light-R1 completes training in just 6 hours on 12 H800 machines.
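The two-stage difficulty curriculum can be sketched as follows. This is an illustrative sketch only: `curriculum_stages` and the `pass_rate` field are hypothetical names, and the actual filtering criteria Light-R1 uses are not described on this page.

```python
# Hypothetical sketch of difficulty-based curriculum filtering.
# Assumption: each problem carries a `pass_rate` in [0, 1] (fraction of
# sampled model attempts that solved it); lower pass rate = harder.

def curriculum_stages(problems, stage2_size):
    """Split a problem pool into two curriculum stages by difficulty.

    Stage 1 is the full pool ordered from easy to hard; stage 2 keeps
    only the `stage2_size` hardest problems for further fine-tuning.
    """
    # Stage 1: everything, easiest first, so training ramps up in difficulty.
    stage1 = sorted(problems, key=lambda p: p["pass_rate"], reverse=True)
    # Stage 2: the hardest problems only (lowest pass rates).
    stage2 = sorted(problems, key=lambda p: p["pass_rate"])[:stage2_size]
    return stage1, stage2
```

In Light-R1's reported setup, stage 1 would correspond to the 70,000-example pool and stage 2 to the 3,000 hardest examples.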
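The DPO step's "multiple sampling and preference pair construction" might look like the sketch below, where sampled solutions are labeled correct or incorrect (e.g., by checking the final answer) and correct ones are preferred. `build_preference_pairs` and the pairing rule are assumptions for illustration, not Light-R1's actual pipeline.

```python
# Hypothetical sketch of preference-pair construction for DPO.
# Assumption: `samples` are (solution_text, is_correct) tuples obtained by
# sampling the SFT model several times on the same prompt.

def build_preference_pairs(prompt, samples):
    """Build (chosen, rejected) pairs from repeated samples on one prompt."""
    correct = [s for s, ok in samples if ok]
    wrong = [s for s, ok in samples if not ok]
    # Pair each correct solution with an incorrect one. Prompts where all
    # samples agree (all correct or all wrong) yield no pairs.
    return [
        {"prompt": prompt, "chosen": c, "rejected": r}
        for c, r in zip(correct, wrong)
    ]
```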
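Model fusion is often implemented as simple parameter averaging across checkpoints. The page does not specify which merge method Light-R1 uses, so the sketch below assumes uniform averaging, with plain Python lists standing in for parameter tensors.

```python
# Illustrative sketch of checkpoint fusion by uniform weight averaging.
# Assumption: each checkpoint is a dict mapping parameter names to lists of
# floats (standing in for tensors), and all checkpoints share the same keys.

def average_checkpoints(checkpoints):
    """Return the element-wise average of several model checkpoints."""
    n = len(checkpoints)
    return {
        key: [sum(vals) / n for vals in zip(*(c[key] for c in checkpoints))]
        for key in checkpoints[0]
    }
```

For Light-R1, the three inputs would be the SFT stage-2 model, the DPO model, and the second DPO variant; real merges operate on framework tensors rather than lists, and may weight the checkpoints unevenly.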

Application Scenarios of Light-R1

  • Education: As a mathematical learning tool, helping students solve complex problems, providing problem-solving steps and ideas, suitable for math competitions and daily learning.
  • Research and Academia: Assisting in mathematical research and cross-disciplinary problem solving, such as physical modeling and engineering optimization.
  • Enterprise Applications: Used for data analysis, risk assessment, supply chain optimization, and solving complex problems.
  • Software Integration: Integrated into intelligent assistants and mathematical software, enhancing reasoning and problem-solving functions.
  • Open Source and Developers: Supporting developers in customizing and extending the model, promoting the development of the open-source community.

Model Capabilities

Model Type: Language
Supported Tasks: Mathematical Problem Solving, Logical Reasoning, Language Understanding
Tags: AI Model, Mathematics, Open Source, Long Chain Reasoning, Machine Learning, Education, Research, Enterprise Applications, Software Integration, Low-Cost Training

Usage & Integration

Pricing: Free
License: Apache-2.0 (open source)

Stats

629 GitHub Stars

Similar Models

  • Ola by Tsinghua University, Tencent Hunyuan Research Team, NUS S-Lab
  • Zonos by Zyphra
  • Step-Video-T2V by Leapfrogging Star