What is Light-R1?
Light-R1 is an open-source AI model developed by 360 Smart Brain that focuses on long chain-of-thought reasoning in mathematics. It is based on Qwen2.5-32B-Instruct and trained on roughly 70,000 mathematical examples using a two-stage curriculum-learning approach (SFT + DPO). Light-R1 outperforms DeepSeek-R1-Distill-Qwen-32B, scoring 76.6 on the AIME24 benchmark versus DeepSeek's 72.6. The model is cost-efficient: training requires only 12 H800 machines running for 6 hours, at a cost of approximately $1,000. Light-R1 is fully open-source, including the model weights, dataset, training framework, and evaluation code, making it a valuable resource for the open-source community and a reference for low-cost training of specialized models.
Main Features of Light-R1
- Efficient Mathematical Problem Solving: Quickly and accurately solves complex mathematical problems across algebra, geometry, probability, and more.
- Enhanced Reasoning Ability: Strong logical reasoning that supports long chain-of-thought problem solving.
- Generalization Ability: Although trained on mathematics, it generalizes to other areas such as logical reasoning and language understanding.
- Low-Cost Training and Deployment: Achieves high performance at very low cost, making it practical for users or enterprises with limited resources to deploy and apply quickly.
Technical Principles of Light-R1
- Base Model and Starting Point: Light-R1 is built on Qwen2.5-32B-Instruct, a base model with no long chain-of-thought capability of its own, and is trained "from scratch" in that sense until it surpasses DeepSeek-R1-Distill-Qwen-32B.
- Curriculum Learning:
  - SFT (Supervised Fine-Tuning): Data is filtered by difficulty and fine-tuned in two stages: the first stage uses all 70,000 examples, while the second stage continues fine-tuning on only the 3,000 hardest problems (see the curriculum sketch after this list).
  - DPO (Direct Preference Optimization): Building on the SFT model, output quality is further optimized by sampling multiple responses per problem and constructing preference pairs (see the pair-construction sketch after this list).
- Data Processing and Deduplication: Training data is drawn from multiple open-source mathematical datasets (e.g., OpenR1-Math-220k, OpenThoughts-114k) and strictly deduplicated against evaluation benchmarks so that test-data leakage cannot inflate model performance (see the decontamination sketch after this list).
- Model Fusion: The final Light-R1-32B is obtained by merging three checkpoints: the stage-2 SFT model, the DPO model, and a second DPO variant, which further improves performance and stability (see the merging sketch after this list).
- Training Framework and Optimization: Training uses the 360-LLaMA-Factory framework, which supports sequence parallelism and efficient distributed training. With this optimized pipeline, the full training run completes in about 6 hours on 12 H800 machines.
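To make the two-stage curriculum concrete, here is a minimal sketch of difficulty-based data splitting, assuming difficulty is estimated from a model's pass rate on each problem. The `MathSample` structure, `pass_rate` field, and thresholds are illustrative assumptions, not taken from the Light-R1 release:

```python
from dataclasses import dataclass

@dataclass
class MathSample:
    problem: str
    solution: str
    pass_rate: float  # fraction of sampled model attempts that solved the problem

def split_curriculum(samples: list[MathSample],
                     stage1_max_pass: float = 0.9,
                     stage2_size: int = 3000) -> tuple[list[MathSample], list[MathSample]]:
    """Stage 1 keeps everything that is not trivially easy;
    stage 2 keeps only the hardest few thousand problems."""
    stage1 = [s for s in samples if s.pass_rate <= stage1_max_pass]
    hardest_first = sorted(stage1, key=lambda s: s.pass_rate)  # low pass rate = hard
    stage2 = hardest_first[:stage2_size]
    return stage1, stage2
```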
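The DPO pair construction can be sketched as follows: sample several responses per problem from the SFT model, then pair a verified-correct response (chosen) against an incorrect one (rejected). The `generate` and `is_correct` callables below are hypothetical hooks standing in for the actual inference and answer-checking pipeline:

```python
import random

def build_dpo_pairs(problems, generate, is_correct, n_samples=8):
    """For each problem, sample n responses and pair one correct
    answer (chosen) with one incorrect answer (rejected)."""
    pairs = []
    for problem in problems:
        correct, wrong = [], []
        for _ in range(n_samples):
            response = generate(problem)
            (correct if is_correct(problem, response) else wrong).append(response)
        if correct and wrong:  # skip problems that are all-pass or all-fail
            pairs.append({
                "prompt": problem,
                "chosen": random.choice(correct),
                "rejected": random.choice(wrong),
            })
    return pairs
```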
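For decontamination, one common approach (assumed here; the article does not specify Light-R1's exact matching rule) is to normalize problem text and drop any training item that collides with a benchmark problem:

```python
import re

def normalize(text: str) -> str:
    """Lowercase and collapse whitespace/punctuation so that
    near-identical problem statements compare equal."""
    return re.sub(r"[\W_]+", " ", text.lower()).strip()

def decontaminate(train_problems: list[str], benchmark_problems: list[str]) -> list[str]:
    """Drop any training problem whose normalized text matches a
    benchmark problem (e.g. AIME), preventing test-set leakage."""
    banned = {normalize(p) for p in benchmark_problems}
    return [p for p in train_problems if normalize(p) not in banned]
```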
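Model fusion can be as simple as element-wise averaging of checkpoint weights. Whether Light-R1 uses uniform averaging or a more elaborate merge recipe is not stated here, so the equal weights and checkpoint names below are assumptions:

```python
import torch

def merge_checkpoints(state_dicts, weights=None):
    """Merge several same-architecture checkpoints by (weighted)
    element-wise averaging of their parameters."""
    if weights is None:
        weights = [1.0 / len(state_dicts)] * len(state_dicts)  # uniform weights: an assumption
    merged = {}
    for name in state_dicts[0]:
        merged[name] = sum(w * sd[name].float() for w, sd in zip(weights, state_dicts))
    return merged

# Hypothetical usage; the checkpoint file names are placeholders:
# sft2 = torch.load("sft_stage2.pt"); dpo1 = torch.load("dpo_v1.pt"); dpo2 = torch.load("dpo_v2.pt")
# merged = merge_checkpoints([sft2, dpo1, dpo2])
```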
Project Address of Light-R1
- GitHub Repository: https://github.com/Qihoo360/Light-R1
- HuggingFace Model: https://huggingface.co/qihoo360/Light-R1-32B
Application Scenarios of Light-R1
- Education: Serves as a mathematical learning tool that helps students solve complex problems and provides step-by-step solutions and ideas, suitable for math competitions and daily study.
- Research and Academia: Assists mathematical research and cross-disciplinary problem solving, such as physical modeling and engineering optimization.
- Enterprise Applications: Supports data analysis, risk assessment, supply-chain optimization, and other complex problem-solving tasks.
- Software Integration: Can be integrated into intelligent assistants and mathematical software to enhance their reasoning and problem-solving capabilities.
- Open Source and Developers: Enables developers to customize and extend the model, promoting the growth of the open-source community.