START is a tool-enhanced reasoning model that improves the reasoning capabilities of large language models by integrating external tools like Python code executors.
What is START?
START (Self-Taught Reasoner with Tools) is a novel reasoning model developed by Alibaba Group and the University of Science and Technology of China. It enhances the reasoning capabilities of large language models (LLMs) by integrating external tools such as Python code executors. START employs the "Hint-infer" technique to insert prompts during the reasoning process that encourage the model to use external tools, and it uses the "Hint-RFT" framework for self-learning and fine-tuning. By adding tool invocation on top of long chain-of-thought reasoning (long CoT), START significantly improves accuracy and efficiency on complex mathematical problems, scientific questions, and programming challenges. It has outperformed existing models on multiple benchmarks and is the first open-source model to combine long-chain reasoning with tool integration.

Main Features of START
- Complex Calculations and Verification: Calls Python code executors to perform complex mathematical calculations, logical verifications, and simulations.
- Self-Debugging and Optimization: Uses tools to execute code and verify outputs, automatically detecting errors and debugging to improve answer accuracy.
- Multi-Strategy Exploration: Guides the model to try various reasoning paths and methods based on hints, enhancing flexibility and adaptability when facing complex problems.
- Improved Reasoning Reliability: Reduces hallucinations in complex tasks through tool invocation and self-verification, improving both the efficiency and the reliability of reasoning.
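The first two features above describe a generate-execute-verify loop: the model writes a code snippet, an executor runs it, and any error message is fed back so the model can propose a fix. A minimal sketch of that loop is shown below; the function names are hypothetical, `exec()` stands in for START's sandboxed executor, and the "revised" snippet is hard-coded where the real system would re-prompt the model.

```python
import contextlib
import io


def run_snippet(code: str) -> tuple[bool, str]:
    """Execute a model-generated Python snippet and capture its stdout.

    Returns (success, output). A real deployment would use a sandboxed
    executor; exec() here is only for illustration.
    """
    buf = io.StringIO()
    try:
        with contextlib.redirect_stdout(buf):
            exec(code, {})
        return True, buf.getvalue().strip()
    except Exception as exc:
        return False, f"{type(exc).__name__}: {exc}"


def reason_with_tool(snippets: list[str]) -> str:
    """Try candidate snippets in order, mimicking self-debugging:
    when one fails, its error message would be fed back to the model,
    which then proposes the next candidate."""
    feedback = ""
    for code in snippets:
        ok, out = run_snippet(code)
        if ok:
            return out
        feedback = out  # error text returned to the model as feedback
    return f"unresolved: {feedback}"


# The first snippet has a typo (NameError); the "revised" one succeeds.
buggy = "print(facorial(10))"
fixed = "import math\nprint(math.factorial(10))"
print(reason_with_tool([buggy, fixed]))  # → 3628800
```

In the actual model, the error string is appended to the reasoning context rather than handled by a Python loop, but the control flow is the same: execute, inspect the result, and retry until the output can be trusted as verification.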
Technical Principles of START
- Long-Chain Reasoning: Inherits the advantages of long-chain reasoning, breaking down problems into multiple intermediate reasoning steps to simulate deep human thinking and improve reasoning capabilities in complex tasks.
- Tool Integration: Compensates for the shortcomings of traditional long-chain reasoning by invoking external tools like Python code executors. The model generates code during reasoning and uses tools to verify results.
- Hint-infer: Inserts manually designed hints during the reasoning process to encourage the model to invoke external tools, guiding it to call tools at specific points without requiring additional demonstration data.
- Hint-RFT: Combines Hint-infer with Rejection Sampling Fine-Tuning (RFT) to score, filter, and modify the reasoning trajectories generated by the model, further optimizing the model's tool usage capabilities.
- Self-Learning Framework: Uses active learning to select valuable data from the model's reasoning trajectories for fine-tuning, enabling the model to learn how to use tools more effectively.
- Test-Time Expansion: Inserts hints at the end of reasoning to increase the model's thinking time and tool invocation frequency, improving reasoning accuracy and success rates.
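The Hint-infer and Hint-RFT steps above can be sketched in a few lines. This is a simplified illustration under stated assumptions: the marker strings, hint text, and trajectory format are hypothetical, and the real Hint-RFT pipeline also scores and modifies trajectories rather than just filtering them.

```python
# Hypothetical markers; START's actual hint texts and stop tokens differ.
END_OF_THINK = "</think>"
TOOL_HINT = "\nWait, maybe I can verify this with Python.\n"


def hint_infer(partial_reasoning: str) -> str:
    """Hint-infer: splice a tool-use hint in just before the model would
    stop thinking, nudging it to invoke the code executor (this is also
    how test-time expansion lengthens thinking)."""
    if partial_reasoning.endswith(END_OF_THINK):
        body = partial_reasoning[: -len(END_OF_THINK)]
        return body + TOOL_HINT  # the model resumes decoding from here
    return partial_reasoning + TOOL_HINT


def hint_rft_filter(trajectories: list[dict], ground_truth: str) -> list[dict]:
    """Hint-RFT, reduced to its rejection-sampling core: keep only
    trajectories that both invoked the tool and reached the correct
    final answer; the kept set becomes fine-tuning data."""
    return [
        t for t in trajectories
        if "```python" in t["text"] and t["answer"] == ground_truth
    ]


trace = "The integral evaluates to 42." + END_OF_THINK
print(hint_infer(trace).endswith(TOOL_HINT))  # → True

samples = [
    {"text": "... ```python ... ``` ...", "answer": "42"},  # tool + correct
    {"text": "no tool call here", "answer": "42"},          # correct, no tool
    {"text": "... ```python ... ``` ...", "answer": "41"},  # tool, wrong
]
print(len(hint_rft_filter(samples, "42")))  # → 1
```

The key design point is that no human-written tool-use demonstrations are needed: hints elicit tool calls from the base model, and rejection sampling keeps only the trajectories worth learning from.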
Project Address of START
Application Scenarios of START
- Mathematical Problem Solving: Solves complex mathematical problems, such as math competitions and advanced mathematics, using code verification to improve accuracy.
- Scientific Research Assistance: Helps with complex calculations and scientific problems in physics, chemistry, biology, and other fields.
- Programming and Debugging: Generates code and automatically debugs to solve programming challenges, improving development efficiency.
- Interdisciplinary Problem Solving: Integrates knowledge from multiple disciplines to solve complex tasks in engineering design, data analysis, and more.
- Education and Learning: Serves as an intelligent tutoring tool to assist students in learning mathematics and science, providing detailed problem-solving processes and feedback.