Chain-of-Thought (CoT) self-critique frameworks, such as Critic-CoT, are designed to enhance the reasoning abilities of large language models (LLMs) through self-critique and refinement. The Critic-CoT approach pushes LLMs toward System-2-like critic capabilities, enabling them to engage in slow, analytic self-critique and iterative refinement. This is achieved through a step-wise CoT reasoning paradigm and the automatic construction of distant-supervision data without human annotation.
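The critique-and-refine loop described above can be sketched as follows. This is a minimal illustration only: the `solve`, `critique`, and `refine` stubs stand in for LLM calls, and their interfaces are assumptions, not the actual Critic-CoT prompts or training procedure.

```python
def solve(problem):
    """Stand-in for the LLM producing a step-wise CoT solution."""
    return ["step 1: ...", "step 2: ...", "answer: 42"]

def critique(problem, steps):
    """Stand-in critic: returns the index of the first flawed step,
    or None if every step passes the step-wise check."""
    return None

def refine(problem, steps, flawed_idx):
    """Stand-in refiner: regenerates the solution from the flawed step."""
    return steps[:flawed_idx] + ["revised step", "answer: 42"]

def critic_cot(problem, max_rounds=3):
    """Iterate: critique each step of the chain; if a flaw is found,
    refine and re-check, up to max_rounds times."""
    steps = solve(problem)
    for _ in range(max_rounds):
        flaw = critique(problem, steps)   # slow, analytic self-critique
        if flaw is None:                  # critic accepts the whole chain
            return steps
        steps = refine(problem, steps, flaw)  # iterative refinement
    return steps
```

In practice the same model plays both solver and critic, which is what lets critique and task-solving ability reinforce each other during training.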
Experiments on datasets like GSM8K and MATH have demonstrated that Critic-CoT significantly boosts task-solving performance by filtering out invalid solutions and refining reasoning processes. The work also investigates the intrinsic correlation between critique and task-solving abilities within LLMs, revealing that these abilities can mutually reinforce each other rather than conflict.
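The "filtering out invalid solutions" step can be pictured as a critic-filtered majority vote over sampled solutions. The sketch below is a hypothetical interface (the answer strings and critic verdicts are illustrative), not the paper's exact inference pipeline.

```python
from collections import Counter

def critic_filter_vote(answers, accepts):
    """Keep the answers the critic accepts, then take a majority vote.

    `answers` are final answers extracted from sampled CoT solutions;
    `accepts` is a parallel list of critic verdicts (True = accepted).
    """
    kept = [a for a, ok in zip(answers, accepts) if ok]
    pool = kept or answers  # fall back to all samples if the critic rejects everything
    return Counter(pool).most_common(1)[0][0]

# Example: five sampled answers; the critic rejects the two incorrect ones.
print(critic_filter_vote(["42", "41", "42", "42", "7"],
                         [True, False, True, True, False]))  # prints 42
```

Filtering before voting is what lifts accuracy over plain self-consistency: wrong chains the critic catches never reach the vote.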
For further details, see the arXiv paper and the OpenReview submission.