Critique-Guided Distillation for Efficient and Robust Language Model Reasoning

16 May 2025

Papers citing "Critique-Guided Distillation for Efficient and Robust Language Model Reasoning"

Climbing the Ladder of Reasoning: What LLMs Can-and Still Can't-Solve after SFT? Yiyou Sun Georgia Zhou Jian Shu Dexun Li Nouha Dziri Dawn Song ReLM ALM ELM LRM 258 17 1 16 Apr 2025
Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs Kanishk Gandhi Ayush Chakravarthy Anikait Singh Nathan Lile Noah D. Goodman ReLM LRM 442 266 0 03 Mar 2025
LIMO: Less is More for Reasoning Yixin Ye Zhen Huang Yang Xiao Ethan Chern Shijie Xia Pengfei Liu AIMat ReLM LRM 806 170 0 05 Feb 2025
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training Tianzhe Chu Yuexiang Zhai Jihan Yang Shengbang Tong Saining Xie Dale Schuurmans Quoc V. Le Sergey Levine Yi-An Ma OffRL 620 376 0 28 Jan 2025
LLM Self-Correction with DeCRIM: Decompose, Critique, and Refine for Enhanced Following of Instructions with Multiple Constraints. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024 Thomas Palmeira Ferraz Kartik Mehta Yu-Hsiang Lin Haw-Shiuan Chang Shereen Oraby Sijia Liu Vivek Subramanian Tagyoung Chung Mohit Bansal Nanyun Peng 227 25 0 09 Oct 2024
Learning to Refine with Fine-Grained Natural Language Feedback Manya Wadhwa Xinyu Zhao Junyi Jessy Li Greg Durrett 488 25 0 02 Jul 2024
Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations? Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024 Zorik Gekhman G. Yona Roee Aharoni Matan Eyal Amir Feder Roi Reichart Jonathan Herzig 358 217 0 09 May 2024
MAmmoTH2: Scaling Instructions from the Web. Neural Information Processing Systems (NeurIPS), 2024 Xiang Yue Tuney Zheng Ge Zhang Lei Ma ALM LRM 302 146 0 06 May 2024
How Can I Improve? Using GPT to Highlight the Desired and Undesired Parts of Open-ended Responses Jionghao Lin Eason Chen Zeifei Han Ashish Gurung Danielle R. Thomas Wei Tan Ngoc Dang Nguyen Kenneth R. Koedinger 195 16 0 01 May 2024
Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision. Neural Information Processing Systems (NeurIPS), 2024 Zhiqing Sun Longhui Yu Yikang Shen Weiyang Liu Yiming Yang Sean Welleck Chuang Gan 193 91 0 14 Mar 2024
OlympiadBench: A Challenging Benchmark for Promoting AGI with Olympiad-Level Bilingual Multimodal Scientific Problems Chaoqun He Renjie Luo Yuzhuo Bai Shengding Hu Zhen Leng Thai ... Yuxiang Zhang Jie Liu Lei Qi Zhiyuan Liu Maosong Sun ELM AIMat 363 635 0 21 Feb 2024
ELAD: Explanation-Guided Large Language Models Active Distillation Yifei Zhang Bo Pan Chen Ling Yuntong Hu Liang Zhao 189 10 0 20 Feb 2024
GPQA: A Graduate-Level Google-Proof Q&A Benchmark David Rein Betty Li Hou Asa Cooper Stickland Jackson Petty Richard Yuanzhe Pang Julien Dirani Julian Michael Samuel R. Bowman AI4MH ELM 421 1,564 0 20 Nov 2023
Instruction-Following Evaluation for Large Language Models Jeffrey Zhou Tianjian Lu Swaroop Mishra Siddhartha Brahma Sujoy Basu Yi Luan Denny Zhou Le Hou ELM ALM LRM 312 532 0 14 Nov 2023
Teaching Language Models to Self-Improve through Interactive Demonstrations Xiao Yu Baolin Peng Michel Galley Jianfeng Gao Zhou Yu LRM ReLM 236 27 0 20 Oct 2023
MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models. International Conference on Learning Representations (ICLR), 2023 L. Yu Weisen Jiang Han Shi Jincheng Yu Zhengying Liu Yu Zhang James T. Kwok Zheng Li Adrian Weller Weiyang Liu OSLM LRM 505 541 0 21 Sep 2023
Scaling Relationship on Learning Mathematical Reasoning with Large Language Models Zheng Yuan Hongyi Yuan Cheng Li Guanting Dong Keming Lu Chuanqi Tan Chang Zhou Jingren Zhou LRM ALM 243 276 0 03 Aug 2023
Let Me Teach You: Pedagogical Foundations of Feedback for Language Models. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023 Beatriz Borges Niket Tandon Tanja Käser Antoine Bosselut 404 8 0 01 Jul 2023
Using Large Language Models to Provide Explanatory Feedback to Human Tutors Jionghao Lin Danielle R. Thomas Feifei Han Shivang Gupta Wei Tan Ngoc Dang Nguyen Kenneth R. Koedinger AI4Ed LRM 107 19 0 27 Jun 2023
TheoremQA: A Theorem-driven Question Answering dataset. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023 Wenhu Chen Ming Yin Max Ku Pan Lu Yixin Wan Xueguang Ma Jianyu Xu Xinyi Wang Tony Xia AIMat 280 183 0 21 May 2023
CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing. International Conference on Learning Representations (ICLR), 2023 Zhibin Gou Zhihong Shao Yeyun Gong Yelong Shen Yujiu Yang Nan Duan Weizhu Chen KELM LRM 369 559 0 19 May 2023
Teaching Large Language Models to Self-Debug. International Conference on Learning Representations (ICLR), 2023 Xinyun Chen Maxwell Lin Nathanael Scharli Denny Zhou LRM 481 897 0 11 Apr 2023
REFINER: Reasoning Feedback on Intermediate Representations. Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2023 Debjit Paul Mete Ismayilzada Maxime Peyrard Beatriz Borges Antoine Bosselut Robert West Boi Faltings ReLM LRM 298 218 0 04 Apr 2023
Self-Refine: Iterative Refinement with Self-Feedback. Neural Information Processing Systems (NeurIPS), 2023 Aman Madaan Niket Tandon Prakhar Gupta Skyler Hallinan Luyu Gao ... Bodhisattwa Prasad Majumder Katherine Hermann Sean Welleck Amir Yazdanbakhsh Peter Clark ReLM LRM DiffM 724 2,496 0 30 Mar 2023
Language Models can Solve Computer Tasks. Neural Information Processing Systems (NeurIPS), 2023 Geunwoo Kim Pierre Baldi Alexander Shmakov LLMAG LM&Ro 466 453 0 30 Mar 2023
Reflexion: Language Agents with Verbal Reinforcement Learning. Neural Information Processing Systems (NeurIPS), 2023 Noah Shinn Federico Cassano Beck Labash A. Gopinath Karthik Narasimhan Shunyu Yao LLMAG KELM 577 2,148 0 20 Mar 2023
Baldur: Whole-Proof Generation and Repair with Large Language Models E. First M. Rabe Talia Ringer Yuriy Brun 276 135 0 08 Mar 2023
Distilling Reasoning Capabilities into Smaller Language Models. Annual Meeting of the Association for Computational Linguistics (ACL), 2022 Kumar Shridhar Alessandro Stolfo Mrinmaya Sachan LRM ReLM 310 216 0 01 Dec 2022
Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them. Annual Meeting of the Association for Computational Linguistics (ACL), 2022 Mirac Suzgun Nathan Scales Nathanael Scharli Sebastian Gehrmann Yi Tay ... Aakanksha Chowdhery Quoc V. Le Ed H. Chi Denny Zhou Jason W. Wei ALM ELM LRM ReLM 510 1,504 0 17 Oct 2022
RARR: Researching and Revising What Language Models Say, Using Language Models. Annual Meeting of the Association for Computational Linguistics (ACL), 2022 Luyu Gao Zhuyun Dai Panupong Pasupat Anthony Chen Arun Tejasvi Chaganty ... Vincent Zhao Ni Lao Hongrae Lee Da-Cheng Juan Kelvin Guu HILM KELM 599 278 0 17 Oct 2022
CodeT: Code Generation with Generated Tests. International Conference on Learning Representations (ICLR), 2022 Bei Chen Fengji Zhang A. Nguyen Daoguang Zan Zeqi Lin Jian-Guang Lou Weizhu Chen 279 429 0 21 Jul 2022
Self-critiquing models for assisting human evaluators William Saunders Catherine Yeh Jeff Wu Steven Bills Long Ouyang Jonathan Ward Jan Leike ALM ELM 342 354 0 12 Jun 2022
Training language models to follow instructions with human feedback. Neural Information Processing Systems (NeurIPS), 2022 Long Ouyang Jeff Wu Xu Jiang Diogo Almeida Carroll L. Wainwright ... Amanda Askell Peter Welinder Paul Christiano Jan Leike Ryan J. Lowe OSLM ALM 2.0K 17,029 0 04 Mar 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. Neural Information Processing Systems (NeurIPS), 2022 Jason W. Wei Xuezhi Wang Dale Schuurmans Maarten Bosma Brian Ichter F. Xia Ed H. Chi Quoc Le Denny Zhou LM&Ro LRM AI4CE ReLM 2.2K 14,140 0 28 Jan 2022
Training Verifiers to Solve Math Word Problems K. Cobbe V. Kosaraju Mohammad Bavarian Mark Chen Heewoo Jun ... Jerry Tworek Jacob Hilton Reiichiro Nakano Christopher Hesse John Schulman ReLM OffRL LRM 1.0K 6,610 0 27 Oct 2021
Finetuned Language Models Are Zero-Shot Learners Jason W. Wei Maarten Bosma Vincent Zhao Kelvin Guu Adams Wei Yu Brian Lester Nan Du Andrew M. Dai Quoc V. Le ALM UQCV 1.1K 4,533 0 03 Sep 2021
Measuring Mathematical Problem Solving With the MATH Dataset Dan Hendrycks Collin Burns Saurav Kadavath Akul Arora Steven Basart Eric Tang Basel Alomair Jacob Steinhardt ReLM FaML 807 3,767 0 05 Mar 2021