ResearchTrend.AI
  • Communities
  • Connect sessions
  • AI calendar
  • Organizations
  • Join Slack
  • Contact Sales
Papers
Communities
Social Events
Terms and Conditions
Pricing
Contact Sales
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2505.11628
  4. Cited By
Critique-Guided Distillation for Efficient and Robust Language Model Reasoning
v1v2v3 (latest)

Critique-Guided Distillation for Efficient and Robust Language Model Reasoning

16 May 2025
Berkcan Kapusuzoglu
Supriyo Chakraborty
Chia-Hsuan Lee
Sambit Sahu
ArXiv (abs)PDFHTMLGithub (3755★)

Papers citing "Critique-Guided Distillation for Efficient and Robust Language Model Reasoning"

37 / 37 papers shown
Title
Climbing the Ladder of Reasoning: What LLMs Can-and Still Can't-Solve after SFT?
Climbing the Ladder of Reasoning: What LLMs Can-and Still Can't-Solve after SFT?
Yiyou Sun
Georgia Zhou
Jian Shu
Dexun Li
Nouha Dziri
Dawn Song
ReLMALMELMLRM
258
17
1
16 Apr 2025
Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs
Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs
Kanishk Gandhi
Ayush Chakravarthy
Anikait Singh
Nathan Lile
Noah D. Goodman
ReLMLRM
442
266
0
03 Mar 2025
LIMO: Less is More for Reasoning
LIMO: Less is More for Reasoning
Yixin Ye
Zhen Huang
Yang Xiao
Ethan Chern
Shijie Xia
Pengfei Liu
AIMatReLMLRM
806
170
0
05 Feb 2025
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Tianzhe Chu
Yuexiang Zhai
Jihan Yang
Shengbang Tong
Saining Xie
Dale Schuurmans
Quoc V. Le
Sergey Levine
Yi-An Ma
OffRL
620
376
0
28 Jan 2025
LLM Self-Correction with DeCRIM: Decompose, Critique, and Refine for
  Enhanced Following of Instructions with Multiple Constraints
LLM Self-Correction with DeCRIM: Decompose, Critique, and Refine for Enhanced Following of Instructions with Multiple ConstraintsConference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Thomas Palmeira Ferraz
Kartik Mehta
Yu-Hsiang Lin
Haw-Shiuan Chang
Shereen Oraby
Sijia Liu
Vivek Subramanian
Tagyoung Chung
Mohit Bansal
Nanyun Peng
227
25
0
09 Oct 2024
Learning to Refine with Fine-Grained Natural Language Feedback
Learning to Refine with Fine-Grained Natural Language Feedback
Manya Wadhwa
Xinyu Zhao
Junyi Jessy Li
Greg Durrett
488
25
0
02 Jul 2024
Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?
Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Zorik Gekhman
G. Yona
Roee Aharoni
Matan Eyal
Amir Feder
Roi Reichart
Jonathan Herzig
358
217
0
09 May 2024
MAmmoTH2: Scaling Instructions from the Web
MAmmoTH2: Scaling Instructions from the WebNeural Information Processing Systems (NeurIPS), 2024
Xiang Yue
Tuney Zheng
Ge Zhang
Lei Ma
ALMLRM
302
146
0
06 May 2024
How Can I Improve? Using GPT to Highlight the Desired and Undesired
  Parts of Open-ended Responses
How Can I Improve? Using GPT to Highlight the Desired and Undesired Parts of Open-ended Responses
Jionghao Lin
Eason Chen
Zeifei Han
Ashish Gurung
Danielle R. Thomas
Wei Tan
Ngoc Dang Nguyen
Kenneth R. Koedinger
195
16
0
01 May 2024
Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision
Easy-to-Hard Generalization: Scalable Alignment Beyond Human SupervisionNeural Information Processing Systems (NeurIPS), 2024
Zhiqing Sun
Longhui Yu
Yikang Shen
Weiyang Liu
Yiming Yang
Sean Welleck
Chuang Gan
193
91
0
14 Mar 2024
OlympiadBench: A Challenging Benchmark for Promoting AGI with
  Olympiad-Level Bilingual Multimodal Scientific Problems
OlympiadBench: A Challenging Benchmark for Promoting AGI with Olympiad-Level Bilingual Multimodal Scientific Problems
Chaoqun He
Renjie Luo
Yuzhuo Bai
Shengding Hu
Zhen Leng Thai
...
Yuxiang Zhang
Jie Liu
Lei Qi
Zhiyuan Liu
Maosong Sun
ELMAIMat
363
635
0
21 Feb 2024
ELAD: Explanation-Guided Large Language Models Active Distillation
ELAD: Explanation-Guided Large Language Models Active Distillation
Yifei Zhang
Bo Pan
Chen Ling
Yuntong Hu
Bo Pan
189
10
0
20 Feb 2024
GPQA: A Graduate-Level Google-Proof Q&A Benchmark
GPQA: A Graduate-Level Google-Proof Q&A Benchmark
David Rein
Betty Li Hou
Asa Cooper Stickland
Jackson Petty
Richard Yuanzhe Pang
Julien Dirani
Julian Michael
Samuel R. Bowman
AI4MHELM
421
1,564
0
20 Nov 2023
Instruction-Following Evaluation for Large Language Models
Instruction-Following Evaluation for Large Language Models
Jeffrey Zhou
Tianjian Lu
Swaroop Mishra
Siddhartha Brahma
Sujoy Basu
Yi Luan
Denny Zhou
Le Hou
ELMALMLRM
312
532
0
14 Nov 2023
Teaching Language Models to Self-Improve through Interactive
  Demonstrations
Teaching Language Models to Self-Improve through Interactive Demonstrations
Xiao Yu
Baolin Peng
Michel Galley
Jianfeng Gao
Zhou Yu
LRMReLM
236
27
0
20 Oct 2023
MetaMath: Bootstrap Your Own Mathematical Questions for Large Language
  Models
MetaMath: Bootstrap Your Own Mathematical Questions for Large Language ModelsInternational Conference on Learning Representations (ICLR), 2023
L. Yu
Weisen Jiang
Han Shi
Jincheng Yu
Zhengying Liu
Yu Zhang
James T. Kwok
Zheng Li
Adrian Weller
Weiyang Liu
OSLMLRM
505
541
0
21 Sep 2023
Scaling Relationship on Learning Mathematical Reasoning with Large
  Language Models
Scaling Relationship on Learning Mathematical Reasoning with Large Language Models
Zheng Yuan
Hongyi Yuan
Cheng Li
Guanting Dong
Keming Lu
Chuanqi Tan
Chang Zhou
Jingren Zhou
LRMALM
243
276
0
03 Aug 2023
Let Me Teach You: Pedagogical Foundations of Feedback for Language
  Models
Let Me Teach You: Pedagogical Foundations of Feedback for Language ModelsConference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Beatriz Borges
Niket Tandon
Tanja Käser
Antoine Bosselut
404
8
0
01 Jul 2023
Using Large Language Models to Provide Explanatory Feedback to Human
  Tutors
Using Large Language Models to Provide Explanatory Feedback to Human Tutors
Jionghao Lin
Danielle R. Thomas
Feifei Han
Shivang Gupta
Wei Tan
Ngoc Dang Nguyen
Kenneth R. Koedinger
AI4EdLRM
107
19
0
27 Jun 2023
TheoremQA: A Theorem-driven Question Answering dataset
TheoremQA: A Theorem-driven Question Answering datasetConference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Wenhu Chen
Ming Yin
Max Ku
Pan Lu
Yixin Wan
Xueguang Ma
Jianyu Xu
Xinyi Wang
Tony Xia
AIMat
280
183
0
21 May 2023
CRITIC: Large Language Models Can Self-Correct with Tool-Interactive
  Critiquing
CRITIC: Large Language Models Can Self-Correct with Tool-Interactive CritiquingInternational Conference on Learning Representations (ICLR), 2023
Zhibin Gou
Zhihong Shao
Yeyun Gong
Yelong Shen
Yujiu Yang
Nan Duan
Weizhu Chen
KELMLRM
369
559
0
19 May 2023
Teaching Large Language Models to Self-Debug
Teaching Large Language Models to Self-DebugInternational Conference on Learning Representations (ICLR), 2023
Xinyun Chen
Maxwell Lin
Nathanael Scharli
Denny Zhou
LRM
481
897
0
11 Apr 2023
REFINER: Reasoning Feedback on Intermediate Representations
REFINER: Reasoning Feedback on Intermediate RepresentationsConference of the European Chapter of the Association for Computational Linguistics (EACL), 2023
Debjit Paul
Mete Ismayilzada
Maxime Peyrard
Beatriz Borges
Antoine Bosselut
Robert West
Boi Faltings
ReLMLRM
298
218
0
04 Apr 2023
Self-Refine: Iterative Refinement with Self-Feedback
Self-Refine: Iterative Refinement with Self-FeedbackNeural Information Processing Systems (NeurIPS), 2023
Aman Madaan
Niket Tandon
Prakhar Gupta
Skyler Hallinan
Luyu Gao
...
Bodhisattwa Prasad Majumder
Katherine Hermann
Sean Welleck
Amir Yazdanbakhsh
Peter Clark
ReLMLRMDiffM
724
2,496
0
30 Mar 2023
Language Models can Solve Computer Tasks
Language Models can Solve Computer TasksNeural Information Processing Systems (NeurIPS), 2023
Geunwoo Kim
Pierre Baldi
Alexander Shmakov
LLMAGLM&Ro
466
453
0
30 Mar 2023
Reflexion: Language Agents with Verbal Reinforcement Learning
Reflexion: Language Agents with Verbal Reinforcement LearningNeural Information Processing Systems (NeurIPS), 2023
Noah Shinn
Federico Cassano
Beck Labash
A. Gopinath
Karthik Narasimhan
Shunyu Yao
LLMAGKELM
577
2,148
0
20 Mar 2023
Baldur: Whole-Proof Generation and Repair with Large Language Models
Baldur: Whole-Proof Generation and Repair with Large Language Models
E. First
M. Rabe
Talia Ringer
Yuriy Brun
276
135
0
08 Mar 2023
Distilling Reasoning Capabilities into Smaller Language Models
Distilling Reasoning Capabilities into Smaller Language ModelsAnnual Meeting of the Association for Computational Linguistics (ACL), 2022
Kumar Shridhar
Alessandro Stolfo
Mrinmaya Sachan
LRMReLM
310
216
0
01 Dec 2022
Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them
Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve ThemAnnual Meeting of the Association for Computational Linguistics (ACL), 2022
Mirac Suzgun
Nathan Scales
Nathanael Scharli
Sebastian Gehrmann
Yi Tay
...
Aakanksha Chowdhery
Quoc V. Le
Ed H. Chi
Denny Zhou
Jason W. Wei
ALMELMLRMReLM
510
1,504
0
17 Oct 2022
RARR: Researching and Revising What Language Models Say, Using Language
  Models
RARR: Researching and Revising What Language Models Say, Using Language ModelsAnnual Meeting of the Association for Computational Linguistics (ACL), 2022
Luyu Gao
Zhuyun Dai
Panupong Pasupat
Anthony Chen
Arun Tejasvi Chaganty
...
Vincent Zhao
Ni Lao
Hongrae Lee
Da-Cheng Juan
Kelvin Guu
HILMKELM
599
278
0
17 Oct 2022
CodeT: Code Generation with Generated Tests
CodeT: Code Generation with Generated TestsInternational Conference on Learning Representations (ICLR), 2022
Bei Chen
Fengji Zhang
A. Nguyen
Daoguang Zan
Zeqi Lin
Jian-Guang Lou
Weizhu Chen
279
429
0
21 Jul 2022
Self-critiquing models for assisting human evaluators
Self-critiquing models for assisting human evaluators
William Saunders
Catherine Yeh
Jeff Wu
Steven Bills
Ouyang Long
Jonathan Ward
Jan Leike
ALMELM
342
354
0
12 Jun 2022
Training language models to follow instructions with human feedback
Training language models to follow instructions with human feedbackNeural Information Processing Systems (NeurIPS), 2022
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLMALM
2.0K
17,029
0
04 Mar 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Chain-of-Thought Prompting Elicits Reasoning in Large Language ModelsNeural Information Processing Systems (NeurIPS), 2022
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&RoLRMAI4CEReLM
2.2K
14,140
0
28 Jan 2022
Training Verifiers to Solve Math Word Problems
Training Verifiers to Solve Math Word Problems
K. Cobbe
V. Kosaraju
Mohammad Bavarian
Mark Chen
Heewoo Jun
...
Jerry Tworek
Jacob Hilton
Reiichiro Nakano
Christopher Hesse
John Schulman
ReLMOffRLLRM
1.0K
6,610
0
27 Oct 2021
Finetuned Language Models Are Zero-Shot Learners
Finetuned Language Models Are Zero-Shot Learners
Jason W. Wei
Maarten Bosma
Vincent Zhao
Kelvin Guu
Adams Wei Yu
Brian Lester
Nan Du
Andrew M. Dai
Quoc V. Le
ALMUQCV
1.1K
4,533
0
03 Sep 2021
Measuring Mathematical Problem Solving With the MATH Dataset
Measuring Mathematical Problem Solving With the MATH Dataset
Dan Hendrycks
Collin Burns
Saurav Kadavath
Akul Arora
Steven Basart
Eric Tang
Basel Alomair
Jacob Steinhardt
ReLMFaML
807
3,767
0
05 Mar 2021
1