ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2404.12253
  4. Cited By
Toward Self-Improvement of LLMs via Imagination, Searching, and
  Criticizing

Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing

18 April 2024
Ye Tian
Baolin Peng
Linfeng Song
Lifeng Jin
Dian Yu
Haitao Mi
Dong Yu
    LRM
    ReLM
ArXivPDFHTML

Papers citing "Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing"

50 / 51 papers shown
Title
Reinforced MLLM: A Survey on RL-Based Reasoning in Multimodal Large Language Models
Reinforced MLLM: A Survey on RL-Based Reasoning in Multimodal Large Language Models
Guanghao Zhou
Panjia Qiu
C. L. P. Chen
J. Wang
Zheming Yang
Jian Xu
Minghui Qiu
OffRL
LRM
53
0
0
30 Apr 2025
SPC: Evolving Self-Play Critic via Adversarial Games for LLM Reasoning
SPC: Evolving Self-Play Critic via Adversarial Games for LLM Reasoning
Jiaqi Chen
Bang Zhang
Ruotian Ma
Peisong Wang
Xiaodan Liang
Zhaopeng Tu
X. Li
Kwan-Yee Kenneth Wong
LLMAG
ReLM
LRM
82
0
0
27 Apr 2025
Trans-Zero: Self-Play Incentivizes Large Language Models for Multilingual Translation Without Parallel Data
Trans-Zero: Self-Play Incentivizes Large Language Models for Multilingual Translation Without Parallel Data
Wei Zou
Sen Yang
Yu Bao
Shujian Huang
Jiajun Chen
Shanbo Cheng
SyDa
21
0
0
20 Apr 2025
Teaching Large Language Models to Reason through Learning and Forgetting
Teaching Large Language Models to Reason through Learning and Forgetting
Tianwei Ni
Allen Nie
Sapana Chaudhary
Yao Liu
Huzefa Rangwala
Rasool Fakoor
ReLM
CLL
LRM
37
0
0
15 Apr 2025
Safe Screening Rules for Group OWL Models
Safe Screening Rules for Group OWL Models
Runxue Bao
Quanchao Lu
Yanfu Zhang
29
0
0
04 Apr 2025
A Survey of Large Language Model Agents for Question Answering
A Survey of Large Language Model Agents for Question Answering
Murong Yue
LLMAG
LM&MA
ELM
55
0
0
24 Mar 2025
Dancing with Critiques: Enhancing LLM Reasoning with Stepwise Natural Language Self-Critique
Dancing with Critiques: Enhancing LLM Reasoning with Stepwise Natural Language Self-Critique
Y. Li
Jiahao Xu
Tian Liang
Xingyu Chen
Zhiwei He
...
Rui Wang
Z. Zhang
Zhaopeng Tu
Haitao Mi
Dong Yu
LRM
32
1
0
21 Mar 2025
ScalingNoise: Scaling Inference-Time Search for Generating Infinite Videos
ScalingNoise: Scaling Inference-Time Search for Generating Infinite Videos
Haolin Yang
Feilong Tang
Ming Hu
Yulong Li
Junjie Guo
Yexin Liu
Zelin Peng
Junjun He
Zongyuan Ge
VGen
DiffM
92
0
0
20 Mar 2025
DARS: Dynamic Action Re-Sampling to Enhance Coding Agent Performance by Adaptive Tree Traversal
DARS: Dynamic Action Re-Sampling to Enhance Coding Agent Performance by Adaptive Tree Traversal
Vaibhav Aggarwal
Ojasv Kamal
Abhinav Japesh
Zhijing Jin
Bernhard Schölkopf
50
1
0
18 Mar 2025
VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning
VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning
Y. Liu
Kevin Qinghong Lin
C. Chen
Mike Zheng Shou
LM&Ro
LRM
57
0
0
17 Mar 2025
Thinking Machines: A Survey of LLM based Reasoning Strategies
Dibyanayan Bandyopadhyay
Soham Bhattacharjee
Asif Ekbal
LRM
ELM
36
4
0
13 Mar 2025
Towards Widening The Distillation Bottleneck for Reasoning Models
Huifeng Yin
Yu Zhao
M. Wu
Xuanfan Ni
Bo Zeng
...
Liangying Shao
Chenyang Lyu
Longyue Wang
Weihua Luo
Kaifu Zhang
LRM
34
1
0
03 Mar 2025
Multi2: Multi-Agent Test-Time Scalable Framework for Multi-Document Processing
Multi2: Multi-Agent Test-Time Scalable Framework for Multi-Document Processing
Juntai Cao
Xiang Zhang
Raymond Li
Chuyuan Li
Shafiq R. Joty
Giuseppe Carenini
54
1
0
27 Feb 2025
Distill Not Only Data but Also Rewards: Can Smaller Language Models Surpass Larger Ones?
Distill Not Only Data but Also Rewards: Can Smaller Language Models Surpass Larger Ones?
Yudi Zhang
Lu Wang
Meng Fang
Yali Du
Chenghua Huang
...
Qingwei Lin
Mykola Pechenizkiy
Dongmei Zhang
Saravan Rajmohan
Qi Zhang
ALM
71
0
0
26 Feb 2025
Linguistic Generalizability of Test-Time Scaling in Mathematical Reasoning
Linguistic Generalizability of Test-Time Scaling in Mathematical Reasoning
Guijin Son
Jiwoo Hong
Hyunwoo Ko
James Thorne
LRM
46
5
0
24 Feb 2025
S$^2$R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning
S2^22R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning
Ruotian Ma
Peisong Wang
Cheng Liu
Xingyan Liu
Jiaqi Chen
Bang Zhang
Xin Zhou
Nan Du
Jia Li
LRM
54
2
0
18 Feb 2025
Investigating Inference-time Scaling for Chain of Multi-modal Thought: A Preliminary Study
Investigating Inference-time Scaling for Chain of Multi-modal Thought: A Preliminary Study
Yujie Lin
Ante Wang
Moye Chen
Jingyao Liu
Hao Liu
Jinsong Su
Xinyan Xiao
LRM
46
2
0
17 Feb 2025
Bag of Tricks for Inference-time Computation of LLM Reasoning
Bag of Tricks for Inference-time Computation of LLM Reasoning
Fan Liu
Wenshuo Chao
Naiqiang Tan
Hao Liu
OffRL
LRM
69
3
0
11 Feb 2025
Adaptive Self-improvement LLM Agentic System for ML Library Development
Adaptive Self-improvement LLM Agentic System for ML Library Development
Genghan Zhang
Weixin Liang
Olivia Hsu
K. Olukotun
49
0
0
04 Feb 2025
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Tianzhe Chu
Yuexiang Zhai
Jihan Yang
Shengbang Tong
Saining Xie
Dale Schuurmans
Quoc V. Le
Sergey Levine
Yi-An Ma
OffRL
57
50
0
28 Jan 2025
From Drafts to Answers: Unlocking LLM Potential via Aggregation Fine-Tuning
From Drafts to Answers: Unlocking LLM Potential via Aggregation Fine-Tuning
Yafu Li
Zhilin Wang
Tingchen Fu
Ganqu Cui
Sen Yang
Yu Cheng
40
1
0
21 Jan 2025
Large Language Monkeys: Scaling Inference Compute with Repeated Sampling
Large Language Monkeys: Scaling Inference Compute with Repeated Sampling
Bradley Brown
Jordan Juravsky
Ryan Ehrlich
Ronald Clark
Quoc V. Le
Christopher Ré
Azalia Mirhoseini
ALM
LRM
76
207
0
03 Jan 2025
Cognitive Kernel: An Open-source Agent System towards Generalist Autopilots
Cognitive Kernel: An Open-source Agent System towards Generalist Autopilots
H. Zhang
Xiaoman Pan
Hongwei Wang
Kaixin Ma
W. Yu
Dong Yu
LLMAG
43
3
0
03 Jan 2025
Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs
Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs
Xingyu Chen
Jiahao Xu
Tian Liang
Zhiwei He
Jianhui Pang
...
Z. Zhang
Rui Wang
Zhaopeng Tu
Haitao Mi
Dong Yu
LRM
ReLM
48
90
0
30 Dec 2024
Active Inference for Self-Organizing Multi-LLM Systems: A Bayesian Thermodynamic Approach to Adaptation
Active Inference for Self-Organizing Multi-LLM Systems: A Bayesian Thermodynamic Approach to Adaptation
Rithvik Prakki
LLMAG
AI4CE
80
0
0
10 Dec 2024
RARE: Retrieval-Augmented Reasoning Enhancement for Large Language
  Models
RARE: Retrieval-Augmented Reasoning Enhancement for Large Language Models
Hieu Tran
Zonghai Yao
Junda Wang
Yifan Zhang
Zhichao Yang
Hong-ye Yu
LRM
64
0
0
03 Dec 2024
Scaling LLM Inference with Optimized Sample Compute Allocation
Scaling LLM Inference with Optimized Sample Compute Allocation
Kexun Zhang
Shang Zhou
Danqing Wang
William Yang Wang
Lei Li
35
7
0
29 Oct 2024
From Imitation to Introspection: Probing Self-Consciousness in Language
  Models
From Imitation to Introspection: Probing Self-Consciousness in Language Models
Sirui Chen
Shu Yu
Shengjie Zhao
Chaochao Lu
MILM
LRM
24
1
0
24 Oct 2024
Think Thrice Before You Act: Progressive Thought Refinement in Large
  Language Models
Think Thrice Before You Act: Progressive Thought Refinement in Large Language Models
Chengyu Du
Jinyi Han
Yizhou Ying
Aili Chen
Qianyu He
...
Haoran Guo
Jiaqing Liang
Zulong Chen
Liangyue Li
Yanghua Xiao
KELM
CLL
LRM
22
1
0
17 Oct 2024
Optimizing Instruction Synthesis: Effective Exploration of Evolutionary
  Space with Tree Search
Optimizing Instruction Synthesis: Effective Exploration of Evolutionary Space with Tree Search
Chenglin Li
Qianglong Chen
Zhi Li
Feng Tao
Yicheng Li
Hao Chen
Fei Yu
Yin Zhang
SyDa
26
0
0
14 Oct 2024
VideoAgent: Self-Improving Video Generation
VideoAgent: Self-Improving Video Generation
Achint Soni
Sreyas Venkataraman
Abhranil Chandra
Sebastian Fischmeister
Percy Liang
Bo Dai
Sherry Yang
LM&Ro
VGen
35
7
0
14 Oct 2024
Self-Boosting Large Language Models with Synthetic Preference Data
Self-Boosting Large Language Models with Synthetic Preference Data
Qingxiu Dong
Li Dong
Xingxing Zhang
Zhifang Sui
Furu Wei
SyDa
31
1
0
09 Oct 2024
Towards Self-Improvement of LLMs via MCTS: Leveraging Stepwise Knowledge
  with Curriculum Preference Learning
Towards Self-Improvement of LLMs via MCTS: Leveraging Stepwise Knowledge with Curriculum Preference Learning
Xiyao Wang
Linfeng Song
Ye Tian
Dian Yu
Baolin Peng
Haitao Mi
Furong Huang
Dong Yu
LRM
34
9
0
09 Oct 2024
TICKing All the Boxes: Generated Checklists Improve LLM Evaluation and
  Generation
TICKing All the Boxes: Generated Checklists Improve LLM Evaluation and Generation
Jonathan Cook
Tim Rocktaschel
Jakob Foerster
Dennis Aumiller
Alex Wang
ALM
11
9
0
04 Oct 2024
LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level
  Mathematical Reasoning
LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning
Di Zhang
Jianbo Wu
Jingdi Lei
Tong Che
Jiatong Li
...
Shufei Zhang
Marco Pavone
Yuqiang Li
Wanli Ouyang
Dongzhan Zhou
LRM
19
42
0
03 Oct 2024
Interpretable Contrastive Monte Carlo Tree Search Reasoning
Interpretable Contrastive Monte Carlo Tree Search Reasoning
Zitian Gao
Boye Niu
Xuzheng He
Haotian Xu
Hongzhang Liu
Aiwei Liu
Xuming Hu
Lijie Wen
LRM
44
1
0
02 Oct 2024
Self-evolving Agents with reflective and memory-augmented abilities
Self-evolving Agents with reflective and memory-augmented abilities
Xuechen Liang
Yangfan He
Yinghui Xia
Xinyuan Song
Jianhui Wang
...
Keqin Li
Jiaqi Chen
Jinsong Yang
Siyuan Chen
Tianyu Shi
LLMAG
KELM
CLL
27
2
0
01 Sep 2024
WebPilot: A Versatile and Autonomous Multi-Agent System for Web Task
  Execution with Strategic Exploration
WebPilot: A Versatile and Autonomous Multi-Agent System for Web Task Execution with Strategic Exploration
Yao Zhang
Zijian Ma
Yunpu Ma
Zhen Han
Yu Wu
Volker Tresp
LLMAG
30
22
0
28 Aug 2024
Internal Consistency and Self-Feedback in Large Language Models: A
  Survey
Internal Consistency and Self-Feedback in Large Language Models: A Survey
Xun Liang
Shichao Song
Zifan Zheng
Hanyu Wang
Qingchen Yu
...
Rong-Hua Li
Peng Cheng
Zhonghao Wang
Feiyu Xiong
Zhiyu Li
HILM
LRM
54
23
0
19 Jul 2024
FIRE: A Dataset for Feedback Integration and Refinement Evaluation of
  Multimodal Models
FIRE: A Dataset for Feedback Integration and Refinement Evaluation of Multimodal Models
Pengxiang Li
Zhi Gao
Bofei Zhang
Tao Yuan
Yuwei Wu
Mehrtash Harandi
Yunde Jia
Song-Chun Zhu
Qing Li
VLM
MLLM
27
2
0
16 Jul 2024
LASSI: An LLM-based Automated Self-Correcting Pipeline for Translating Parallel Scientific Codes
LASSI: An LLM-based Automated Self-Correcting Pipeline for Translating Parallel Scientific Codes
M. Dearing
Yiheng Tao
Xingfu Wu
Z. Lan
V. Taylor
22
3
0
30 Jun 2024
LiteSearch: Efficacious Tree Search for LLM
LiteSearch: Efficacious Tree Search for LLM
Ante Wang
Linfeng Song
Ye Tian
Baolin Peng
Dian Yu
Haitao Mi
Jinsong Su
Dong Yu
33
14
0
29 Jun 2024
Chain of Preference Optimization: Improving Chain-of-Thought Reasoning
  in LLMs
Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs
Xuan Zhang
Chao Du
Tianyu Pang
Qian Liu
Wei Gao
Min-Bin Lin
LRM
AI4CE
31
34
0
13 Jun 2024
ContraSolver: Self-Alignment of Language Models by Resolving Internal
  Preference Contradictions
ContraSolver: Self-Alignment of Language Models by Resolving Internal Preference Contradictions
Xu Zhang
Xunjian Yin
Xiaojun Wan
32
3
0
13 Jun 2024
AgentGym: Evolving Large Language Model-based Agents across Diverse
  Environments
AgentGym: Evolving Large Language Model-based Agents across Diverse Environments
Zhiheng Xi
Yiwen Ding
Wenxiang Chen
Boyang Hong
Honglin Guo
...
Qi Zhang
Xipeng Qiu
Xuanjing Huang
Zuxuan Wu
Yu-Gang Jiang
LLMAG
LM&Ro
25
28
0
06 Jun 2024
MindStar: Enhancing Math Reasoning in Pre-trained LLMs at Inference Time
MindStar: Enhancing Math Reasoning in Pre-trained LLMs at Inference Time
Jikun Kang
Xin Zhe Li
Xi Chen
Amirreza Kazemi
Qianyi Sun
...
Xu He
Quan He
Feng Wen
Jianye Hao
Jun Yao
LRM
ReLM
18
14
0
25 May 2024
Self-Rewarding Language Models
Self-Rewarding Language Models
Weizhe Yuan
Richard Yuanzhe Pang
Kyunghyun Cho
Xian Li
Sainbayar Sukhbaatar
Jing Xu
Jason Weston
ReLM
SyDa
ALM
LRM
215
291
0
18 Jan 2024
Human-Instruction-Free LLM Self-Alignment with Limited Samples
Human-Instruction-Free LLM Self-Alignment with Limited Samples
Hongyi Guo
Yuanshun Yao
Wei Shen
Jiaheng Wei
Xiaoying Zhang
Zhaoran Wang
Yang Liu
90
10
0
06 Jan 2024
Self-Evaluation Guided Beam Search for Reasoning
Self-Evaluation Guided Beam Search for Reasoning
Yuxi Xie
Kenji Kawaguchi
Yiran Zhao
Xu Zhao
MingSung Kan
Junxian He
Qizhe Xie
LRM
156
128
0
01 May 2023
Training language models to follow instructions with human feedback
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022
12
Next