2503.22732
Cited By
Reasoning Beyond Limits: Advances and Open Problems for LLMs
ICT Express, 2025
26 March 2025
M. Ferrag
Norbert Tihanyi
Merouane Debbah
ELM
OffRL
LRM
AI4CE
Papers citing
"Reasoning Beyond Limits: Advances and Open Problems for LLMs"
50 / 100 papers shown
Title
RLHF Can Speak Many Languages: Unlocking Multilingual Preference Optimization for LLMs
John Dang
Arash Ahmadian
Kelly Marchisio
Julia Kreutzer
Ahmet Üstün
Sara Hooker
226
40
0
02 Jul 2024
Agentless: Demystifying LLM-based Software Engineering Agents
Chunqiu Steven Xia
Yinlin Deng
Soren Dunn
Lingming Zhang
LLMAG
196
219
0
01 Jul 2024
Summary of a Haystack: A Challenge to Long-Context LLMs and RAG Systems
Philippe Laban
Alexander R. Fabbri
Caiming Xiong
Chien-Sheng Wu
RALM
291
82
0
01 Jul 2024
Searching for Best Practices in Retrieval-Augmented Generation
Xiaohua Wang
Zhenghua Wang
Xuan Gao
Feiran Zhang
Yixin Wu
...
Qi Qian
Ruicheng Yin
Changze Lv
Xiaoqing Zheng
Xuanjing Huang
267
100
0
01 Jul 2024
Meta Large Language Model Compiler: Foundation Models of Compiler Optimization
Chris Cummins
Volker Seeker
Dejan Grubisic
Baptiste Roziere
Jonas Gehring
Gabriel Synnaeve
Hugh Leather
200
52
0
27 Jun 2024
Following Length Constraints in Instructions
Weizhe Yuan
Ilia Kulikov
Ping Yu
Kyunghyun Cho
Sainbayar Sukhbaatar
Jason Weston
Jing Xu
FaML
ALM
159
31
0
25 Jun 2024
Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning
Zhihan Zhang
Zhenwen Liang
Wenhao Yu
Dian Yu
Mengzhao Jia
Dong Yu
Meng Jiang
AIMat
RALM
LRM
ReLM
160
20
0
17 Jun 2024
Iterative Length-Regularized Direct Preference Optimization: A Case Study on Improving 7B Language Models to GPT-4 Level
Jie Liu
Zhanhui Zhou
Jiaheng Liu
Xingyuan Bu
Chao Yang
Han-Sen Zhong
Wanli Ouyang
107
25
0
17 Jun 2024
Mixture-of-Agents Enhances Large Language Model Capabilities
International Conference on Learning Representations (ICLR), 2024
Junlin Wang
Jue Wang
Ben Athiwaratkun
Ce Zhang
James Zou
LLMAG
AIFin
210
258
0
07 Jun 2024
Improve Mathematical Reasoning in Language Models by Automated Process Supervision
Liangchen Luo
Yinxiao Liu
Rosanne Liu
Samrat Phatale
Harsh Lara
...
Lei Shu
Yun Zhu
Lei Meng
Jiao Sun
Abhinav Rastogi
LRM
244
304
0
05 Jun 2024
SimPO: Simple Preference Optimization with a Reference-Free Reward
Neural Information Processing Systems (NeurIPS), 2024
Yu Meng
Mengzhou Xia
Danqi Chen
397
739
0
23 May 2024
Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024
Yunxin Li
Shenyuan Jiang
Baotian Hu
Longyue Wang
Wanqi Zhong
Tong Lu
Lin Ma
Min Zhang
MoE
207
89
0
18 May 2024
LoRA Learns Less and Forgets Less
D. Biderman
Jose Javier Gonzalez Ortiz
Jacob P. Portes
Mansheej Paul
Philip Greengard
...
Sam Havens
Vitaliy Chiley
Jonathan Frankle
Cody Blakeney
John P. Cunningham
CLL
281
224
0
15 May 2024
Understanding the performance gap between online and offline alignment algorithms
Yunhao Tang
Daniel Guo
Zeyu Zheng
Daniele Calandriello
Yuan Cao
...
Rémi Munos
Bernardo Avila-Pires
Michal Valko
Yong Cheng
Will Dabney
OffRL
OnRL
244
93
0
14 May 2024
MuMath-Code: Combining Tool-Use Large Language Models with Multi-perspective Data Augmentation for Mathematical Reasoning
Shuo Yin
Weihao You
Zhilong Ji
Guoqiang Zhong
Jinfeng Bai
LRM
SyDa
194
19
0
13 May 2024
Stream of Search (SoS): Learning to Search in Language
Kanishk Gandhi
Denise Lee
Gabriel Grand
Muxin Liu
Winson Cheng
Archit Sharma
Noah D. Goodman
RALM
AIFin
LRM
208
112
0
01 Apr 2024
Survey on Large Language Model-Enhanced Reinforcement Learning: Concept, Taxonomy, and Methods
Yuji Cao
Huan Zhao
Yuheng Cheng
Ting Shu
Guolong Liu
Gaoqi Liang
Junhua Zhao
Yun Li
LLMAG
KELM
OffRL
LM&Ro
332
144
0
30 Mar 2024
InternLM2 Technical Report
Zheng Cai
Maosong Cao
Haojiong Chen
Kai-xiang Chen
Keyu Chen
...
Jingming Zhuo
Yi-Ling Zou
Xipeng Qiu
Yu Qiao
Dahua Lin
ALM
248
299
0
26 Mar 2024
Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking
E. Zelikman
Georges Harik
Yijia Shao
Varuna Jayasiri
Nick Haber
Noah D. Goodman
LLMAG
ReLM
LRM
550
198
0
14 Mar 2024
OlympiadBench: A Challenging Benchmark for Promoting AGI with Olympiad-Level Bilingual Multimodal Scientific Problems
Chaoqun He
Renjie Luo
Yuzhuo Bai
Shengding Hu
Zhen Leng Thai
...
Yuxiang Zhang
Jie Liu
Lei Qi
Zhiyuan Liu
Maosong Sun
ELM
AIMat
347
624
0
21 Feb 2024
Chain-of-Thought Reasoning Without Prompting
Xuezhi Wang
Denny Zhou
ReLM
LRM
517
193
0
15 Feb 2024
V-STaR: Training Verifiers for Self-Taught Reasoners
Arian Hosseini
Xingdi Yuan
Nikolay Malkin
Rameswar Panda
Alessandro Sordoni
Rishabh Agarwal
ReLM
LRM
249
186
0
09 Feb 2024
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Zhihong Shao
Peiyi Wang
Qihao Zhu
Runxin Xu
Jun-Mei Song
...
Haowei Zhang
Mingchuan Zhang
Yiming Li
Yu-Huan Wu
Daya Guo
ReLM
LRM
1.1K
3,497
0
05 Feb 2024
KTO: Model Alignment as Prospect Theoretic Optimization
Kawin Ethayarajh
Winnie Xu
Niklas Muennighoff
Dan Jurafsky
Douwe Kiela
694
793
0
02 Feb 2024
Mixtral of Experts
Albert Q. Jiang
Alexandre Sablayrolles
Antoine Roux
A. Mensch
Blanche Savary
...
Théophile Gervet
Thibaut Lavril
Thomas Wang
Timothée Lacroix
William El Sayed
MoE
LLMAG
451
1,506
0
08 Jan 2024
Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Peiyi Wang
Lei Li
Zhihong Shao
R. X. Xu
Damai Dai
Yifei Li
Deli Chen
Y. Wu
Zhifang Sui
AIMat
LRM
ALM
350
633
0
14 Dec 2023
A General Theoretical Paradigm to Understand Learning from Human Preferences
International Conference on Artificial Intelligence and Statistics (AISTATS), 2023
M. G. Azar
Mark Rowland
Bilal Piot
Daniel Guo
Daniele Calandriello
Michal Valko
Rémi Munos
493
815
0
18 Oct 2023
ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving
International Conference on Learning Representations (ICLR), 2023
Zhibin Gou
Zhihong Shao
Yeyun Gong
Haoran Pan
Yujiu Yang
Shiyu Huang
Nan Duan
Weizhu Chen
LRM
AI4CE
LLMAG
306
247
0
29 Sep 2023
Cognitive Architectures for Language Agents
T. Sumers
Shunyu Yao
Karthik Narasimhan
Thomas Griffiths
LLMAG
LM&Ro
519
267
0
05 Sep 2023
BOLAA: Benchmarking and Orchestrating LLM-augmented Autonomous Agents
Zhiwei Liu
Weiran Yao
Jianguo Zhang
Le Xue
Shelby Heinecke
...
Ran Xu
P. Mùi
Haiquan Wang
Caiming Xiong
Silvio Savarese
LLMAG
188
99
0
11 Aug 2023
From Sparse to Soft Mixtures of Experts
International Conference on Learning Representations (ICLR), 2023
J. Puigcerver
C. Riquelme
Basil Mustafa
N. Houlsby
MoE
375
185
0
02 Aug 2023
A Survey on Multimodal Large Language Models
National Science Review (NSR), 2023
Xinglong Mao
Chaoyou Fu
Zhengye Zhang
Ke Li
Xing Sun
Tong Xu
Enhong Chen
MLLM
LRM
381
934
0
23 Jun 2023
Let's Verify Step by Step
International Conference on Learning Representations (ICLR), 2023
Hunter Lightman
V. Kosaraju
Yura Burda
Harrison Edwards
Bowen Baker
Teddy Lee
Jan Leike
John Schulman
Ilya Sutskever
K. Cobbe
ALM
OffRL
LRM
766
2,080
0
31 May 2023
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Neural Information Processing Systems (NeurIPS), 2023
Rafael Rafailov
Archit Sharma
E. Mitchell
Stefano Ermon
Christopher D. Manning
Chelsea Finn
ALM
739
6,358
0
29 May 2023
Self-Refine: Iterative Refinement with Self-Feedback
Neural Information Processing Systems (NeurIPS), 2023
Aman Madaan
Niket Tandon
Prakhar Gupta
Skyler Hallinan
Luyu Gao
...
Bodhisattwa Prasad Majumder
Katherine Hermann
Sean Welleck
Amir Yazdanbakhsh
Peter Clark
ReLM
LRM
DiffM
676
2,470
0
30 Mar 2023
Reflexion: Language Agents with Verbal Reinforcement Learning
Neural Information Processing Systems (NeurIPS), 2023
Noah Shinn
Federico Cassano
Beck Labash
A. Gopinath
Karthik Narasimhan
Shunyu Yao
LLMAG
KELM
513
2,116
0
20 Mar 2023
Solving math word problems with process- and outcome-based feedback
J. Uesato
Nate Kushman
Ramana Kumar
Francis Song
Noah Y. Siegel
L. Wang
Antonia Creswell
G. Irving
I. Higgins
FaML
ReLM
AIMat
LRM
268
522
0
25 Nov 2022
Scaling Instruction-Finetuned Language Models
Journal of Machine Learning Research (JMLR), 2022
Hyung Won Chung
Le Hou
Shayne Longpre
Barret Zoph
Yi Tay
...
Jacob Devlin
Adam Roberts
Denny Zhou
Quoc V. Le
Jason W. Wei
ReLM
LRM
888
3,742
0
20 Oct 2022
ReAct: Synergizing Reasoning and Acting in Language Models
International Conference on Learning Representations (ICLR), 2022
Shunyu Yao
Jeffrey Zhao
Dian Yu
Nan Du
Izhak Shafran
Karthik Narasimhan
Yuan Cao
LLMAG
ReLM
LRM
1.5K
4,854
0
06 Oct 2022
STaR: Bootstrapping Reasoning With Reasoning
E. Zelikman
Yuhuai Wu
Jesse Mu
Noah D. Goodman
ReLM
LRM
459
677
0
28 Mar 2022
Training language models to follow instructions with human feedback
Neural Information Processing Systems (NeurIPS), 2022
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
1.9K
16,867
0
04 Mar 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Neural Information Processing Systems (NeurIPS), 2022
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
2.1K
13,969
0
28 Jan 2022
Training Verifiers to Solve Math Word Problems
K. Cobbe
V. Kosaraju
Mohammad Bavarian
Mark Chen
Heewoo Jun
...
Jerry Tworek
Jacob Hilton
Reiichiro Nakano
Christopher Hesse
John Schulman
ReLM
OffRL
LRM
980
6,535
0
27 Oct 2021
Scaling Vision with Sparse Mixture of Experts
Neural Information Processing Systems (NeurIPS), 2021
C. Riquelme
J. Puigcerver
Basil Mustafa
Maxim Neumann
Rodolphe Jenatton
André Susano Pinto
Daniel Keysers
N. Houlsby
MoE
271
798
0
10 Jun 2021
Learning to summarize from human feedback
Neural Information Processing Systems (NeurIPS), 2020
Nisan Stiennon
Long Ouyang
Jeff Wu
Daniel M. Ziegler
Ryan J. Lowe
Chelsea Voss
Alec Radford
Dario Amodei
Paul Christiano
ALM
709
2,666
0
02 Sep 2020
GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding
Dmitry Lepikhin
HyoukJoong Lee
Yuanzhong Xu
Dehao Chen
Orhan Firat
Yanping Huang
M. Krikun
Noam M. Shazeer
Zhiwen Chen
MoE
356
1,548
0
30 Jun 2020
Proximal Policy Optimization Algorithms
John Schulman
Filip Wolski
Prafulla Dhariwal
Alec Radford
Oleg Klimov
OffRL
1.1K
23,432
0
20 Jul 2017
Attention Is All You Need
Neural Information Processing Systems (NeurIPS), 2017
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
2.4K
157,616
0
12 Jun 2017
Deep reinforcement learning from human preferences
Neural Information Processing Systems (NeurIPS), 2017
Paul Christiano
Jan Leike
Tom B. Brown
Miljan Martic
Shane Legg
Dario Amodei
1.2K
4,250
0
12 Jun 2017
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
International Conference on Learning Representations (ICLR), 2017
Noam M. Shazeer
Azalia Mirhoseini
Krzysztof Maziarz
Andy Davis
Quoc V. Le
Geoffrey E. Hinton
J. Dean
MoE
520
3,534
0
23 Jan 2017