2505.19676
Cited By
v1
v2 (latest)
Large Language Models' Reasoning Stalls: An Investigation into the Capabilities of Frontier Models
26 May 2025
Lachlan McGinness
Peter Baumgartner
ReLM
LRM
ELM
Papers citing
"Large Language Models' Reasoning Stalls: An Investigation into the Capabilities of Frontier Models"
19 / 19 papers shown
Title
Reinforcement Learning with Verifiable Rewards Implicitly Incentivizes Correct Reasoning in Base LLMs
Xumeng Wen
Zihan Liu
Shun Zheng
Zhijian Xu
Shengyu Ye
...
Yang Wang
Junjie Li
Ziming Miao
Jiang Bian
Mao Yang
LRM
43
0
0
17 Jun 2025
Steamroller Problems: An Evaluation of LLM Reasoning Capability with Automated Theorem Prover Strategies
Lachlan McGinness
Peter Baumgartner
LRM
60
1
0
17 Jul 2024
Reasoning in Large Language Models: A Geometric Perspective
Romain Cosentino
Sarath Shekkizhar
LRM
104
3
0
02 Jul 2024
Let's Think Dot by Dot: Hidden Computation in Transformer Language Models
Jacob Pfau
William Merrill
Samuel R. Bowman
LRM
100
83
0
24 Apr 2024
Retrieval-Augmented Generation for Large Language Models: A Survey
Yunfan Gao
Yun Xiong
Xinyu Gao
Kangxiang Jia
Jinliu Pan
Yuxi Bi
Yi Dai
Jiawei Sun
Meng Wang
Haofen Wang
3DV
RALM
347
1,846
1
18 Dec 2023
Instruction-Following Evaluation for Large Language Models
Jeffrey Zhou
Tianjian Lu
Swaroop Mishra
Siddhartha Brahma
Sujoy Basu
Yi Luan
Denny Zhou
Le Hou
ELM
ALM
LRM
107
299
0
14 Nov 2023
Unleashing the potential of prompt engineering in Large Language Models: a comprehensive review
Banghao Chen
Zhaofeng Zhang
Nicolas Langrené
Shengxin Zhu
LLMAG
122
13
0
23 Oct 2023
Embers of Autoregression: Understanding Large Language Models Through the Problem They are Trained to Solve
R. Thomas McCoy
Shunyu Yao
Dan Friedman
Matthew Hardy
Thomas Griffiths
69
160
0
24 Sep 2023
DISC-LawLLM: Fine-tuning Large Language Models for Intelligent Legal Services
Shengbin Yue
Wei Chen
Siyuan Wang
Bingxuan Li
Chenchen Shen
...
Yuxuan Zhou
Yao Xiao
Song Yun
Xuanjing Huang
Zhongyu Wei
AILaw
ELM
119
99
0
20 Sep 2023
Large Language Models Fail on Trivial Alterations to Theory-of-Mind Tasks
T. Ullman
LRM
92
241
0
16 Feb 2023
Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them
Mirac Suzgun
Nathan Scales
Nathanael Scharli
Sebastian Gehrmann
Yi Tay
...
Aakanksha Chowdhery
Quoc V. Le
Ed H. Chi
Denny Zhou
Jason W. Wei
ALM
ELM
LRM
ReLM
301
1,144
0
17 Oct 2022
Training Verifiers to Solve Math Word Problems
K. Cobbe
V. Kosaraju
Mohammad Bavarian
Mark Chen
Heewoo Jun
...
Jerry Tworek
Jacob Hilton
Reiichiro Nakano
Christopher Hesse
John Schulman
ReLM
OffRL
LRM
451
4,610
0
27 Oct 2021
Program Synthesis with Large Language Models
Jacob Austin
Augustus Odena
Maxwell Nye
Maarten Bosma
Henryk Michalewski
...
Ellen Jiang
Carrie J. Cai
Michael Terry
Quoc V. Le
Charles Sutton
ELM
AIMat
ReCod
ALM
218
2,024
0
16 Aug 2021
Evaluating Large Language Models Trained on Code
Mark Chen
Jerry Tworek
Heewoo Jun
Qiming Yuan
Henrique Pondé
...
Bob McGrew
Dario Amodei
Sam McCandlish
Ilya Sutskever
Wojciech Zaremba
ELM
ALM
307
5,702
0
07 Jul 2021
Scaling Laws for Transfer
Danny Hernandez
Jared Kaplan
T. Henighan
Sam McCandlish
100
251
0
02 Feb 2021
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
679
4,948
0
23 Jan 2020
HellaSwag: Can a Machine Really Finish Your Sentence?
Rowan Zellers
Ari Holtzman
Yonatan Bisk
Ali Farhadi
Yejin Choi
247
2,538
0
19 May 2019
DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs
Dheeru Dua
Yizhong Wang
Pradeep Dasigi
Gabriel Stanovsky
Sameer Singh
Matt Gardner
AIMat
189
967
0
01 Mar 2019
HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering
Zhilin Yang
Peng Qi
Saizheng Zhang
Yoshua Bengio
William W. Cohen
Ruslan Salakhutdinov
Christopher D. Manning
RALM
277
2,712
0
25 Sep 2018