Communities
Connect sessions
AI calendar
Organizations
Join Slack
Contact Sales
Search
Open menu
Home
Papers
All Papers
0 / 0 papers shown
Title
Home
Papers
2503.22732
Cited By
Reasoning Beyond Limits: Advances and Open Problems for LLMs
ICT express (ICT Express), 2025
26 March 2025
M. Ferrag
Norbert Tihanyi
Merouane Debbah
ELM
OffRL
LRM
AI4CE
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Reasoning Beyond Limits: Advances and Open Problems for LLMs"
50 / 100 papers shown
Title
SIGMA: Search-Augmented On-Demand Knowledge Integration for Agentic Mathematical Reasoning
Ali Asgarov
Umid Suleymanov
Aadyant Khatri
LRM
170
0
0
31 Oct 2025
Out-of-distribution Tests Reveal Compositionality in Chess Transformers
Anna Mészáros
Patrik Reizinger
Ferenc Huszár
CoGe
124
0
0
23 Oct 2025
Interpretable Anomaly-Based DDoS Detection in AI-RAN with XAI and LLMs
Sotiris Chatzimiltis
Mohammad Shojafar
Mahdi Boloursaz Mashhadi
Rahim Tafazolli
119
0
0
27 Jul 2025
From Emergence to Control: Probing and Modulating Self-Reflection in Language Models
Xudong Zhu
Jiachen Jiang
Mohammad Mahdi Khalili
Zhihui Zhu
ReLM
LM&Ro
LRM
159
2
0
13 Jun 2025
WebDancer: Towards Autonomous Information Seeking Agency
Jialong Wu
Baixuan Li
Runnan Fang
Wenbiao Yin
Liwen Zhang
...
Yong Jiang
Pengjun Xie
Fei Huang
Jingren Zhou
Jingren Zhou
198
65
0
28 May 2025
PhysicsArena: The First Multimodal Physics Reasoning Benchmark Exploring Variable, Process, and Solution Dimensions
Song Dai
Yibo Yan
Jiamin Su
Dongfang Zihao
Yubo Gao
...
Jia-Chen Gu
Junyan Zhang
Sicheng Tao
Zhuoran Gao
Xuming Hu
LRM
AI4CE
246
4
0
21 May 2025
WebThinker: Empowering Large Reasoning Models with Deep Research Capability
Xiaochen Li
Jiajie Jin
Guanting Dong
Hongjin Qian
Yutao Zhu
Yongkang Wu
Ji-Rong Wen
Zhicheng Dou
LLMAG
LRM
435
143
0
30 Apr 2025
Survey on Evaluation of LLM-based Agents
Asaf Yehudai
Lilach Eden
Alan Li
Guy Uziel
Yilun Zhao
Roy Bar-Haim
Arman Cohan
Michal Shmueli-Scheuer
LLMAG
ELM
433
64
0
20 Mar 2025
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
Qiying Yu
Zheng Zhang
Ruofei Zhu
Yufeng Yuan
Xiaochen Zuo
...
Ya Zhang
Lin Yan
Mu Qiao
Yonghui Wu
Mingxuan Wang
OffRL
LRM
552
933
0
18 Mar 2025
DIMSUM: Discourse in Mathematical Reasoning as a Supervision Module
Krish Sharma
Niyar R. Barman
Nicholas M. Asher
Akshay Chaturvedi
LRM
AIMat
274
39
0
06 Mar 2025
Qwen2.5-VL Technical Report
S. Bai
Keqin Chen
Xuejing Liu
Jialin Wang
Wenbin Ge
...
Zesen Cheng
Hang Zhang
Zhibo Yang
Haiyang Xu
Junyang Lin
VLM
655
2,676
0
20 Feb 2025
Process Reinforcement through Implicit Rewards
Ganqu Cui
Lifan Yuan
Liang Luo
Hanbin Wang
Wendi Li
...
Maosong Sun
Zhiyuan Liu
Ning Ding
Bowen Zhou
Ning Ding
OffRL
LRM
430
220
0
03 Feb 2025
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Tianzhe Chu
Yuexiang Zhai
Jihan Yang
Shengbang Tong
Saining Xie
Dale Schuurmans
Quoc V. Le
Sergey Levine
Yi-An Ma
OffRL
620
377
0
28 Jan 2025
Chain-of-Retrieval Augmented Generation
Liang Wang
Haonan Chen
Nan Yang
Xiaolong Huang
Zhicheng Dou
Furu Wei
RALM
LRM
3DV
ReLM
356
23
0
24 Jan 2025
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
...
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
ReLM
VLM
OffRL
AI4TS
LRM
1.2K
5,266
0
22 Jan 2025
Kimi k1.5: Scaling Reinforcement Learning with LLMs
Kimi Team
Angang Du
Bofei Gao
Bowei Xing
Changjiu Jiang
...
Zihao Huang
Ziyao Xu
Zhiyong Yang
Zonghan Yang
Zongyu Lin
OffRL
ALM
AI4TS
VLM
LRM
890
665
0
22 Jan 2025
UI-TARS: Pioneering Automated GUI Interaction with Native Agents
Yujia Qin
Yining Ye
Junjie Fang
Han Wang
Shihao Liang
...
Haifeng Liu
F. Lin
Tao Peng
Xin Liu
Guang Shi
LLMAG
LM&Ro
231
239
0
21 Jan 2025
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking
Xinyu Guan
Guang Dai
Yifei Liu
Ning Shang
Youran Sun
Yi Zhu
Fan Yang
Mao Yang
LRM
SyDa
ReLM
332
237
0
08 Jan 2025
Large Language Monkeys: Scaling Inference Compute with Repeated Sampling
Bradley Brown
Jordan Juravsky
Ryan Ehrlich
Ronald Clark
Quoc V. Le
Christopher Ré
Azalia Mirhoseini
ALM
LRM
880
561
0
03 Jan 2025
Enhancing Code LLMs with Reinforcement Learning in Code Generation: A Survey
Junqiao Wang
Zeng Zhang
Yangfan He
Yuyang Song
Lewei He
...
Tang Jingqun
Guangwu Qian
Keqin Li
Qiuwu Chen
Lewei He
485
79
0
29 Dec 2024
Malware Classification using a Hybrid Hidden Markov Model-Convolutional Neural Network
Ritik Mehta
Olha Jurecková
Mark Stamp
289
151
0
25 Dec 2024
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference
Benjamin Warner
Antoine Chaffin
Benjamin Clavié
Orion Weller
Oskar Hallström
...
Tom Aarsen
Nathan Cooper
Griffin Adams
Jeremy Howard
Iacopo Poli
397
363
0
18 Dec 2024
Phi-4 Technical Report
Marah Abdin
J. Aneja
Harkirat Singh Behl
Sébastien Bubeck
Ronen Eldan
...
Rachel A. Ward
Yue Wu
Dingli Yu
Cyril Zhang
Yi Zhang
ALM
SyDa
418
402
0
12 Dec 2024
Reinforcement Learning Enhanced LLMs: A Survey
Shuhe Wang
Shengyu Zhang
Jing Zhang
Runyi Hu
Xiaoya Li
Minlie Huang
Jiwei Li
Leilei Gan
G. Wang
Eduard H. Hovy
OffRL
654
47
0
05 Dec 2024
Free Process Rewards without Process Labels
Lifan Yuan
Wendi Li
Huayu Chen
Ganqu Cui
Ning Ding
Kaiyan Zhang
Bowen Zhou
Ziqiang Liu
Yuan Yao
OffRL
275
104
0
02 Dec 2024
Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent
Xingwu Sun
Yanfeng Chen
Yanwen Huang
Ruobing Xie
Jiaqi Zhu
...
Zhanhui Kang
Yong Yang
Yuhong Liu
Di Wang
Jie Jiang
MoE
ALM
ELM
444
68
0
04 Nov 2024
A Theoretical Understanding of Chain-of-Thought: Coherent Reasoning and Error-Aware Demonstration
International Conference on Artificial Intelligence and Statistics (AISTATS), 2024
Yingqian Cui
Pengfei He
Xianfeng Tang
Qi He
Chen Luo
Shucheng Zhou
Yue Xing
LRM
153
12
0
21 Oct 2024
Speculative Knowledge Distillation: Bridging the Teacher-Student Gap Through Interleaved Sampling
Wenyuan Xu
Rujun Han
Zhenting Wang
L. Le
Dhruv Madeka
Lei Li
Wenjie Wang
Rishabh Agarwal
Zifeng Wang
Tomas Pfister
562
25
0
15 Oct 2024
Thinking LLMs: General Instruction Following with Thought Generation
Tianhao Wu
Janice Lan
Weizhe Yuan
Jiantao Jiao
Jason Weston
Sainbayar Sukhbaatar
LRM
243
40
0
14 Oct 2024
Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning
International Conference on Learning Representations (ICLR), 2024
Amrith Rajagopal Setlur
Chirag Nagpal
Adam Fisch
Xinyang Geng
Jacob Eisenstein
Rishabh Agarwal
Alekh Agarwal
Jonathan Berant
Aviral Kumar
OffRL
LRM
330
154
0
10 Oct 2024
Pyramidal Flow Matching for Efficient Video Generative Modeling
International Conference on Learning Representations (ICLR), 2024
Yang Jin
Zhicheng Sun
Ningyuan Li
Kun Xu
K. Xu
...
Nan Zhuang
Quzhe Huang
Yang Song
Yadong Mu
Zhouchen Lin
VGen
433
190
0
08 Oct 2024
LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning
Di Zhang
Jianbo Wu
Jingdi Lei
Tong Che
Jiatong Li
...
Shufei Zhang
Marco Pavone
Yuqiang Li
Wanli Ouyang
Dongzhan Zhou
LRM
179
89
0
03 Oct 2024
Contextual Document Embeddings
International Conference on Learning Representations (ICLR), 2024
John X. Morris
Alexander M. Rush
436
15
0
03 Oct 2024
SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration
International Conference on Learning Representations (ICLR), 2024
Jintao Zhang
Jia Wei
Pengle Zhang
Jun-Jie Zhu
Jun Zhu
Jianfei Chen
VLM
MQ
618
80
0
03 Oct 2024
RLEF: Grounding Code LLMs in Execution Feedback with Reinforcement Learning
Jonas Gehring
Kunhao Zheng
Jade Copet
Vegard Mella
Taco Cohen
Gabriel Synnaeve
LLMAG
207
76
0
02 Oct 2024
Not All LLM Reasoners Are Created Equal
Arian Hosseini
Alessandro Sordoni
Daniel Toyama
Rameswar Panda
Rishabh Agarwal
LRM
275
23
0
02 Oct 2024
The Perfect Blend: Redefining RLHF with Mixture of Judges
Tengyu Xu
Eryk Helenowski
Karthik Abinav Sankararaman
Di Jin
Kaiyan Peng
...
Gabriel Cohen
Yuandong Tian
Hao Ma
Sinong Wang
Han Fang
361
24
0
30 Sep 2024
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models
Computer Vision and Pattern Recognition (CVPR), 2024
Matt Deitke
Christopher Clark
Sangho Lee
Rohun Tripathi
Yue Yang
...
Noah A. Smith
Hannaneh Hajishirzi
Ross Girshick
Ali Farhadi
Aniruddha Kembhavi
OSLM
VLM
418
58
0
25 Sep 2024
Direct Judgement Preference Optimization
Peifeng Wang
Austin Xu
Yilun Zhou
Caiming Xiong
Shafiq Joty
ELM
311
19
0
23 Sep 2024
Moshi: a speech-text foundation model for real-time dialogue
Alexandre Défossez
Laurent Mazaré
Manu Orsini
Amélie Royer
P. Pérez
Edouard Grave
Edouard Grave
Neil Zeghidour
AuLLM
415
336
0
17 Sep 2024
Critique-out-Loud Reward Models
Zachary Ankner
Mansheej Paul
Brandon Cui
Jonathan D. Chang
Prithviraj Ammanabrolu
ALM
LRM
287
67
0
21 Aug 2024
HMoE: Heterogeneous Mixture of Experts for Language Modeling
An Wang
Xingwu Sun
Ruobing Xie
Shuaipeng Li
Jiaqi Zhu
...
J. N. Han
Zhanhui Kang
Di Wang
Naoaki Okazaki
Cheng-zhong Xu
MoE
228
27
0
20 Aug 2024
Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents
Pranav Putta
Edmund Mills
Naman Garg
S. Motwani
Chelsea Finn
Divyansh Garg
Rafael Rafailov
LLMAG
LRM
261
137
0
13 Aug 2024
Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment
Transactions of the Association for Computational Linguistics (TACL), 2024
Karel DÓosterlinck
Winnie Xu
Chris Develder
Thomas Demeester
A. Singh
Christopher Potts
Douwe Kiela
Shikib Mehri
202
31
0
12 Aug 2024
Self-Taught Evaluators
Tianlu Wang
Ilia Kulikov
O. Yu. Golovneva
Ping Yu
Weizhe Yuan
Jane Dwivedi-Yu
Richard Yuanzhe Pang
Maryam Fazel-Zarandi
Jason Weston
Xian Li
ALM
LRM
186
47
0
05 Aug 2024
MiniCPM-V: A GPT-4V Level MLLM on Your Phone
Yuan Yao
Tianyu Yu
Ao Zhang
Chongyi Wang
Junbo Cui
...
Xu Han
Guoyang Zeng
Dahai Li
Zhiyuan Liu
Maosong Sun
VLM
MLLM
401
851
0
03 Aug 2024
Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge
Tianhao Wu
Weizhe Yuan
O. Yu. Golovneva
Jing Xu
Yuandong Tian
Jiantao Jiao
Jason Weston
Sainbayar Sukhbaatar
ALM
KELM
LRM
310
148
0
28 Jul 2024
A Comprehensive Survey of LLM Alignment Techniques: RLHF, RLAIF, PPO, DPO and More
Zhichao Wang
Bin Bi
Shiva K. Pentyala
Kiran Ramnath
Sougata Chaudhuri
...
Z. Zhu
Xiang-Bo Mao
S. Asur
Na
Na Cheng
OffRL
247
111
0
23 Jul 2024
Understanding Reference Policies in Direct Preference Optimization
Yixin Liu
Pengfei Liu
Arman Cohan
220
16
0
18 Jul 2024
Multi-Step Reasoning with Large Language Models, a Survey
ACM Computing Surveys (ACM CSUR), 2024
Aske Plaat
Annie Wong
Suzan Verberne
Joost Broekens
Niki van Stein
Thomas Bäck
OffRL
LRM
AI4CE
ReLM
179
95
0
16 Jul 2024
1
2
Next