Communities
Connect sessions
AI calendar
Organizations
Join Slack
Contact Sales
Search
Open menu
Home
Papers
All Papers
0 / 0 papers shown
Title
Home
Papers
2503.22732
Cited By
Reasoning Beyond Limits: Advances and Open Problems for LLMs
ICT express (ICT Express), 2025
26 March 2025
M. Ferrag
Norbert Tihanyi
Merouane Debbah
ELM
OffRL
LRM
AI4CE
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Reasoning Beyond Limits: Advances and Open Problems for LLMs"
50 / 100 papers shown
Title
SIGMA: Search-Augmented On-Demand Knowledge Integration for Agentic Mathematical Reasoning
Ali Asgarov
Umid Suleymanov
Aadyant Khatri
LRM
162
0
0
31 Oct 2025
Out-of-distribution Tests Reveal Compositionality in Chess Transformers
Anna Mészáros
Patrik Reizinger
Ferenc Huszár
CoGe
92
0
0
23 Oct 2025
Interpretable Anomaly-Based DDoS Detection in AI-RAN with XAI and LLMs
Sotiris Chatzimiltis
Mohammad Shojafar
Mahdi Boloursaz Mashhadi
Rahim Tafazolli
103
0
0
27 Jul 2025
From Emergence to Control: Probing and Modulating Self-Reflection in Language Models
Xudong Zhu
Jiachen Jiang
Mohammad Mahdi Khalili
Zhihui Zhu
ReLM
LM&Ro
LRM
155
2
0
13 Jun 2025
WebDancer: Towards Autonomous Information Seeking Agency
Jialong Wu
Baixuan Li
Runnan Fang
Wenbiao Yin
Liwen Zhang
...
Yong Jiang
Pengjun Xie
Fei Huang
Jingren Zhou
Jingren Zhou
194
64
0
28 May 2025
PhysicsArena: The First Multimodal Physics Reasoning Benchmark Exploring Variable, Process, and Solution Dimensions
Song Dai
Yibo Yan
Jiamin Su
Dongfang Zihao
Yubo Gao
...
Jia-Chen Gu
Junyan Zhang
Sicheng Tao
Zhuoran Gao
Xuming Hu
LRM
AI4CE
246
4
0
21 May 2025
WebThinker: Empowering Large Reasoning Models with Deep Research Capability
Xiaochen Li
Jiajie Jin
Guanting Dong
Hongjin Qian
Yutao Zhu
Yongkang Wu
Ji-Rong Wen
Zhicheng Dou
LLMAG
LRM
391
141
0
30 Apr 2025
Survey on Evaluation of LLM-based Agents
Asaf Yehudai
Lilach Eden
Alan Li
Guy Uziel
Yilun Zhao
Roy Bar-Haim
Arman Cohan
Michal Shmueli-Scheuer
LLMAG
ELM
417
62
0
20 Mar 2025
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
Qiying Yu
Zheng Zhang
Ruofei Zhu
Yufeng Yuan
Xiaochen Zuo
...
Ya Zhang
Lin Yan
Mu Qiao
Yonghui Wu
Mingxuan Wang
OffRL
LRM
516
897
0
18 Mar 2025
DIMSUM: Discourse in Mathematical Reasoning as a Supervision Module
Krish Sharma
Niyar R. Barman
Nicholas M. Asher
Akshay Chaturvedi
LRM
AIMat
262
38
0
06 Mar 2025
Qwen2.5-VL Technical Report
S. Bai
Keqin Chen
Xuejing Liu
Jialin Wang
Wenbin Ge
...
Zesen Cheng
Hang Zhang
Zhibo Yang
Haiyang Xu
Junyang Lin
VLM
631
2,497
0
20 Feb 2025
Process Reinforcement through Implicit Rewards
Ganqu Cui
Lifan Yuan
Liang Luo
Hanbin Wang
Wendi Li
...
Maosong Sun
Zhiyuan Liu
Ning Ding
Bowen Zhou
Ning Ding
OffRL
LRM
394
213
0
03 Feb 2025
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Tianzhe Chu
Yuexiang Zhai
Jihan Yang
Shengbang Tong
Saining Xie
Dale Schuurmans
Quoc V. Le
Sergey Levine
Yi-An Ma
OffRL
584
363
0
28 Jan 2025
Chain-of-Retrieval Augmented Generation
Liang Wang
Haonan Chen
Nan Yang
Xiaolong Huang
Zhicheng Dou
Furu Wei
RALM
LRM
3DV
ReLM
324
22
0
24 Jan 2025
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
...
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
ReLM
VLM
OffRL
AI4TS
LRM
1.0K
5,096
0
22 Jan 2025
Kimi k1.5: Scaling Reinforcement Learning with LLMs
Kimi Team
Angang Du
Bofei Gao
Bowei Xing
Changjiu Jiang
...
Zihao Huang
Ziyao Xu
Zhiyong Yang
Zonghan Yang
Zongyu Lin
OffRL
ALM
AI4TS
VLM
LRM
838
655
0
22 Jan 2025
UI-TARS: Pioneering Automated GUI Interaction with Native Agents
Yujia Qin
Yining Ye
Junjie Fang
Han Wang
Shihao Liang
...
Haifeng Liu
F. Lin
Tao Peng
Xin Liu
Guang Shi
LLMAG
LM&Ro
223
229
0
21 Jan 2025
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking
Xinyu Guan
Guang Dai
Yifei Liu
Ning Shang
Youran Sun
Yi Zhu
Fan Yang
Mao Yang
LRM
SyDa
ReLM
300
234
0
08 Jan 2025
Large Language Monkeys: Scaling Inference Compute with Repeated Sampling
Bradley Brown
Jordan Juravsky
Ryan Ehrlich
Ronald Clark
Quoc V. Le
Christopher Ré
Azalia Mirhoseini
ALM
LRM
720
553
0
03 Jan 2025
Enhancing Code LLMs with Reinforcement Learning in Code Generation: A Survey
Junqiao Wang
Zeng Zhang
Yangfan He
Yuyang Song
Lewei He
...
Tang Jingqun
Guangwu Qian
Keqin Li
Qiuwu Chen
Lewei He
477
77
0
29 Dec 2024
Malware Classification using a Hybrid Hidden Markov Model-Convolutional Neural Network
Ritik Mehta
Olha Jurecková
Mark Stamp
277
148
0
25 Dec 2024
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference
Benjamin Warner
Antoine Chaffin
Benjamin Clavié
Orion Weller
Oskar Hallström
...
Tom Aarsen
Nathan Cooper
Griffin Adams
Jeremy Howard
Iacopo Poli
369
350
0
18 Dec 2024
Phi-4 Technical Report
Marah Abdin
J. Aneja
Harkirat Singh Behl
Sébastien Bubeck
Ronen Eldan
...
Rachel A. Ward
Yue Wu
Dingli Yu
Cyril Zhang
Yi Zhang
ALM
SyDa
394
387
0
12 Dec 2024
Reinforcement Learning Enhanced LLMs: A Survey
Shuhe Wang
Shengyu Zhang
Jing Zhang
Runyi Hu
Xiaoya Li
Minlie Huang
Jiwei Li
Leilei Gan
G. Wang
Eduard H. Hovy
OffRL
586
46
0
05 Dec 2024
Free Process Rewards without Process Labels
Lifan Yuan
Wendi Li
Huayu Chen
Ganqu Cui
Ning Ding
Kaiyan Zhang
Bowen Zhou
Ziqiang Liu
Yuan Yao
OffRL
247
101
0
02 Dec 2024
Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent
Xingwu Sun
Yanfeng Chen
Yanwen Huang
Ruobing Xie
Jiaqi Zhu
...
Zhanhui Kang
Yong Yang
Yuhong Liu
Di Wang
Jie Jiang
MoE
ALM
ELM
392
68
0
04 Nov 2024
A Theoretical Understanding of Chain-of-Thought: Coherent Reasoning and Error-Aware Demonstration
International Conference on Artificial Intelligence and Statistics (AISTATS), 2024
Yingqian Cui
Pengfei He
Xianfeng Tang
Qi He
Chen Luo
Shucheng Zhou
Yue Xing
LRM
153
12
0
21 Oct 2024
Speculative Knowledge Distillation: Bridging the Teacher-Student Gap Through Interleaved Sampling
Wenyuan Xu
Rujun Han
Zhenting Wang
L. Le
Dhruv Madeka
Lei Li
Wenjie Wang
Rishabh Agarwal
Zifeng Wang
Tomas Pfister
554
24
0
15 Oct 2024
Thinking LLMs: General Instruction Following with Thought Generation
Tianhao Wu
Janice Lan
Weizhe Yuan
Jiantao Jiao
Jason Weston
Sainbayar Sukhbaatar
LRM
223
39
0
14 Oct 2024
Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning
International Conference on Learning Representations (ICLR), 2024
Amrith Rajagopal Setlur
Chirag Nagpal
Adam Fisch
Xinyang Geng
Jacob Eisenstein
Rishabh Agarwal
Alekh Agarwal
Jonathan Berant
Aviral Kumar
OffRL
LRM
298
151
0
10 Oct 2024
Pyramidal Flow Matching for Efficient Video Generative Modeling
International Conference on Learning Representations (ICLR), 2024
Yang Jin
Zhicheng Sun
Ningyuan Li
Kun Xu
K. Xu
...
Nan Zhuang
Quzhe Huang
Yang Song
Yadong Mu
Zhouchen Lin
VGen
381
179
0
08 Oct 2024
LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning
Di Zhang
Jianbo Wu
Jingdi Lei
Tong Che
Jiatong Li
...
Shufei Zhang
Marco Pavone
Yuqiang Li
Wanli Ouyang
Dongzhan Zhou
LRM
163
88
0
03 Oct 2024
Contextual Document Embeddings
International Conference on Learning Representations (ICLR), 2024
John X. Morris
Alexander M. Rush
396
15
0
03 Oct 2024
SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration
International Conference on Learning Representations (ICLR), 2024
Jintao Zhang
Jia Wei
Pengle Zhang
Jun-Jie Zhu
Jun Zhu
Jianfei Chen
VLM
MQ
586
75
0
03 Oct 2024
RLEF: Grounding Code LLMs in Execution Feedback with Reinforcement Learning
Jonas Gehring
Kunhao Zheng
Jade Copet
Vegard Mella
Taco Cohen
Gabriel Synnaeve
LLMAG
199
73
0
02 Oct 2024
Not All LLM Reasoners Are Created Equal
Arian Hosseini
Alessandro Sordoni
Daniel Toyama
Rameswar Panda
Rishabh Agarwal
LRM
247
23
0
02 Oct 2024
The Perfect Blend: Redefining RLHF with Mixture of Judges
Tengyu Xu
Eryk Helenowski
Karthik Abinav Sankararaman
Di Jin
Kaiyan Peng
...
Gabriel Cohen
Yuandong Tian
Hao Ma
Sinong Wang
Han Fang
305
24
0
30 Sep 2024
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models
Computer Vision and Pattern Recognition (CVPR), 2024
Matt Deitke
Christopher Clark
Sangho Lee
Rohun Tripathi
Yue Yang
...
Noah A. Smith
Hannaneh Hajishirzi
Ross Girshick
Ali Farhadi
Aniruddha Kembhavi
OSLM
VLM
394
58
0
25 Sep 2024
Direct Judgement Preference Optimization
Peifeng Wang
Austin Xu
Yilun Zhou
Caiming Xiong
Shafiq Joty
ELM
299
19
0
23 Sep 2024
Moshi: a speech-text foundation model for real-time dialogue
Alexandre Défossez
Laurent Mazaré
Manu Orsini
Amélie Royer
P. Pérez
Edouard Grave
Edouard Grave
Neil Zeghidour
AuLLM
391
327
0
17 Sep 2024
Critique-out-Loud Reward Models
Zachary Ankner
Mansheej Paul
Brandon Cui
Jonathan D. Chang
Prithviraj Ammanabrolu
ALM
LRM
247
67
0
21 Aug 2024
HMoE: Heterogeneous Mixture of Experts for Language Modeling
An Wang
Xingwu Sun
Ruobing Xie
Shuaipeng Li
Jiaqi Zhu
...
J. N. Han
Zhanhui Kang
Di Wang
Naoaki Okazaki
Cheng-zhong Xu
MoE
188
25
0
20 Aug 2024
Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents
Pranav Putta
Edmund Mills
Naman Garg
S. Motwani
Chelsea Finn
Divyansh Garg
Rafael Rafailov
LLMAG
LRM
217
134
0
13 Aug 2024
Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment
Transactions of the Association for Computational Linguistics (TACL), 2024
Karel DÓosterlinck
Winnie Xu
Chris Develder
Thomas Demeester
A. Singh
Christopher Potts
Douwe Kiela
Shikib Mehri
198
30
0
12 Aug 2024
Self-Taught Evaluators
Tianlu Wang
Ilia Kulikov
O. Yu. Golovneva
Ping Yu
Weizhe Yuan
Jane Dwivedi-Yu
Richard Yuanzhe Pang
Maryam Fazel-Zarandi
Jason Weston
Xian Li
ALM
LRM
173
45
0
05 Aug 2024
MiniCPM-V: A GPT-4V Level MLLM on Your Phone
Yuan Yao
Tianyu Yu
Ao Zhang
Chongyi Wang
Junbo Cui
...
Xu Han
Guoyang Zeng
Dahai Li
Zhiyuan Liu
Maosong Sun
VLM
MLLM
353
830
0
03 Aug 2024
Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge
Tianhao Wu
Weizhe Yuan
O. Yu. Golovneva
Jing Xu
Yuandong Tian
Jiantao Jiao
Jason Weston
Sainbayar Sukhbaatar
ALM
KELM
LRM
266
147
0
28 Jul 2024
A Comprehensive Survey of LLM Alignment Techniques: RLHF, RLAIF, PPO, DPO and More
Zhichao Wang
Bin Bi
Shiva K. Pentyala
Kiran Ramnath
Sougata Chaudhuri
...
Z. Zhu
Xiang-Bo Mao
S. Asur
Na
Na Cheng
OffRL
231
109
0
23 Jul 2024
Understanding Reference Policies in Direct Preference Optimization
Yixin Liu
Pengfei Liu
Arman Cohan
212
16
0
18 Jul 2024
Multi-Step Reasoning with Large Language Models, a Survey
ACM Computing Surveys (ACM CSUR), 2024
Aske Plaat
Annie Wong
Suzan Verberne
Joost Broekens
Niki van Stein
Thomas Bäck
OffRL
LRM
AI4CE
ReLM
179
95
0
16 Jul 2024
1
2
Next