ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

arXiv 2505.17225
Reasoning Model is Stubborn: Diagnosing Instruction Overriding in Reasoning Models

22 May 2025
Doohyuk Jang
Yoonjeon Kim
Chanjae Park
Hyun Ryu
Eunho Yang
    LRM

Papers citing "Reasoning Model is Stubborn: Diagnosing Instruction Overriding in Reasoning Models"

28 papers
RM-R1: Reward Modeling as Reasoning
Xiusi Chen
Gaotang Li
Zehua Wang
Bowen Jin
Cheng Qian
...
Yu Zhang
D. Zhang
Tong Zhang
Hanghang Tong
Heng Ji
ReLM, OffRL, LRM
389
21
0
05 May 2025
Process Reward Models That Think
Muhammad Khalifa
Rishabh Agarwal
Lajanugen Logeswaran
Jaekyeom Kim
Hao Peng
Moontae Lee
Honglak Lee
Lu Wang
OffRL, ALM, LRM
143
9
0
23 Apr 2025
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
Yang Yue
Zhiqi Chen
Rui Lu
Andrew Zhao
Zhaokai Wang
Yang Yue
Shiji Song
Gao Huang
ReLM, LRM
229
128
0
18 Apr 2025
Do Chains-of-Thoughts of Large Language Models Suffer from Hallucinations, Cognitive Biases, or Phobias in Bayesian Reasoning?
Roberto Araya
LRM
90
2
0
19 Mar 2025
Limitations of Large Language Models in Clinical Problem-Solving Arising from Inflexible Reasoning
Jonathan Kim
Anna Podlasek
Kie Shidara
Feng Liu
Ahmed Alaa
Danilo Bernardo
ELM, LRM, AI4MH
88
5
0
05 Feb 2025
Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search
Maohao Shen
Guangtao Zeng
Zhenting Qi
Zhang-Wei Hong
Zhenfang Chen
Wei Lu
G. Wornell
Subhro Das
David D. Cox
Chuang Gan
LRM, LLMAG
555
18
0
04 Feb 2025
Process Reinforcement through Implicit Rewards
Ganqu Cui
Lifan Yuan
Ziyi Wang
Hanbin Wang
Wendi Li
...
Yu Cheng
Zhiyuan Liu
Maosong Sun
Bowen Zhou
Ning Ding
OffRL, LRM
186
103
0
03 Feb 2025
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
...
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
ReLM, VLM, OffRL, AI4TS, LRM
384
2,022
0
22 Jan 2025
Kimi k1.5: Scaling Reinforcement Learning with LLMs
Kimi Team
Angang Du
Bofei Gao
Bowei Xing
Changjiu Jiang
...
Zihao Huang
Ziyao Xu
Zhiyong Yang
Zonghan Yang
Zongyu Lin
OffRL, ALM, AI4TS, VLM, LRM
346
338
0
22 Jan 2025
o1-Coder: an o1 Replication for Coding
Yuxiang Zhang
Shangxi Wu
Yuqi Yang
Jiangming Shu
Jinlin Xiao
Chao Kong
Jitao Sang
LRM
169
51
0
29 Nov 2024
Reasoning Beyond Bias: A Study on Counterfactual Prompting and Chain of Thought Reasoning
Kyle Moore
Jesse Roberts
Thao Pham
Douglas H. Fisher
LRM
51
4
0
16 Aug 2024
Easy Problems That LLMs Get Wrong
Sean Williams
James Huckle
LRM
158
14
0
30 May 2024
LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code
Naman Jain
King Han
Alex Gu
Wen-Ding Li
Fanjia Yan
Tianjun Zhang
Sida I. Wang
Armando Solar-Lezama
Koushik Sen
Ion Stoica
ELM
148
448
0
12 Mar 2024
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Zhihong Shao
Peiyi Wang
Qihao Zhu
Runxin Xu
Jun-Mei Song
...
Haowei Zhang
Mingchuan Zhang
Yiming Li
Yu-Huan Wu
Daya Guo
ReLM, LRM
186
1,288
0
05 Feb 2024
Llemma: An Open Language Model For Mathematics
Zhangir Azerbayev
Hailey Schoelkopf
Keiran Paster
Marco Dos Santos
Stephen Marcus McAleer
Albert Q. Jiang
Jia Deng
Stella Biderman
Sean Welleck
CLL
123
303
0
16 Oct 2023
Llama 2: Open Foundation and Fine-Tuned Chat Models
Hugo Touvron
Louis Martin
Kevin R. Stone
Peter Albert
Amjad Almahairi
...
Sharan Narang
Aurelien Rodriguez
Robert Stojnic
Sergey Edunov
Thomas Scialom
AI4MH, ALM
480
12,124
0
18 Jul 2023
Math Word Problem Solving by Generating Linguistic Variants of Problem Statements
Syed Rifat Raiyan
Md. Nafis Faiyaz
S. Kabir
Mohsinul Kabir
H. Mahmud
Md. Kamrul Hasan
76
14
0
24 Jun 2023
Large Language Models are Fixated by Red Herrings: Exploring Creative Problem Solving and Einstellung Effect using the Only Connect Wall Dataset
S. Naeini
Raeid Saqur
M. Saeidi
John Giorgi
Babak Taati
115
11
0
19 Jun 2023
MathPrompter: Mathematical Reasoning using Large Language Models
Shima Imani
Liang Du
H. Shrivastava
KELM, ReLM, LRM
104
214
0
04 Mar 2023
PaLM: Scaling Language Modeling with Pathways
Aakanksha Chowdhery
Sharan Narang
Jacob Devlin
Maarten Bosma
Gaurav Mishra
...
Kathy Meier-Hellstern
Douglas Eck
J. Dean
Slav Petrov
Noah Fiedel
PILM, LRM
557
6,315
0
05 Apr 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro, LRM, AI4CE, ReLM
940
9,784
0
28 Jan 2022
Training Verifiers to Solve Math Word Problems
K. Cobbe
V. Kosaraju
Mohammad Bavarian
Mark Chen
Heewoo Jun
...
Jerry Tworek
Jacob Hilton
Reiichiro Nakano
Christopher Hesse
John Schulman
ReLM, OffRL, LRM
408
4,606
0
27 Oct 2021
Measuring Mathematical Problem Solving With the MATH Dataset
Dan Hendrycks
Collin Burns
Saurav Kadavath
Akul Arora
Steven Basart
Eric Tang
Basel Alomair
Jacob Steinhardt
ReLM, FaML
224
2,413
0
05 Mar 2021
LogiQA: A Challenge Dataset for Machine Reading Comprehension with Logical Reasoning
Jian Liu
Leyang Cui
Hanmeng Liu
Dandan Huang
Yile Wang
Yue Zhang
RALM
132
382
0
16 Jul 2020
Language Models are Few-Shot Learners
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
...
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
BDL
1.0K
42,651
0
28 May 2020
ReClor: A Reading Comprehension Dataset Requiring Logical Reasoning
Weihao Yu
Zihang Jiang
Yanfei Dong
Jiashi Feng
LRM
160
255
0
11 Feb 2020
CLUTRR: A Diagnostic Benchmark for Inductive Reasoning from Text
Koustuv Sinha
Shagun Sodhani
Jin Dong
Joelle Pineau
William L. Hamilton
91
211
0
16 Aug 2019
Proximal Policy Optimization Algorithms
John Schulman
Filip Wolski
Prafulla Dhariwal
Alec Radford
Oleg Klimov
OffRL
655
19,343
0
20 Jul 2017