Self-critiquing models for assisting human evaluators

12 June 2022
William Saunders
Catherine Yeh
Jeff Wu
Steven Bills
Long Ouyang
Jonathan Ward
Jan Leike
ALM ELM

Papers citing "Self-critiquing models for assisting human evaluators"

50 / 260 papers shown
"Are We Done Yet?": A Vision-Based Judge for Autonomous Task Completion of Computer Use Agents
Marta Sumyk
Oleksandr Kosovan
ELM
118
0
0
25 Nov 2025
Feedback Descent: Open-Ended Text Optimization via Pairwise Comparison
Yoonho Lee
Joseph Boen
Chelsea Finn
147
1
0
11 Nov 2025
Human-AI Complementarity: A Goal for Amplified Oversight
Rishub Jain
Sophie Bridgers
Lili Janzer
Rory Greig
Tian Huey Teh
Vladimir Mikulik
125
2
0
30 Oct 2025
Critique-RL: Training Language Models for Critiquing through Two-Stage Reinforcement Learning
Zhiheng Xi
Jixuan Huang
Xin Guo
Boyang Hong
Dingwen Yang
...
Jiecao Chen
Rui Zheng
Tao Gui
Qi Zhang
Xuanjing Huang
OffRL LRM
162
0
0
28 Oct 2025
Agentic Meta-Orchestrator for Multi-task Copilots
Xiaofeng Zhu
Yunshen Zhou
LLMAG
253
0
0
26 Oct 2025
Weak-to-Strong Generalization under Distribution Shifts
Myeongho Jeon
Jan Sobotka
Suhwan Choi
Maria Brbić
OOD
184
0
0
24 Oct 2025
Towards Scalable Oversight with Collaborative Multi-Agent Debate in Error Detection
Yongqiang Chen
Gang Niu
James Cheng
Bo Han
Masashi Sugiyama
84
0
0
23 Oct 2025
Budget-aware Test-time Scaling via Discriminative Verification
Kyle Montgomery
Sijun Tan
Yuqi Chen
Siyuan Zhuang
Tianjun Zhang
Raluca A. Popa
Chenguang Wang
121
0
0
16 Oct 2025
AI Alignment Strategies from a Risk Perspective: Independent Safety Mechanisms or Shared Failures?
Leonard Dung
Florian Mai
116
0
0
13 Oct 2025
Follow My Lead: Logical Fallacy Classification with Knowledge-Augmented LLMs
Olivia Peiyu Wang
Tashvi Bansal
Ryan Bai
Emily M. Chui
Leilani H. Gilpin
LRM
86
0
0
11 Oct 2025
Structural Reward Model: Enhancing Interpretability, Efficiency, and Scalability in Reward Modeling
Xiaoyu Liu
Di Liang
Hongyu Shan
Peiyang Liu
Yonghao Liu
...
Yuntao Li
Xianjie Wu
LI Miao
Jiangrong Shen
Minlong Peng
LRM
125
2
0
29 Sep 2025
Multimodal Prompt Decoupling Attack on the Safety Filters in Text-to-Image Models
Xingkai Peng
Jun Jiang
Meng Tong
Shuai Li
Weiming Zhang
Nenghai Yu
Kejiang Chen
112
0
0
21 Sep 2025
Sycophancy Mitigation Through Reinforcement Learning with Uncertainty-Aware Adaptive Reasoning Trajectories
Mohammad Beigi
Ying Shen
Parshin Shojaee
Qifan Wang
Zichao Wang
Chandan K. Reddy
Ming Jin
Lifu Huang
LRM
82
0
0
20 Sep 2025
Unsupervised Hallucination Detection by Inspecting Reasoning Processes
Ponhvoan Srey
Xiaobao Wu
Anh Tuan Luu
HILM
96
0
0
12 Sep 2025
PromptGuard: An Orchestrated Prompting Framework for Principled Synthetic Text Generation for Vulnerable Populations using LLMs with Enhanced Safety, Fairness, and Controllability
T. Vu
Lam Nguyen
Quynh Dao
104
0
0
10 Sep 2025
Reinforcement Learning with Rubric Anchors
Zenan Huang
Yihong Zhuang
Guoshan Lu
Zeyu Qin
Haokai Xu
...
Yanmei Gu
Y Samuel Wang
Zhengkai Yang
Jianguo Li
Junbo Zhao
ALM
102
18
0
18 Aug 2025
Hell or High Water: Evaluating Agentic Recovery from External Failures
Andrew Wang
Sophia Hager
Adi Asija
Daniel Khashabi
Nicholas Andrews
LLMAG AIFin
125
0
0
14 Aug 2025
Towards Theoretical Understanding of Transformer Test-Time Computing: Investigation on In-Context Linear Regression
Xingwu Chen
Miao Lu
Beining Wu
Difan Zou
125
0
0
11 Aug 2025
Causal Reward Adjustment: Mitigating Reward Hacking in External Reasoning via Backdoor Correction
Ruike Song
Zeen Song
Huijie Guo
Wenwen Qiang
LRM
88
0
0
06 Aug 2025
A Rolling Stone Gathers No Moss: Adaptive Policy Optimization for Stable Self-Evaluation in Large Multimodal Models
Wenkai Wang
Hongcan Guo
Zheqi Lv
Shengyu Zhang
88
0
0
05 Aug 2025
NPO: Learning Alignment and Meta-Alignment through Structured Human Feedback
Madhava Gaikwad
Ashwini Ramchandra Doke
185
0
0
22 Jul 2025
Language Models can perform Single-Utterance Self-Correction of Perturbed Reasoning
Sam Silver
Jimin Sun
Ivan Zhang
Sara Hooker
Eddie Kim
KELM ReLM LRM
158
0
0
18 Jun 2025
PAG: Multi-Turn Reinforced LLM Self-Correction with Policy as Generative Verifier
Y. Jiang
Yuwen Xiong
Yufeng Yuan
Chao Xin
Wenyuan Xu
Yu Yue
Qianchuan Zhao
Lin Yan
LRM
253
9
0
12 Jun 2025
AnimateAnyMesh: A Feed-Forward 4D Foundation Model for Text-Driven Universal Mesh Animation
Zijie Wu
Chaohui Yu
Fan Wang
Xiang Bai
AI4CE
261
4
0
11 Jun 2025
Boosting LLM Reasoning via Spontaneous Self-Correction
Xutong Zhao
Tengyu Xu
Xuewei Wang
Zhengxing Chen
Di Jin
...
Yun He
Sinong Wang
Han Fang
Sarath Chandar
Chen Zhu
ReLM LRM KELM
168
3
0
07 Jun 2025
ProRefine: Inference-Time Prompt Refinement with Textual Feedback
Deepak Pandita
Tharindu Cyril Weerasooriya
A. Shah
Christopher Homan
Wei Wei
LLMAG ReLM LRM
470
2
0
05 Jun 2025
APT: Improving Specialist LLM Performance with Weakness Case Acquisition and Iterative Preference Training
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Jun Rao
Zepeng Lin
Xuebo Liu
Xiaopeng Ke
Lian Lian
Dong Jin
Shengjun Cheng
Jun Yu
Min Zhang
205
6
0
04 Jun 2025
Token-level Accept or Reject: A Micro Alignment Approach for Large Language Models
International Joint Conference on Artificial Intelligence (IJCAI), 2025
Y. Zhang
Yu Yu
Bo Tang
Yu Zhu
Chuxiong Sun
...
Jie Hu
Zipeng Xie
Zhiyu Li
Feiyu Xiong
Edward Chung
426
0
0
26 May 2025
Generalizable Process Reward Models via Formally Verified Training Data
Ryo Kamoi
Yusen Zhang
Nan Zhang
Sarkar Snigdha Sarathi Das
Rui Zhang
OffRL LRM
252
2
0
21 May 2025
Solve-Detect-Verify: Inference-Time Scaling with Flexible Generative Verifier
Jianyuan Zhong
Zhiyu Li
Zhijian Xu
Xiangyu Wen
Kezhi Li
Jianyuan Zhong
LRM
153
1
0
17 May 2025
Critique-Guided Distillation for Efficient and Robust Language Model Reasoning
Berkcan Kapusuzoglu
Supriyo Chakraborty
Chia-Hsuan Lee
Sambit Sahu
406
0
0
16 May 2025
Sailing by the Stars: A Survey on Reward Models and Learning Strategies for Learning from Rewards
Xiaobao Wu
LRM
514
5
0
05 May 2025
DeepCritic: Deliberate Critique with Large Language Models
Wenkai Yang
Jingwen Chen
Yankai Lin
Ji-Rong Wen
ALM LRM
247
8
0
01 May 2025
Toward Generalizable Evaluation in the LLM Era: A Survey Beyond Benchmarks
Yixin Cao
Shibo Hong
Xuzhao Li
Jiahao Ying
Yubo Ma
...
Juanzi Li
Aixin Sun
Qi Zhang
Tat-Seng Chua
Tianwei Zhang
ALM ELM
484
21
0
26 Apr 2025
Scaling Laws For Scalable Oversight
Joshua Engels
David D. Baek
Subhash Kantamneni
Max Tegmark
ELM
474
3
0
25 Apr 2025
Evaluating Judges as Evaluators: The JETTS Benchmark of LLM-as-Judges as Test-Time Scaling Evaluators
Yilun Zhou
Austin Xu
Peifeng Wang
Caiming Xiong
Shafiq Joty
ELM ALM LRM
412
20
0
21 Apr 2025
CAReDiO: Cultural Alignment of LLM via Representativeness and Distinctiveness Guided Data Optimization
Jing Yao
Xiaoyuan Yi
Jindong Wang
Zhicheng Dou
Xing Xie
160
4
0
09 Apr 2025
How to evaluate control measures for LLM agents? A trajectory from today to superintelligence
Tomek Korbak
Mikita Balesni
Buck Shlegeris
Geoffrey Irving
ELM
195
6
0
07 Apr 2025
Do LLM Evaluators Prefer Themselves for a Reason?
Wei-Lin Chen
Zhepei Wei
Xinyu Zhu
Shi Feng
Yu Meng
ELM LRM
253
20
0
04 Apr 2025
Sample, Don't Search: Rethinking Test-Time Alignment for Language Models
Gonçalo Faria
Noah A. Smith
304
12
0
04 Apr 2025
Less-to-More Generalization: Unlocking More Controllability by In-Context Generation
Shaojin Wu
Mengqi Huang
Wenxu Wu
Yufeng Cheng
Fei Ding
Qian He
DiffM
309
76
0
02 Apr 2025
FindTheFlaws: Annotated Errors for Detecting Flawed Reasoning and Scalable Oversight Research
Gabriel Recchia
Chatrik Singh Mangat
Issac Li
Gayatri Krishnakumar
ALM
282
0
0
29 Mar 2025
Firm or Fickle? Evaluating Large Language Models Consistency in Sequential Interactions
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Yubo Li
Yidi Miao
Xueying Ding
Ramayya Krishnan
R. Padman
473
7
0
28 Mar 2025
R²: A LLM Based Novel-to-Screenplay Generation Framework with Causal Plot Graphs
Zefeng Lin
Yi Xiao
Zhiqiang Mo
Qifan Zhang
Jinqiao Wang
...
Jiajing Zhang
Huatian Zhang
Zhengyi Liu
Xianyong Fang
Xiaohua Xu
164
2
0
19 Mar 2025
Fine-Tuning Diffusion Generative Models via Rich Preference Optimization
Hanyang Zhao
Haoxian Chen
Yucheng Guo
Genta Indra Winata
Tingting Ou
Ziyu Huang
D. Yao
Wenpin Tang
478
3
0
13 Mar 2025
Research on Superalignment Should Advance Now with Parallel Optimization of Competence and Conformity
HyunJin Kim
Xiaoyuan Yi
Jing Yao
Muhua Huang
Jinyeong Bak
James Evans
Xing Xie
275
0
0
08 Mar 2025
Multi-Agent Verification: Scaling Test-Time Compute with Multiple Verifiers
Shalev Lifshitz
Sheila A. McIlraith
Yilun Du
LRM
346
24
0
27 Feb 2025
CritiQ: Mining Data Quality Criteria from Human Preferences
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Honglin Guo
Kai Lv
Qipeng Guo
Tianyi Liang
Zhiheng Xi
...
Qiuyinzhe Zhang
Yizhou Sun
Kai Chen
Xipeng Qiu
Tao Gui
317
2
0
26 Feb 2025
Culture-TRIP: Culturally-Aware Text-to-Image Generation with Iterative Prompt Refinement
North American Chapter of the Association for Computational Linguistics (NAACL), 2025
Suchae Jeong
Inseong Choi
Youngsik Yun
Jihie Kim
DiffM
359
4
0
24 Feb 2025
CodeCriticBench: A Holistic Code Critique Benchmark for Large Language Models
Alexander Zhang
Marcus Dong
Jing Liu
Wei Zhang
Yejie Wang
...
Yancheng He
K. Deng
Wangchunshu Zhou
Wenhao Huang
Zhenru Zhang
LRM
277
11
0
23 Feb 2025