SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning

International Conference on Learning Representations (ICLR), 2023
1 August 2023
Ning Miao
Yee Whye Teh
Tom Rainforth
ReLM, LRM

Papers citing "SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning"

50 / 113 papers shown
Distribution-Calibrated Inference time compute for Thinking LLM-as-a-Judge
Hamid Dadkhahi
Firas Trabelsi
Parker Riley
Juraj Juraska
Mehdi Mirzazadeh
LRM
136
0
0
02 Dec 2025
Evaluation of retrieval-based QA on QUEST-LOFT
Nathan Scales
Nathanael Scharli
Olivier Bousquet
RALM
376
0
0
08 Nov 2025
Verifying Large Language Models' Reasoning Paths via Correlation Matrix Rank
Jiayu Liu
Wei Dai
Zhenya Huang
Ning Miao
Enhong Chen
LRM
91
0
0
28 Oct 2025
M-Eval: A Heterogeneity-Based Framework for Multi-evidence Validation in Medical RAG Systems
Mengzhou Sun
Sendong Zhao
Jianyu Chen
Haochun Wang
Bin Qin
135
0
0
28 Oct 2025
PRISM-Bench: A Benchmark of Puzzle-Based Visual Tasks with CoT Error Detection
Yusu Qian
Cheng Wan
Chao Jia
Yinfei Yang
Qingyu Zhao
Zhe Gan
LRM, ReLM
507
1
0
27 Oct 2025
Verification-Aware Planning for Multi-Agent Systems
Tianyang Xu
Dan Zhang
Kushan Mitra
Estevam R. Hruschka
LLMAG
109
0
0
20 Oct 2025
Enhancing Large Language Model Reasoning with Reward Models: An Analytical Survey
Qiyuan Liu
Hao Xu
Xuhong Chen
Wei Chen
Yee Whye Teh
Ning Miao
ReLM, LRM, AI4CE
278
0
0
02 Oct 2025
Planning with Unified Multimodal Models
Yihao Sun
Zhilong Zhang
Yang Yu
Pierre-Luc Bacon
LRM
105
0
0
27 Sep 2025
Generalizability of Large Language Model-Based Agents: A Comprehensive Survey
Minxing Zhang
Yi Yang
Roy Xie
Bhuwan Dhingra
Shuyan Zhou
Jian Pei
LLMAG, LM&Ro, AI4CE
188
3
0
19 Sep 2025
Formal Reasoning for Intelligent QA Systems: A Case Study in the Educational Domain
Tuan Bui
An X. Nguyen
Phat Thai
Minh Hua
Ngan Pham L.N.
...
Dung Le
Long Nguyen
T. Tran
Thang Bui
Tho Quan
LRM
88
1
0
15 Sep 2025
Towards Automated Error Discovery: A Study in Conversational AI
Dominic Petrak
Thy Thy Tran
Iryna Gurevych
143
0
0
13 Sep 2025
Automatic Failure Attribution and Critical Step Prediction Method for Multi-Agent Systems Based on Causal Inference
Guoqing Ma
Jia Zhu
Hanghui Guo
Weijie Shi
Jiawei Shen
Jingjiang Liu
Yidan Liang
159
1
0
10 Sep 2025
RAFFLES: Reasoning-based Attribution of Faults for LLM Systems
Chenyang Zhu
Spencer Hong
Jingyu Wu
Kushal Chawla
Charlotte Tang
Youbing Yin
Nathan Wolfe
Erin Babinsky
Daben Liu
147
0
0
08 Sep 2025
Beyond ROUGE: N-Gram Subspace Features for LLM Hallucination Detection
Jerry Li
Evangelos Papalexakis
112
0
0
03 Sep 2025
PiCSAR: Probabilistic Confidence Selection And Ranking
Joshua Ong Jun Leang
Zheng Zhao
Aryo Pradipta Gema
Sohee Yang
Wai-Chung Kwan
Xuanli He
Wenda Li
Pasquale Minervini
Eleonora Giunchiglia
Shay B. Cohen
ReLM, BDL, LRM
212
3
0
29 Aug 2025
InfoFlood: Jailbreaking Large Language Models with Information Overload
Advait Yadav
Haibo Jin
Man Luo
Jun Zhuang
Haohan Wang
AAML
206
3
0
13 Jun 2025
Your Agent Can Defend Itself against Backdoor Attacks
Li Changjiang
Liang Jiacheng
Cao Bochuan
Chen Jinghui
Wang Ting
AAML, LLMAG
338
5
0
10 Jun 2025
Direct Behavior Optimization: Unlocking the Potential of Lightweight LLMs
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Hongming Yang
Shi Lin
Jun Shao
Changting Lin
Donghai Zhu
Meng Han
Qinglei Kong
185
2
0
06 Jun 2025
Interpretation Meets Safety: A Survey on Interpretation Methods and Tools for Improving LLM Safety
Seongmin Lee
Aeree Cho
Grace C. Kim
ShengYun Peng
Mansi Phute
Duen Horng Chau
LM&MA, AI4CE
273
3
0
05 Jun 2025
Goal-Aware Identification and Rectification of Misinformation in Multi-Agent Systems
Zherui Li
Yan Mi
Zhenhong Zhou
Houcheng Jiang
Guibin Zhang
Kun Wang
Junfeng Fang
LLMAG
174
3
0
31 May 2025
Soft Reasoning: Navigating Solution Spaces in Large Language Models through Controlled Embedding Exploration
Qinglin Zhu
Runcong Zhao
Hanqi Yan
Yulan He
Yudong Chen
Lin Gui
LRM
397
0
0
30 May 2025
What Makes a Good Reasoning Chain? Uncovering Structural Patterns in Long Chain-of-Thought Reasoning
Gangwei Jiang
Yahui Liu
Zhaoyi Li
Qi Wang
Fuzheng Zhang
Linqi Song
Ying Wei
Defu Lian
LRM
199
7
0
28 May 2025
Do We Know What LLMs Don't Know? A Study of Consistency in Knowledge Probing
Raoyuan Zhao
Abdullatif Köksal
Ali Modarressi
Michael A. Hedderich
Hinrich Schütze
200
3
0
27 May 2025
TCP: a Benchmark for Temporal Constraint-Based Planning
Zifeng Ding
Sikuan Yan
Zhangdie Yuan
Xianglong Hu
Fangru Lin
Andreas Vlachos
268
3
0
26 May 2025
YESciEval: Robust LLM-as-a-Judge for Scientific Question Answering
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Jennifer D'Souza
Hamed Babaei Giglou
Quentin Münch
ELM
459
7
0
20 May 2025
Missing vs. Unused Knowledge Hypothesis for Language Model Bottlenecks in Patent Understanding
Siyang Wu
Honglin Bao
Nadav Kunievsky
James A. Evans
433
0
0
18 May 2025
Retrospex: Language Agent Meets Offline Reinforcement Learning Critic
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025
Yufei Xiang
Yiqun Shen
Yeqin Zhang
Cam-Tu Nguyen
OffRL, LLMAG, KELM, LRM
517
3
0
17 May 2025
LogiDebrief: A Signal-Temporal Logic based Automated Debriefing Approach with Large Language Models Integration
International Joint Conference on Artificial Intelligence (IJCAI), 2024
Zirong Chen
Ziyan An
Jennifer Reynolds
Kristin Mullen
Stephen Martini
Meiyi Ma
217
1
0
06 May 2025
Safer Prompts: Reducing Risks from Memorization in Visual Generative AI
Lena Reissinger
Yuanyuan Li
Anna-Carolina Haensch
Neeraj Sarna
197
1
0
06 May 2025
Which Agent Causes Task Failures and When? On Automated Failure Attribution of LLM Multi-Agent Systems
Shaokun Zhang
Ming Yin
Jieyu Zhang
Jing Liu
Zhiguang Han
...
Beibin Li
Chi Wang
Hongru Wang
Yuxiao Chen
Qingyun Wu
661
39
0
30 Apr 2025
Comparing Uncertainty Measurement and Mitigation Methods for Large Language Models: A Systematic Review
Toghrul Abbasli
Kentaroh Toyoda
Yuan Wang
Leon Witt
Muhammad Asif Ali
Yukai Miao
Dan Li
Qingsong Wei
UQCV
627
2
0
25 Apr 2025
Perception in Reflection
Yana Wei
Liang Zhao
Kangheng Lin
En Yu
Yuang Peng
...
Jianjian Sun
Haoran Wei
Zheng Ge
Xiangyu Zhang
Vishal M. Patel
334
7
0
09 Apr 2025
KSHSeek: Data-Driven Approaches to Mitigating and Detecting Knowledge-Shortcut Hallucinations in Generative Models
Zhenting Wang
Zhongxin Liu
Ying Li
Hongyu Sun
Meng Xu
Yuqing Zhang
He Wang
Gaofei Wu
Y. Zhang
HILM
380
1
0
25 Mar 2025
J&H: Evaluating the Robustness of Large Language Models Under Knowledge-Injection Attacks in Legal Domain
AAAI Conference on Artificial Intelligence (AAAI), 2025
Yiran Hu
Huanghai Liu
Qingjing Chen
Ning Zheng
C. Wang
Yun Liu
Charles L.A. Clarke
Weixing Shen
AAML, AILaw, ELM
343
5
0
24 Mar 2025
A Survey on Mathematical Reasoning and Optimization with Large Language Models
Ali Forootani
OffRL, LRM, AI4CE
308
12
0
22 Mar 2025
FactSelfCheck: Fact-Level Black-Box Hallucination Detection for LLMs
Albert Sawczyn
Jakub Binkowski
Denis Janiak
Bogdan Gabrys
Tomasz Kajdanowicz
HILM, LRM
443
4
0
21 Mar 2025
Temporal Consistency for LLM Reasoning Process Error Identification
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025
Jiacheng Guo
Yue Wu
Jiahao Qiu
Kaixuan Huang
Xinzhe Juan
L. Yang
Mengdi Wang
LRM
257
5
0
18 Mar 2025
CoT-UQ: Improving Response-wise Uncertainty Quantification in LLMs with Chain-of-Thought
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Boxuan Zhang
Ruqi Zhang
LRM
317
6
0
24 Feb 2025
A Survey on Feedback-based Multi-step Reasoning for Large Language Models on Mathematics
Ting-Ruen Wei
Haowei Liu
Xuyang Wu
Yi Fang
LRM, AI4CE, ReLM, KELM
742
8
0
21 Feb 2025
Reasoning-as-Logic-Units: Scaling Test-Time Reasoning in Large Language Models Through Logic Unit Alignment
Cheryl Li
Tianyuan Xu
Yiwen Guo
LRM
1.1K
10
0
05 Feb 2025
Mathematical Language Models: A Survey
Wen Liu
Hanglei Hu
Jie Zhou
Yuyang Ding
Junsong Li
...
Mengliang He
Qin Chen
Bo Jiang
Aimin Zhou
Liang He
LRM
618
21
0
03 Jan 2025
Formal Mathematical Reasoning: A New Frontier in AI
Kaiyu Yang
Gabriel Poesia
Jingxuan He
Wenda Li
Kristin Lauter
Swarat Chaudhuri
Dawn Song
LRM, AI4CE
402
68
0
20 Dec 2024
Progressive Multimodal Reasoning via Active Retrieval
Guanting Dong
Chenghao Zhang
Mengjie Deng
Yinlin Zhu
Zhicheng Dou
Ji-Rong Wen
LRM
311
28
0
19 Dec 2024
EscapeBench: Towards Advancing Creative Intelligence of Language Model Agents
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Cheng Qian
Peixuan Han
Qinyu Luo
Bingxiang He
Xiusi Chen
...
Jiarui Yao
Xiaocheng Yang
Denghui Zhang
Yunzhu Li
Heng Ji
LLMAG, LRM
520
3
0
18 Dec 2024
Improving Physics Reasoning in Large Language Models Using Mixture of Refinement Agents
Raj Jaiswal
Dhruv Jain
Harsh Parimal Popat
Avinash Anand
Abhishek Dharmadhikari
Atharva Marathe
R. Shah
LRMAI4CE
289
11
0
01 Dec 2024
Teaching Models to Improve on Tape
AAAI Conference on Artificial Intelligence (AAAI), 2024
L. Bezalel
Eyal Orgad
Amir Globerson
285
0
0
03 Nov 2024
Plan-on-Graph: Self-Correcting Adaptive Planning of Large Language Model on Knowledge Graphs
Neural Information Processing Systems (NeurIPS), 2024
L. Chen
Panrong Tong
Zhongming Jin
Ying Sun
Jieping Ye
Hui Xiong
KELM, RALM, LRM
274
74
0
31 Oct 2024
Coarse-to-Fine Highlighting: Reducing Knowledge Hallucination in Large Language Models
International Conference on Machine Learning (ICML), 2024
Qitan Lv
Jie Wang
Hanzhu Chen
Bin Li
Yongdong Zhang
Feng Wu
HILM
342
11
0
19 Oct 2024
Nova: An Iterative Planning and Search Approach to Enhance Novelty and Diversity of LLM Generated Ideas
Xiang Hu
Hongyu Fu
Jinge Wang
Yifeng Wang
Zhikun Li
Renjun Xu
Yu Lu
Yaochu Jin
Lili Pan
Zhenzhong Lan
LRM
216
37
0
18 Oct 2024
Better to Ask in English: Evaluation of Large Language Models on English, Low-resource and Cross-Lingual Settings
Krishno Dey
Prerona Tarannum
Md. Arid Hasan
Imran Razzak
Usman Naseem
233
13
0
17 Oct 2024