2308.00436
SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning
International Conference on Learning Representations (ICLR), 2023
1 August 2023
Ning Miao
Yee Whye Teh
Tom Rainforth
ReLM
LRM
HuggingFace (23 upvotes)
Papers citing
"SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning"
50 / 113 papers shown
Distribution-Calibrated Inference time compute for Thinking LLM-as-a-Judge
Hamid Dadkhahi
Firas Trabelsi
Parker Riley
Juraj Juraska
Mehdi Mirzazadeh
LRM
136
0
0
02 Dec 2025
Evaluation of retrieval-based QA on QUEST-LOFT
Nathan Scales
Nathanael Schärli
Olivier Bousquet
RALM
376
0
0
08 Nov 2025
Verifying Large Language Models' Reasoning Paths via Correlation Matrix Rank
Jiayu Liu
Wei Dai
Zhenya Huang
Ning Miao
Enhong Chen
LRM
91
0
0
28 Oct 2025
M-Eval: A Heterogeneity-Based Framework for Multi-evidence Validation in Medical RAG Systems
Mengzhou Sun
Sendong Zhao
Jianyu Chen
Haochun Wang
Bin Qin
135
0
0
28 Oct 2025
PRISM-Bench: A Benchmark of Puzzle-Based Visual Tasks with CoT Error Detection
Yusu Qian
Cheng Wan
Chao Jia
Yinfei Yang
Qingyu Zhao
Zhe Gan
LRM
ReLM
507
1
0
27 Oct 2025
Verification-Aware Planning for Multi-Agent Systems
Tianyang Xu
Dan Zhang
Kushan Mitra
Estevam R. Hruschka
LLMAG
109
0
0
20 Oct 2025
Enhancing Large Language Model Reasoning with Reward Models: An Analytical Survey
Qiyuan Liu
Hao Xu
Xuhong Chen
Wei Chen
Yee Whye Teh
Ning Miao
ReLM
LRM
AI4CE
278
0
0
02 Oct 2025
Planning with Unified Multimodal Models
Yihao Sun
Zhilong Zhang
Yang Yu
Pierre-Luc Bacon
LRM
105
0
0
27 Sep 2025
Generalizability of Large Language Model-Based Agents: A Comprehensive Survey
Minxing Zhang
Yi Yang
Roy Xie
Bhuwan Dhingra
Shuyan Zhou
Jian Pei
LLMAG
LM&Ro
AI4CE
188
3
0
19 Sep 2025
Formal Reasoning for Intelligent QA Systems: A Case Study in the Educational Domain
Tuan Bui
An X. Nguyen
Phat Thai
Minh Hua
Ngan Pham L.N.
...
Dung Le
Long Nguyen
T. Tran
Thang Bui
Tho Quan
LRM
88
1
0
15 Sep 2025
Towards Automated Error Discovery: A Study in Conversational AI
Dominic Petrak
Thy Thy Tran
Iryna Gurevych
143
0
0
13 Sep 2025
Automatic Failure Attribution and Critical Step Prediction Method for Multi-Agent Systems Based on Causal Inference
Guoqing Ma
Jia Zhu
Hanghui Guo
Weijie Shi
Jiawei Shen
Jingjiang Liu
Yidan Liang
159
1
0
10 Sep 2025
RAFFLES: Reasoning-based Attribution of Faults for LLM Systems
Chenyang Zhu
Spencer Hong
Jingyu Wu
Kushal Chawla
Charlotte Tang
Youbing Yin
Nathan Wolfe
Erin Babinsky
Daben Liu
147
0
0
08 Sep 2025
Beyond ROUGE: N-Gram Subspace Features for LLM Hallucination Detection
Jerry Li
Evangelos Papalexakis
112
0
0
03 Sep 2025
PiCSAR: Probabilistic Confidence Selection And Ranking
Joshua Ong Jun Leang
Zheng Zhao
Aryo Pradipta Gema
Sohee Yang
Wai-Chung Kwan
Xuanli He
Wenda Li
Pasquale Minervini
Eleonora Giunchiglia
Shay B. Cohen
ReLM
BDL
LRM
212
3
0
29 Aug 2025
InfoFlood: Jailbreaking Large Language Models with Information Overload
Advait Yadav
Haibo Jin
Man Luo
Jun Zhuang
Haohan Wang
AAML
206
3
0
13 Jun 2025
Your Agent Can Defend Itself against Backdoor Attacks
Li Changjiang
Liang Jiacheng
Cao Bochuan
Chen Jinghui
Wang Ting
AAML
LLMAG
338
5
0
10 Jun 2025
Direct Behavior Optimization: Unlocking the Potential of Lightweight LLMs
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Hongming Yang
Shi Lin
Jun Shao
Changting Lin
Donghai Zhu
Meng Han
Qinglei Kong
185
2
0
06 Jun 2025
Interpretation Meets Safety: A Survey on Interpretation Methods and Tools for Improving LLM Safety
Seongmin Lee
Aeree Cho
Grace C. Kim
ShengYun Peng
Mansi Phute
Duen Horng Chau
LM&MA
AI4CE
273
3
0
05 Jun 2025
Goal-Aware Identification and Rectification of Misinformation in Multi-Agent Systems
Zherui Li
Yan Mi
Zhenhong Zhou
Houcheng Jiang
Guibin Zhang
Kun Wang
Junfeng Fang
LLMAG
174
3
0
31 May 2025
Soft Reasoning: Navigating Solution Spaces in Large Language Models through Controlled Embedding Exploration
Qinglin Zhu
Runcong Zhao
Hanqi Yan
Yulan He
Yudong Chen
Lin Gui
LRM
397
0
0
30 May 2025
What Makes a Good Reasoning Chain? Uncovering Structural Patterns in Long Chain-of-Thought Reasoning
Gangwei Jiang
Yahui Liu
Zhaoyi Li
Qi Wang
Fuzheng Zhang
Linqi Song
Ying Wei
Defu Lian
LRM
199
7
0
28 May 2025
Do We Know What LLMs Don't Know? A Study of Consistency in Knowledge Probing
Raoyuan Zhao
Abdullatif Köksal
Ali Modarressi
Michael A. Hedderich
Hinrich Schütze
200
3
0
27 May 2025
TCP: a Benchmark for Temporal Constraint-Based Planning
Zifeng Ding
Sikuan Yan
Zhangdie Yuan
Xianglong Hu
Fangru Lin
Andreas Vlachos
268
3
0
26 May 2025
YESciEval: Robust LLM-as-a-Judge for Scientific Question Answering
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Jennifer D'Souza
Hamed Babaei Giglou
Quentin Münch
ELM
459
7
0
20 May 2025
Missing vs. Unused Knowledge Hypothesis for Language Model Bottlenecks in Patent Understanding
Siyang Wu
Honglin Bao
Nadav Kunievsky
James A. Evans
433
0
0
18 May 2025
Retrospex: Language Agent Meets Offline Reinforcement Learning Critic
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025
Yufei Xiang
Yiqun Shen
Yeqin Zhang
Cam-Tu Nguyen
OffRL
LLMAG
KELM
LRM
517
3
0
17 May 2025
LogiDebrief: A Signal-Temporal Logic based Automated Debriefing Approach with Large Language Models Integration
International Joint Conference on Artificial Intelligence (IJCAI), 2024
Zirong Chen
Ziyan An
Jennifer Reynolds
Kristin Mullen
Stephen Martini
Meiyi Ma
217
1
0
06 May 2025
Safer Prompts: Reducing Risks from Memorization in Visual Generative AI
Lena Reissinger
Yuanyuan Li
Anna-Carolina Haensch
Neeraj Sarna
197
1
0
06 May 2025
Which Agent Causes Task Failures and When? On Automated Failure Attribution of LLM Multi-Agent Systems
Shaokun Zhang
Ming Yin
Jieyu Zhang
Jing Liu
Zhiguang Han
...
Beibin Li
Chi Wang
Hongru Wang
Yuxiao Chen
Qingyun Wu
661
39
0
30 Apr 2025
Comparing Uncertainty Measurement and Mitigation Methods for Large Language Models: A Systematic Review
Toghrul Abbasli
Kentaroh Toyoda
Yuan Wang
Leon Witt
Muhammad Asif Ali
Yukai Miao
Dan Li
Qingsong Wei
UQCV
627
2
0
25 Apr 2025
Perception in Reflection
Yana Wei
Liang Zhao
Kangheng Lin
En Yu
Yuang Peng
...
Jianjian Sun
Haoran Wei
Zheng Ge
Xiangyu Zhang
Vishal M. Patel
334
7
0
09 Apr 2025
KSHSeek: Data-Driven Approaches to Mitigating and Detecting Knowledge-Shortcut Hallucinations in Generative Models
Zhenting Wang
Zhongxin Liu
Ying Li
Hongyu Sun
Meng Xu
Yuqing Zhang
He Wang
Gaofei Wu
Y. Zhang
HILM
380
1
0
25 Mar 2025
J&H: Evaluating the Robustness of Large Language Models Under Knowledge-Injection Attacks in Legal Domain
AAAI Conference on Artificial Intelligence (AAAI), 2025
Yiran Hu
Huanghai Liu
Qingjing Chen
Ning Zheng
C. Wang
Yun Liu
Charles L.A. Clarke
Weixing Shen
AAML
AILaw
ELM
343
5
0
24 Mar 2025
A Survey on Mathematical Reasoning and Optimization with Large Language Models
Ali Forootani
OffRL
LRM
AI4CE
308
12
0
22 Mar 2025
FactSelfCheck: Fact-Level Black-Box Hallucination Detection for LLMs
Albert Sawczyn
Jakub Binkowski
Denis Janiak
Bogdan Gabrys
Tomasz Kajdanowicz
HILM
LRM
443
4
0
21 Mar 2025
Temporal Consistency for LLM Reasoning Process Error Identification
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025
Jiacheng Guo
Yue Wu
Jiahao Qiu
Kaixuan Huang
Xinzhe Juan
L. Yang
Mengdi Wang
LRM
257
5
0
18 Mar 2025
CoT-UQ: Improving Response-wise Uncertainty Quantification in LLMs with Chain-of-Thought
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Boxuan Zhang
Ruqi Zhang
LRM
317
6
0
24 Feb 2025
A Survey on Feedback-based Multi-step Reasoning for Large Language Models on Mathematics
Ting-Ruen Wei
Haowei Liu
Xuyang Wu
Yi Fang
LRM
AI4CE
ReLM
KELM
742
8
0
21 Feb 2025
Reasoning-as-Logic-Units: Scaling Test-Time Reasoning in Large Language Models Through Logic Unit Alignment
Cheryl Li
Tianyuan Xu
Yiwen Guo
LRM
1.1K
10
0
05 Feb 2025
Mathematical Language Models: A Survey
Wen Liu
Hanglei Hu
Jie Zhou
Yuyang Ding
Junsong Li
...
Mengliang He
Qin Chen
Bo Jiang
Aimin Zhou
Liang He
LRM
618
21
0
03 Jan 2025
Formal Mathematical Reasoning: A New Frontier in AI
Kaiyu Yang
Gabriel Poesia
Jingxuan He
Wenda Li
Kristin Lauter
Swarat Chaudhuri
Dawn Song
LRM
AI4CE
402
68
0
20 Dec 2024
Progressive Multimodal Reasoning via Active Retrieval
Guanting Dong
Chenghao Zhang
Mengjie Deng
Yinlin Zhu
Zhicheng Dou
Ji-Rong Wen
LRM
311
28
0
19 Dec 2024
EscapeBench: Towards Advancing Creative Intelligence of Language Model Agents
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Cheng Qian
Peixuan Han
Qinyu Luo
Bingxiang He
Xiusi Chen
...
Jiarui Yao
Xiaocheng Yang
Denghui Zhang
Yunzhu Li
Heng Ji
LLMAG
LRM
520
3
0
18 Dec 2024
Improving Physics Reasoning in Large Language Models Using Mixture of Refinement Agents
Raj Jaiswal
Dhruv Jain
Harsh Parimal Popat
Avinash Anand
Abhishek Dharmadhikari
Atharva Marathe
R. Shah
LRM
AI4CE
289
11
0
01 Dec 2024
Teaching Models to Improve on Tape
AAAI Conference on Artificial Intelligence (AAAI), 2024
L. Bezalel
Eyal Orgad
Amir Globerson
285
0
0
03 Nov 2024
Plan-on-Graph: Self-Correcting Adaptive Planning of Large Language Model on Knowledge Graphs
Neural Information Processing Systems (NeurIPS), 2024
L. Chen
Panrong Tong
Zhongming Jin
Ying Sun
Jieping Ye
Hui Xiong
KELM
RALM
LRM
274
74
0
31 Oct 2024
Coarse-to-Fine Highlighting: Reducing Knowledge Hallucination in Large Language Models
International Conference on Machine Learning (ICML), 2024
Qitan Lv
Jie Wang
Hanzhu Chen
Bin Li
Yongdong Zhang
Feng Wu
HILM
342
11
0
19 Oct 2024
Nova: An Iterative Planning and Search Approach to Enhance Novelty and Diversity of LLM Generated Ideas
Xiang Hu
Hongyu Fu
Jinge Wang
Yifeng Wang
Zhikun Li
Renjun Xu
Yu Lu
Yaochu Jin
Lili Pan
Zhenzhong Lan
LRM
216
37
0
18 Oct 2024
Better to Ask in English: Evaluation of Large Language Models on English, Low-resource and Cross-Lingual Settings
Krishno Dey
Prerona Tarannum
Md. Arid Hasan
Imran Razzak
Usman Naseem
233
13
0
17 Oct 2024