
SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning

International Conference on Learning Representations (ICLR), 2023

1 August 2023


Papers citing "SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning"

50 / 113 papers shown

Distribution-Calibrated Inference time compute for Thinking LLM-as-a-Judge

136

02 Dec 2025

Evaluation of retrieval-based QA on QUEST-LOFT

376

08 Nov 2025

Verifying Large Language Models' Reasoning Paths via Correlation Matrix Rank

28 Oct 2025

M-Eval: A Heterogeneity-Based Framework for Multi-evidence Validation in Medical RAG Systems

135

28 Oct 2025

PRISM-Bench: A Benchmark of Puzzle-Based Visual Tasks with CoT Error Detection

507

27 Oct 2025

Verification-Aware Planning for Multi-Agent Systems

109

20 Oct 2025

Enhancing Large Language Model Reasoning with Reward Models: An Analytical Survey

278

02 Oct 2025

Planning with Unified Multimodal Models

105

27 Sep 2025

Generalizability of Large Language Model-Based Agents: A Comprehensive Survey

188

19 Sep 2025

Formal Reasoning for Intelligent QA Systems: A Case Study in the Educational Domain

...

15 Sep 2025

Towards Automated Error Discovery: A Study in Conversational AI

Dominic Petrak

Thy Thy Tran

Iryna Gurevych

143

13 Sep 2025

Automatic Failure Attribution and Critical Step Prediction Method for Multi-Agent Systems Based on Causal Inference

159

10 Sep 2025

RAFFLES: Reasoning-based Attribution of Faults for LLM Systems

147

08 Sep 2025

Beyond ROUGE: N-Gram Subspace Features for LLM Hallucination Detection

Jerry Li

Evangelos Papalexakis

112

03 Sep 2025

PiCSAR: Probabilistic Confidence Selection And Ranking

212

29 Aug 2025

InfoFlood: Jailbreaking Large Language Models with Information Overload

206

13 Jun 2025

Your Agent Can Defend Itself against Backdoor Attacks

338

10 Jun 2025

Direct Behavior Optimization: Unlocking the Potential of Lightweight LLMs

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

185

06 Jun 2025

Interpretation Meets Safety: A Survey on Interpretation Methods and Tools for Improving LLM Safety

273

05 Jun 2025

Goal-Aware Identification and Rectification of Misinformation in Multi-Agent Systems

174

31 May 2025

Soft Reasoning: Navigating Solution Spaces in Large Language Models through Controlled Embedding Exploration

397

30 May 2025

What Makes a Good Reasoning Chain? Uncovering Structural Patterns in Long Chain-of-Thought Reasoning

199

28 May 2025

Do We Know What LLMs Don't Know? A Study of Consistency in Knowledge Probing

200

27 May 2025

TCP: a Benchmark for Temporal Constraint-Based Planning

268

26 May 2025

YESciEval: Robust LLM-as-a-Judge for Scientific Question Answering

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

459

20 May 2025

Missing vs. Unused Knowledge Hypothesis for Language Model Bottlenecks in Patent Understanding

433

18 May 2025

Retrospex: Language Agent Meets Offline Reinforcement Learning Critic

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025

517

17 May 2025

LogiDebrief: A Signal-Temporal Logic based Automated Debriefing Approach with Large Language Models Integration

International Joint Conference on Artificial Intelligence (IJCAI), 2024

217

06 May 2025

Safer Prompts: Reducing Risks from Memorization in Visual Generative AI

Lena Reissinger

Yuanyuan Li

Anna-Carolina Haensch

Neeraj Sarna

197

06 May 2025

Which Agent Causes Task Failures and When? On Automated Failure Attribution of LLM Multi-Agent Systems

...

661

30 Apr 2025

Comparing Uncertainty Measurement and Mitigation Methods for Large Language Models: A Systematic Review

627

25 Apr 2025

Perception in Reflection

...

334

09 Apr 2025

KSHSeek: Data-Driven Approaches to Mitigating and Detecting Knowledge-Shortcut Hallucinations in Generative Models

380

25 Mar 2025

J&H: Evaluating the Robustness of Large Language Models Under Knowledge-Injection Attacks in Legal Domain

AAAI Conference on Artificial Intelligence (AAAI), 2025

343

24 Mar 2025

A Survey on Mathematical Reasoning and Optimization with Large Language Models

Ali Forootani

OffRL LRM AI4CE

308

22 Mar 2025

FactSelfCheck: Fact-Level Black-Box Hallucination Detection for LLMs

443

21 Mar 2025

Temporal Consistency for LLM Reasoning Process Error Identification

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025

257

18 Mar 2025

CoT-UQ: Improving Response-wise Uncertainty Quantification in LLMs with Chain-of-Thought

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

Boxuan Zhang

Ruqi Zhang

LRM

317

24 Feb 2025

A Survey on Feedback-based Multi-step Reasoning for Large Language Models on Mathematics

742

21 Feb 2025

Reasoning-as-Logic-Units: Scaling Test-Time Reasoning in Large Language Models Through Logic Unit Alignment

1.1K

05 Feb 2025

Mathematical Language Models: A Survey

...

618

03 Jan 2025

Formal Mathematical Reasoning: A New Frontier in AI

402

20 Dec 2024

Progressive Multimodal Reasoning via Active Retrieval

311

19 Dec 2024

EscapeBench: Towards Advancing Creative Intelligence of Language Model Agents

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

...

520

18 Dec 2024

Improving Physics Reasoning in Large Language Models Using Mixture of Refinement Agents

Abhishek Dharmadhikari

Atharva Marathe

R. Shah

LRM AI4CE

289

01 Dec 2024

Teaching Models to Improve on Tape

AAAI Conference on Artificial Intelligence (AAAI), 2024

L. Bezalel

Eyal Orgad

Amir Globerson

285

03 Nov 2024

Plan-on-Graph: Self-Correcting Adaptive Planning of Large Language Model on Knowledge Graphs

Neural Information Processing Systems (NeurIPS), 2024

274

31 Oct 2024

Coarse-to-Fine Highlighting: Reducing Knowledge Hallucination in Large Language Models

International Conference on Machine Learning (ICML), 2024

Qitan Lv

Jie Wang

Hanzhu Chen

Bin Li

Yongdong Zhang

Feng Wu

HILM

342

19 Oct 2024

Nova: An Iterative Planning and Search Approach to Enhance Novelty and Diversity of LLM Generated Ideas

216

18 Oct 2024

Better to Ask in English: Evaluation of Large Language Models on English, Low-resource and Cross-Lingual Settings

233

17 Oct 2024