LM vs LM: Detecting Factual Errors via Cross Examination


22 May 2023
Roi Cohen
May Hamri
Mor Geva
Amir Globerson
    HILM

Papers citing "LM vs LM: Detecting Factual Errors via Cross Examination"

50 / 100 papers shown
Sailing AI by the Stars: A Survey of Learning from Rewards in Post-Training and Test-Time Scaling of Large Language Models
Xiaobao Wu
LRM
60
0
0
05 May 2025
Comparing Uncertainty Measurement and Mitigation Methods for Large Language Models: A Systematic Review
Toghrul Abbasli
Kentaroh Toyoda
Yuan Wang
Leon Witt
Muhammad Asif Ali
Yukai Miao
Dan Li
Qingsong Wei
UQCV
79
0
0
25 Apr 2025
A Library of LLM Intrinsics for Retrieval-Augmented Generation
Marina Danilevsky
Kristjan Greenewald
Chulaka Gunasekara
Maeda Hanafi
Lihong He
...
Frederick Reiss
Vraj Shah
Khoi-Nguyen Tran
Huaiyu Zhu
Luis A. Lastras
26
1
0
16 Apr 2025
HalluciNot: Hallucination Detection Through Context and Common Knowledge Verification
Bibek Paudel
Alexander Lyzhov
Preetam Joshi
Puneet Anand
HILM
43
0
0
09 Apr 2025
Hogwild! Inference: Parallel LLM Generation via Concurrent Attention
Gleb Rodionov
Roman Garipov
Alina Shutova
George Yakushev
Vage Egiazarian
Anton Sinitsin
Denis Kuznedelev
Dan Alistarh
LRM
27
1
0
08 Apr 2025
Brains vs. Bytes: Evaluating LLM Proficiency in Olympiad Mathematics
Hamed Mahdavi
Alireza Hashemi
Majid Daliri
Pegah Mohammadipour
Alireza Farhadi
Samira Malek
Yekta Yazdanifard
Amir Khasahmadi
V. Honavar
ELM
LRM
43
1
0
01 Apr 2025
Agents Under Siege: Breaking Pragmatic Multi-Agent LLM Systems with Optimized Prompt Attacks
Rana Muhammad Shahroz Khan
Zhen Tan
Sukwon Yun
Charles Flemming
Tianlong Chen
AAML
LLMAG
Presented at ResearchTrend Connect | LLMAG on 23 Apr 2025
89
2
0
31 Mar 2025
Thinking Machines: A Survey of LLM based Reasoning Strategies
Dibyanayan Bandyopadhyay
Soham Bhattacharjee
Asif Ekbal
LRM
ELM
46
4
0
13 Mar 2025
Multi-Agent Verification: Scaling Test-Time Compute with Multiple Verifiers
Shalev Lifshitz
Sheila A. McIlraith
Yilun Du
LRM
44
4
0
27 Feb 2025
LettuceDetect: A Hallucination Detection Framework for RAG Applications
Adam Kovacs
Gábor Recski
35
2
0
24 Feb 2025
S^3cMath: Spontaneous Step-level Self-correction Makes Large Language Models Better Mathematical Reasoners
Yuchen Yan
Jin Jiang
Yang Liu
Yixin Cao
Xin Xu
M. Zhang
Xunliang Cai
Jian Shao
ReLM
LRM
KELM
110
7
0
21 Feb 2025
Hallucination Detection in Large Language Models with Metamorphic Relations
Borui Yang
Md Afif Al Mamun
Jie M. Zhang
Gias Uddin
HILM
59
0
0
20 Feb 2025
PSSD: Making Large Language Models Self-denial via Human Psyche Structure
Jinzhi Liao
Zenghua Liao
Xiang Zhao
LRM
LLMAG
43
0
0
03 Feb 2025
CALM: Curiosity-Driven Auditing for Large Language Models
Xiang Zheng
Longxiang Wang
Yi Liu
Xingjun Ma
Chao Shen
Cong Wang
MLAU
47
0
0
06 Jan 2025
Critical-Questions-of-Thought: Steering LLM reasoning with Argumentative Querying
Federico Castagna
I. Sassoon
Simon Parsons
LRM
85
0
0
19 Dec 2024
The Superalignment of Superhuman Intelligence with Large Language Models
Minlie Huang
Yingkang Wang
Shiyao Cui
Pei Ke
J. Tang
103
1
0
15 Dec 2024
Label-Confidence-Aware Uncertainty Estimation in Natural Language Generation
Qinhong Lin
Linna Zhou
Zhongliang Yang
Yuang Cai
HILM
75
0
0
10 Dec 2024
I Don't Know: Explicit Modeling of Uncertainty with an [IDK] Token
Roi Cohen
Konstantin Dobler
Eden Biran
Gerard de Melo
81
3
0
09 Dec 2024
A Large-Scale Study of Relevance Assessments with Large Language Models: An Initial Look
Shivani Upadhyay
Ronak Pradeep
Nandan Thakur
Daniel Fernando Campos
Nick Craswell
I. Soboroff
Hoa Trang Dang
Jimmy J. Lin
23
16
0
13 Nov 2024
A Theoretical Survey on Foundation Models
Shi Fu
Yuzhu Chen
Yingjie Wang
Dacheng Tao
18
0
0
15 Oct 2024
ReDeEP: Detecting Hallucination in Retrieval-Augmented Generation via Mechanistic Interpretability
ZhongXiang Sun
Xiaoxue Zang
Kai Zheng
Yang Song
Jun Xu
Xiao Zhang
Weijie Yu
Yang Song
Han Li
44
6
0
15 Oct 2024
Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates
Xiaosen Zheng
Tianyu Pang
Chao Du
Qian Liu
Jing Jiang
Min-Bin Lin
33
8
0
09 Oct 2024
HaloScope: Harnessing Unlabeled LLM Generations for Hallucination Detection
Xuefeng Du
Chaowei Xiao
Yixuan Li
HILM
24
16
0
26 Sep 2024
A Multiple-Fill-in-the-Blank Exam Approach for Enhancing Zero-Resource Hallucination Detection in Large Language Models
Satoshi Munakata
Taku Fukui
Takao Mohri
11
0
0
20 Sep 2024
Zero-resource Hallucination Detection for Text Generation via Graph-based Contextual Knowledge Triples Modeling
Xinyue Fang
Zhen Huang
Zhiliang Tian
Minghui Fang
Ziyi Pan
Quntian Fang
Zhihua Wen
Hengyue Pan
Dongsheng Li
HILM
86
2
0
17 Sep 2024
Hallucination Detection in LLMs: Fast and Memory-Efficient Finetuned Models
Gabriel Y. Arteaga
Thomas B. Schon
Nicolas Pielawski
20
1
0
04 Sep 2024
Internal Consistency and Self-Feedback in Large Language Models: A Survey
Xun Liang
Shichao Song
Zifan Zheng
Hanyu Wang
Qingchen Yu
...
Rong-Hua Li
Peng Cheng
Zhonghao Wang
Feiyu Xiong
Zhiyu Li
HILM
LRM
56
24
0
19 Jul 2024
Estimating Knowledge in Large Language Models Without Generating a Single Token
Daniela Gottesman
Mor Geva
34
10
0
18 Jun 2024
Small Agent Can Also Rock! Empowering Small Language Models as Hallucination Detector
Xiaoxue Cheng
Junyi Li
Wayne Xin Zhao
Hongzhi Zhang
Fuzheng Zhang
Di Zhang
Kun Gai
Ji-Rong Wen
HILM
LLMAG
27
7
0
17 Jun 2024
Do Not Design, Learn: A Trainable Scoring Function for Uncertainty Estimation in Generative LLMs
D. Yaldiz
Yavuz Faruk Bakman
Baturalp Buyukates
Chenyang Tao
Anil Ramakrishna
Dimitrios Dimitriadis
Jieyu Zhao
Salman Avestimehr
32
1
0
17 Jun 2024
Teaching Large Language Models to Express Knowledge Boundary from Their Own Signals
Lida Chen
Zujie Liang
Xintao Wang
Jiaqing Liang
Yanghua Xiao
Feng Wei
Jinglei Chen
Zhenghong Hao
Bing Han
Wei Wang
37
7
0
16 Jun 2024
Multi-Agent Software Development through Cross-Team Collaboration
Zhuoyun Du
Chen Qian
Wei Liu
Zihao Xie
Yifei Wang
Yufan Dang
Weize Chen
Cheng Yang
LLMAG
33
16
0
13 Jun 2024
Scaling Large Language Model-based Multi-Agent Collaboration
Chen Qian
Zihao Xie
YiFei Wang
Wei Liu
Yufan Dang
...
Zhuoyun Du
Weize Chen
Cheng Yang
Zhiyuan Liu
Maosong Sun
AI4CE
LLMAG
LM&Ro
54
42
0
11 Jun 2024
Semantically Diverse Language Generation for Uncertainty Estimation in Language Models
L. Aichberger
Kajetan Schweighofer
Mykyta Ielanskyi
Sepp Hochreiter
HILM
21
3
0
06 Jun 2024
When Can LLMs Actually Correct Their Own Mistakes? A Critical Survey of Self-Correction of LLMs
Ryo Kamoi
Yusen Zhang
Nan Zhang
Jiawei Han
Rui Zhang
LRM
40
19
0
03 Jun 2024
Auto Arena of LLMs: Automating LLM Evaluations with Agent Peer-battles and Committee Discussions
Ruochen Zhao
Wenxuan Zhang
Yew Ken Chia
Deli Zhao
Lidong Bing
27
9
0
30 May 2024
Kernel Language Entropy: Fine-grained Uncertainty Quantification for LLMs from Semantic Similarities
Alexander Nikitin
Jannik Kossen
Yarin Gal
Pekka Marttinen
UQCV
42
23
0
30 May 2024
One vs. Many: Comprehending Accurate Information from Multiple Erroneous and Inconsistent AI Generations
Yoonjoo Lee
Kihoon Son
Tae Soo Kim
Jisu Kim
John Joon Young Chung
Eytan Adar
Juho Kim
28
11
0
09 May 2024
Iterative Experience Refinement of Software-Developing Agents
Cheng Qian
Jiahao Li
Yufan Dang
Wei Liu
Yifei Wang
...
Weize Chen
Cheng Yang
Yingli Zhang
Zhiyuan Liu
Maosong Sun
LLMAG
23
12
0
07 May 2024
WitheredLeaf: Finding Entity-Inconsistency Bugs with LLMs
Hongbo Chen
Yifan Zhang
Xing Han
Huanyao Rong
Yuheng Zhang
Tianhao Mao
Hang Zhang
XiaoFeng Wang
Luyi Xing
Xun Chen
22
2
0
02 May 2024
When to Trust LLMs: Aligning Confidence with Response Quality
Shuchang Tao
Liuyi Yao
Hanxing Ding
Yuexiang Xie
Qi Cao
Fei Sun
Jinyang Gao
Huawei Shen
Bolin Ding
24
15
0
26 Apr 2024
Transferable and Efficient Non-Factual Content Detection via Probe Training with Offline Consistency Checking
Xiaokang Zhang
Zijun Yao
Jing Zhang
Kaifeng Yun
Jifan Yu
Juan-Zi Li
Jie Tang
HILM
19
2
0
10 Apr 2024
The Hallucinations Leaderboard -- An Open Effort to Measure Hallucinations in Large Language Models
Giwon Hong
Aryo Pradipta Gema
Rohit Saxena
Xiaotang Du
Ping Nie
...
Laura Perez-Beltrachini
Max Ryabinin
Xuanli He
Clémentine Fourrier
Pasquale Minervini
LRM
HILM
26
9
0
08 Apr 2024
PoLLMgraph: Unraveling Hallucinations in Large Language Models via State Transition Dynamics
Derui Zhu
Dingfan Chen
Qing Li
Zongxiong Chen
Lei Ma
Jens Grossklags
Mario Fritz
HILM
32
3
0
06 Apr 2024
Evaluating LLMs at Detecting Errors in LLM Responses
Ryo Kamoi
Sarkar Snigdha Sarathi Das
Renze Lou
Jihyun Janice Ahn
Yilun Zhao
...
Salika Dave
Shaobo Qin
Arman Cohan
Wenpeng Yin
Rui Zhang
40
19
0
04 Apr 2024
Concept -- An Evaluation Protocol on Conversational Recommender Systems with System-centric and User-centric Factors
Chen Huang
Peixin Qin
Yang Deng
Wenqiang Lei
Jiancheng Lv
Tat-Seng Chua
22
6
0
04 Apr 2024
Mechanistic Understanding and Mitigation of Language Model Non-Factual Hallucinations
Lei Yu
Meng Cao
Jackie Chi Kit Cheung
Yue Dong
HILM
27
6
0
27 Mar 2024
Learning to Use Tools via Cooperative and Interactive Agents
Zhengliang Shi
Shen Gao
Xiuyi Chen
Zhumin Chen
Lingyong Yan
Haibo Shi
Dawei Yin
Pengjie Ren
Suzan Verberne
Zhaochun Ren
LLMAG
21
16
0
05 Mar 2024
A Survey of AI-generated Text Forensic Systems: Detection, Attribution, and Characterization
Tharindu Kumarage
Garima Agrawal
Paras Sheth
Raha Moraffah
Amanat Chadha
Joshua Garland
Huan Liu
DeLMO
26
6
0
02 Mar 2024
Navigating Complexity: Orchestrated Problem Solving with Multi-Agent LLMs
Sumedh Rasal
E. Hauer
17
0
0
26 Feb 2024