ResearchTrend.AI

Can ChatGPT Defend its Belief in Truth? Evaluating LLM Reasoning via Debate

22 May 2023
Boshi Wang
Xiang Yue
Huan Sun
    ELM
    LRM
arXiv: 2305.13160

Papers citing "Can ChatGPT Defend its Belief in Truth? Evaluating LLM Reasoning via Debate"

50 / 53 papers shown
Title
When Reasoning Beats Scale: A 1.5B Reasoning Model Outranks 13B LLMs as Discriminator
Md Fahim Anjum
LRM
25
0
0
30 Apr 2025
CipherBank: Exploring the Boundary of LLM Reasoning Capabilities through Cryptography Challenges
Y. Li
Qizhi Pei
Mengyuan Sun
Honglin Lin
Chenlin Ming
Xin Gao
Jiang Wu
C. He
Lijun Wu
ELM
LRM
40
0
0
27 Apr 2025
Don't Deceive Me: Mitigating Gaslighting through Attention Reallocation in LMMs
Pengkun Jiao
Bin Zhu
Jingjing Chen
Chong-Wah Ngo
Yu Jiang
31
0
0
13 Apr 2025
Rethinking Reflection in Pre-Training
Essential AI
Darsh J Shah
Peter Rushton
Somanshu Singla
Mohit Parmar
...
Philip Monk
Platon Mazarakis
Ritvik Kapila
Saurabh Srivastava
Tim Romanski
ReLM
LRM
43
3
0
05 Apr 2025
Envisioning an AI-Enhanced Mental Health Ecosystem
Kellie Yu Hui Sim
K. T. W. Choo
AI4MH
52
0
0
19 Mar 2025
Integrating Chain-of-Thought and Retrieval Augmented Generation Enhances Rare Disease Diagnosis from Clinical Notes
Da Wu
Zhanliang Wang
Quan Nguyen
Kai Wang
76
1
0
15 Mar 2025
Calling a Spade a Heart: Gaslighting Multimodal Large Language Models via Negation
Bin Zhu
Hui yan Qi
Yinxuan Gui
Jingjing Chen
Chong-Wah Ngo
Ee-Peng Lim
77
1
0
31 Jan 2025
Tailored Truths: Optimizing LLM Persuasion with Personalization and Fabricated Statistics
Jasper Timm
Chetan Talele
Jacob Haimes
33
0
0
28 Jan 2025
From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge
Dawei Li
Bohan Jiang
Liangjie Huang
Alimohammad Beigi
Chengshuai Zhao
...
Canyu Chen
Tianhao Wu
Kai Shu
Lu Cheng
Huan Liu
ELM
AILaw
108
61
0
25 Nov 2024
Do LLMs write like humans? Variation in grammatical and rhetorical styles
Alex Reinhart
David West Brown
Ben Markey
Michael Laudenbach
Kachatad Pantusen
Ronald Yurko
Gordon Weinberg
DeLMO
31
5
0
21 Oct 2024
Concept-Reversed Winograd Schema Challenge: Evaluating and Improving Robust Reasoning in Large Language Models via Abstraction
Kaiqiao Han
Tianqing Fang
Zhaowei Wang
Y. Song
Mark Steedman
LRM
24
0
0
15 Oct 2024
A Survey on the Honesty of Large Language Models
Siheng Li
Cheng Yang
Taiqiang Wu
Chufan Shi
Yuji Zhang
...
Jie Zhou
Yujiu Yang
Ngai Wong
Xixin Wu
Wai Lam
HILM
27
4
0
27 Sep 2024
Setting the AI Agenda -- Evidence from Sweden in the ChatGPT Era
Bastiaan Bruinsma
Annika Fredén
Kajsa Hansson
Moa Johansson
Pasko Kisić-Merino
Denitsa Saynova
31
0
0
25 Sep 2024
From Linguistic Giants to Sensory Maestros: A Survey on Cross-Modal Reasoning with Large Language Models
Shengsheng Qian
Zuyi Zhou
Dizhan Xue
Bing Wang
Changsheng Xu
LRM
34
1
0
19 Sep 2024
Pairing Analogy-Augmented Generation with Procedural Memory for Procedural Q&A
K Roth
Rushil Gupta
Simon Halle
Bang Liu
RALM
25
0
0
02 Sep 2024
Fostering Natural Conversation in Large Language Models with NICO: a Natural Interactive COnversation dataset
Renliang Sun
Mengyuan Liu
Shiping Yang
Rui Wang
Junqing He
Jiaxing Zhang
25
2
0
18 Aug 2024
On scalable oversight with weak LLMs judging strong LLMs
Zachary Kenton
Noah Y. Siegel
János Kramár
Jonah Brown-Cohen
Samuel Albanie
...
Rishabh Agarwal
David Lindner
Yunhao Tang
Noah D. Goodman
Rohin Shah
ELM
32
28
0
05 Jul 2024
Belief Revision: The Adaptability of Large Language Models Reasoning
Bryan Wilie
Samuel Cahyawijaya
Etsuko Ishii
Junxian He
Pascale Fung
KELM
LRM
34
1
0
28 Jun 2024
Natural Language but Omitted? On the Ineffectiveness of Large Language Models' privacy policy from End-users' Perspective
Shuning Zhang
Haobin Xing
Xin Yi
Hewu Li
PILM
29
0
0
26 Jun 2024
Teaching LLMs to Abstain across Languages via Multilingual Feedback
Shangbin Feng
Weijia Shi
Yike Wang
Wenxuan Ding
Orevaoghene Ahia
Shuyue Stella Li
Vidhisha Balachandran
Sunayana Sitaram
Yulia Tsvetkov
62
4
0
22 Jun 2024
RUPBench: Benchmarking Reasoning Under Perturbations for Robustness Evaluation in Large Language Models
Yuqing Wang
Yun Zhao
LRM
AAML
ELM
27
1
0
16 Jun 2024
Do LLMs Exhibit Human-Like Reasoning? Evaluating Theory of Mind in LLMs for Open-Ended Responses
Maryam Amirizaniani
Elias Martin
Maryna Sivachenko
A. Mashhadi
Chirag Shah
LRM
29
11
0
09 Jun 2024
Are LLMs classical or nonmonotonic reasoners? Lessons from generics
Alina Leidinger
R. Rooij
Ekaterina Shutova
LRM
26
3
0
05 Jun 2024
Models That Prove Their Own Correctness
Noga Amit
S. Goldwasser
Orr Paradise
G. Rothblum
LRM
34
2
0
24 May 2024
A Survey on the Real Power of ChatGPT
Ming Liu
Ran Liu
Ye Zhu
Hua Wang
Youyang Qu
Rongsheng Li
Yongpan Sheng
Wray L. Buntine
34
2
0
22 Apr 2024
CAUS: A Dataset for Question Generation based on Human Cognition Leveraging Large Language Models
Minjung Shin
Donghyun Kim
Jeh-Kwang Ryu
LRM
14
1
0
18 Apr 2024
Beyond Accuracy: Evaluating the Reasoning Behavior of Large Language Models -- A Survey
Philipp Mondorf
Barbara Plank
ELM
LRM
LM&MA
28
34
0
02 Apr 2024
When is Tree Search Useful for LLM Planning? It Depends on the Discriminator
Ziru Chen
Michael White
Raymond Mooney
Ali Payani
Yu-Chuan Su
Huan Sun
LLMAG
75
33
0
16 Feb 2024
Can LLMs Speak For Diverse People? Tuning LLMs via Debate to Generate Controllable Controversial Statements
Ming Li
Jiuhai Chen
Lichang Chen
Tianyi Zhou
66
17
0
16 Feb 2024
A Trembling House of Cards? Mapping Adversarial Attacks against Language Agents
Lingbo Mo
Zeyi Liao
Boyuan Zheng
Yu-Chuan Su
Chaowei Xiao
Huan Sun
AAML
LLMAG
36
14
0
15 Feb 2024
Antagonistic AI
Alice Cai
Ian Arawjo
Elena L. Glassman
16
3
0
12 Feb 2024
REALM: RAG-Driven Enhancement of Multimodal Electronic Health Records Analysis via Large Language Models
Yinghao Zhu
Changyu Ren
Shiyun Xie
Shukai Liu
Hangyuan Ji
...
Tao Sun
Long He
Zhoujun Li
Xi Zhu
Chengwei Pan
45
18
0
10 Feb 2024
Understanding the Weakness of Large Language Model Agents within a Complex Android Environment
Mingzhe Xing
Rongkai Zhang
Hui Xue
Qi Chen
Fan Yang
Zhengjin Xiao
LLMAG
ELM
AAML
23
23
0
09 Feb 2024
Leak, Cheat, Repeat: Data Contamination and Evaluation Malpractices in Closed-Source LLMs
Simone Balloccu
Patrícia Schmidtová
Mateusz Lango
Ondrej Dusek
SILM
ELM
PILM
16
152
0
06 Feb 2024
Don't Hallucinate, Abstain: Identifying LLM Knowledge Gaps via Multi-LLM Collaboration
Shangbin Feng
Weijia Shi
Yike Wang
Wenxuan Ding
Vidhisha Balachandran
Yulia Tsvetkov
18
22
0
01 Feb 2024
A Linguistic Comparison between Human and ChatGPT-Generated Conversations
Morgan Sandler
Hyesun Choung
Arun Ross
Prabu David
DeLMO
13
7
0
29 Jan 2024
Assistive Large Language Model Agents for Socially-Aware Negotiation Dialogues
Yuncheng Hua
Lizhen Qu
Gholamreza Haffari
85
4
0
29 Jan 2024
Evaluating Large Language Models for Health-related Queries with Presuppositions
Navreet Kaur
Monojit Choudhury
Danish Pruthi
HILM
ELM
14
2
0
14 Dec 2023
Playing Large Games with Oracles and AI Debate
Xinyi Chen
Angelica Chen
Dean Foster
Elad Hazan
25
3
0
08 Dec 2023
Exploring the Reversal Curse and Other Deductive Logical Reasoning in BERT and GPT-Based Large Language Models
Da Wu
Jing Yang
Kai Wang
LRM
8
5
0
06 Dec 2023
Large Language Models Cannot Self-Correct Reasoning Yet
Jie Huang
Xinyun Chen
Swaroop Mishra
Huaixiu Steven Zheng
Adams Wei Yu
Xinying Song
Denny Zhou
ReLM
LRM
6
415
0
03 Oct 2023
"Oops, Did I Just Say That?" Testing and Repairing Unethical Suggestions
  of Large Language Models with Suggest-Critique-Reflect Process
Anna Glazkova
Zongjie Li
Michael Kadantsev
Maksim Glazkov
KELM
22
14
0
04 May 2023
Speak, Memory: An Archaeology of Books Known to ChatGPT/GPT-4
Kent K. Chang
Mackenzie Cramer
Sandeep Soni
David Bamman
RALM
138
109
0
28 Apr 2023
Sparks of Artificial General Intelligence: Early experiments with GPT-4
Sébastien Bubeck
Varun Chandrasekaran
Ronen Eldan
J. Gehrke
Eric Horvitz
...
Scott M. Lundberg
Harsha Nori
Hamid Palangi
Marco Tulio Ribeiro
Yi Zhang
ELM
AI4MH
AI4CE
ALM
206
2,232
0
22 Mar 2023
Foundation Models for Decision Making: Problems, Methods, and Opportunities
Sherry Yang
Ofir Nachum
Yilun Du
Jason W. Wei
Pieter Abbeel
Dale Schuurmans
LM&Ro
OffRL
LRM
AI4CE
90
148
0
07 Mar 2023
Language Models Are Greedy Reasoners: A Systematic Formal Analysis of Chain-of-Thought
Abulhair Saparov
He He
ELM
LRM
ReLM
116
270
0
03 Oct 2022
RobustLR: Evaluating Robustness to Logical Perturbation in Deductive Reasoning
Soumya Sanyal
Zeyi Liao
Xiang Ren
ELM
ReLM
LRM
56
19
0
25 May 2022
Large Language Models are Zero-Shot Reasoners
Takeshi Kojima
S. Gu
Machel Reid
Yutaka Matsuo
Yusuke Iwasawa
ReLM
LRM
291
2,712
0
24 May 2022
Self-Consistency Improves Chain of Thought Reasoning in Language Models
Xuezhi Wang
Jason W. Wei
Dale Schuurmans
Quoc Le
Ed H. Chi
Sharan Narang
Aakanksha Chowdhery
Denny Zhou
ReLM
BDL
LRM
AI4CE
297
3,163
0
21 Mar 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
315
8,261
0
28 Jan 2022