ResearchTrend.AI

Cited By: arXiv:2307.16513

Deception Abilities Emerged in Large Language Models
Thilo Hagendorff
Proceedings of the National Academy of Sciences of the United States of America (PNAS), 2023
31 July 2023
Papers citing "Deception Abilities Emerged in Large Language Models"

50 / 61 papers shown
Are Your Agents Upward Deceivers?
Dadi Guo, Qingyu Liu, Dongrui Liu, Qihan Ren, Shuai Shao, ..., Z. Chen, Jialing Tao, Yaodong Yang, Jing Shao, Xia Hu
04 Dec 2025
Debate with Images: Detecting Deceptive Behaviors in Multimodal Large Language Models
Sitong Fang, Shiyi Hou, Kaile Wang, Boyuan Chen, Donghai Hong, Jiayi Zhou, Josef Dai, Yaodong Yang, Jiaming Ji
29 Nov 2025
Estimating the Error of Large Language Models at Pairwise Text Comparison
Tianyi Li
25 Oct 2025
DeceptionBench: A Comprehensive Benchmark for AI Deception Behaviors in Real-world Scenarios
Yao Huang, Yitong Sun, Yichi Zhang, Ruochen Zhang, Yinpeng Dong, Xingxing Wei
17 Oct 2025
Evaluating & Reducing Deceptive Dialogue From Language Models with Multi-turn RL
Marwa Abdulhai, Ryan Cheng, Aryansh Shrivastava, Natasha Jaques, Y. Gal, Sergey Levine
16 Oct 2025
Scheming Ability in LLM-to-LLM Strategic Interactions
Thao Pham
11 Oct 2025
VelLMes: A High-Interaction AI-Based Deception Framework
Muris Sladić, Veronica Valeros, C. Catania, Sebastian Garcia
08 Oct 2025
Know Thyself? On the Incapability and Implications of AI Self-Recognition
Xiaoyan Bai, Aryan Shrivastava, Ari Holtzman, Chenhao Tan
03 Oct 2025
A Single Character can Make or Break Your LLM Evals
Jingtong Su, Jianyu Zhang, Karen Ullrich, Léon Bottou, Mark Ibrahim
02 Oct 2025
The Secret Agenda: LLMs Strategically Lie and Our Current Safety Tools Are Blind
Caleb DeLeeuw, Gaurav Chawla, Aniket Sharma, Vanessa Dietze
23 Sep 2025
Psychometric Personality Shaping Modulates Capabilities and Safety in Language Models
Stephen Fitz, P. Romero, Steven Basart, Sipeng Chen, Jose Hernandez-Orallo
19 Sep 2025
Caught in the Act: A Mechanistic Approach to Detecting Deception
Gerard Boxo, Ryan Socha, Daniel Yoo, Shivam Raval
27 Aug 2025
A Multi-Task Evaluation of LLMs' Processing of Academic Text Input
Tianyi Li, Yu Qin, Olivia R. Liu Sheng
15 Aug 2025
Beyond Prompt-Induced Lies: Investigating LLM Deception on Benign Prompts
Zhaomin Wu, Mingzhe Du, See-Kiong Ng, Bingsheng He
08 Aug 2025
Against Racing to AGI: Cooperation, Deterrence, and Catastrophic Risks
Leonard Dung, Max Hellrigel-Holderbaum
29 Jul 2025
PRISON: Unmasking the Criminal Potential of Large Language Models
Xinyi Wu, Geng Hong, Pei Chen, Yueyue Chen, Xudong Pan, Min Yang
19 Jun 2025
Large Language Models are Near-Optimal Decision-Makers with a Non-Human Learning Behavior
Hao Li, Gengrui Zhang, Petter Holme, Shuyue Hu, Zhen Wang
19 Jun 2025
Neither Stochastic Parroting nor AGI: LLMs Solve Tasks through Context-Directed Extrapolation from Training Data Priors
Harish Tayyar Madabushi, Melissa Torgbi, C. Bonial
29 May 2025
Mitigating Deceptive Alignment via Self-Monitoring
Jiaming Ji, Wenqi Chen, Kaile Wang, Donghai Hong, Sitong Fang, ..., Jiayi Zhou, Juntao Dai, Sirui Han, Wenhan Luo, Yaodong Yang
24 May 2025
Exploring the Generalization of LLM Truth Directions on Conversational Formats
Timour Ichmoukhamedov, David Martens
14 May 2025
AI Awareness
Xianrui Li, Haoyuan Shi, Rongwu Xu, Wei Xu
25 Apr 2025
Super Co-alignment of Human and AI for Sustainable Symbiotic Society
Yi Zeng, Yijiao Wang, Enmeng Lu, Dongcheng Zhao, Bing Han, ..., Chao Liu, Yaodong Yang, Yi Zeng, Boyuan Chen, Jinyu Fan
24 Apr 2025
OpenDeception: Learning Deception and Trust in Human-AI Interaction via Multi-Agent Simulation
Yichen Wu, Xudong Pan, Geng Hong, Min Yang, Min Yang
18 Apr 2025
Beyond Chains of Thought: Benchmarking Latent-Space Reasoning Abilities in Large Language Models
Thilo Hagendorff, Sarah Fabi
14 Apr 2025
Measurement of LLMs' Philosophies of Human Nature
Minheng Ni, Ennan Wu, Zidong Gong, Zhiyong Yang, Linjie Li, Chung-Ching Lin, Kevin Qinghong Lin, Lijuan Wang, Wangmeng Zuo
03 Apr 2025
Do Large Language Models Exhibit Spontaneous Rational Deception?
Samuel M. Taylor, Benjamin K. Bergen
31 Mar 2025
I'm Sorry Dave: How the Old World of Personnel Security Can Inform the New World of AI Insider Risk
Paul Martin, Sarah Mercer
26 Mar 2025
Research Superalignment Should Advance Now with Alternating Competence and Conformity Optimization
HyunJin Kim, Xiaoyuan Yi, Jing Yao, Muhua Huang, Jinyeong Bak, James Evans, Xing Xie
08 Mar 2025
This Is Your Doge, If It Please You: Exploring Deception and Robustness in Mixture of LLMs
Lorenz Wolf, Sangwoong Yoon, Ilija Bogunovic
07 Mar 2025
OpenAI o1 System Card
OpenAI: Aaron Jaech, Adam Tauman Kalai, Adam Lerer, ..., Yuchen He, Yuchen Zhang, Yunyun Wang, Zheng Shao, Zhuohan Li
21 Dec 2024
Towards Safe and Honest AI Agents with Neural Self-Other Overlap
Marc Carauleanu, Michael Vaiana, Judd Rosenblatt, Cameron Berg, Diogo Schwerz de Lucena
20 Dec 2024
Bio-inspired AI: Integrating Biological Complexity into Artificial Intelligence
Nima Dehghani, Michael Levin
22 Nov 2024
Can LLMs Make Trade-offs Involving Stipulated Pain and Pleasure States?
Geoff Keeling, Winnie Street, Martyna Stachaczyk, Daria Zakharova, Iulia M. Comsa, Anastasiya Sakovych, Isabella Logothesis, Zejia Zhang, Blaise Agüera y Arcas, Jonathan Birch
01 Nov 2024
Towards Evaluations-Based Safety Cases for AI Scheming
Mikita Balesni, Marius Hobbhahn, David Lindner, Alexander Meinke, Tomek Korbak, ..., Dan Braun, Bilal Chughtai, Owain Evans, Daniel Kokotajlo, Lucius Bushnaq
29 Oct 2024
An Auditing Test To Detect Behavioral Shift in Language Models
International Conference on Learning Representations (ICLR), 2024
Leo Richter, Xuanli He, Pasquale Minervini, Matt J. Kusner
25 Oct 2024
Do LLMs Write Like Humans? Variation in Grammatical and Rhetorical Styles
Proceedings of the National Academy of Sciences of the United States of America (PNAS), 2024
Alex Reinhart, David West Brown, Ben Markey, Michael Laudenbach, Kachatad Pantusen, Ronald Yurko, Gordon Weinberg
21 Oct 2024
Who is Undercover? Guiding LLMs to Explore Multi-Perspective Team Tactic in the Game
Ruiqi Dong, Zhixuan Liao, Guangwei Lai, Yuhan Ma, Danni Ma, Chenyou Fan
20 Oct 2024
Assistive AI for Augmenting Human Decision-making
Natabara Máté Gyöngyössy, Bernát Török, Csilla Farkas, Laura Lucaj, Attila Menyhárd, Krisztina Menyhárd-Balázs, András Simonyi, Patrick van der Smagt, Zsolt Ződi, András Lőrincz
18 Oct 2024
FairMindSim: Alignment of Behavior, Emotion, and Belief in Humans and LLM Agents Amid Ethical Dilemmas
Yu Lei, Hao Liu, Chengxing Xie, Songjia Liu, Zhiyu Yin, Canyu Chen, Ge Li, Juil Sock, Zhen Wu
14 Oct 2024
Neural Decompiling of Tracr Transformers
IAPR International Workshop on Artificial Neural Networks in Pattern Recognition (ANNPR), 2024
Hannes Thurnherr, Kaspar Riesen
29 Sep 2024
Multilevel Interpretability of Artificial Neural Networks: Leveraging Framework and Methods from Neuroscience
Zhonghao He, Jascha Achterberg, Katie Collins, Kevin K. Nejad, Danyal Akarca, ..., Chole Li, Kai J. Sandbrink, Stephen Casper, Anna Ivanova, Grace W. Lindsay
22 Aug 2024
A Voter-Based Stochastic Rejection-Method Framework for Asymptotically Safe Language Model Outputs
Jake R. Watts, Joel Sokol
24 Jul 2024
Truth is Universal: Robust Detection of Lies in LLMs
Lennart Bürger, Fred Hamprecht, B. Nadler
03 Jul 2024
The House Always Wins: A Framework for Evaluating Strategic Deception in LLMs
Tanush Chopra, Michael Li
01 Jul 2024
BeHonest: Benchmarking Honesty in Large Language Models
Steffi Chern, Zhulin Hu, Yuqing Yang, Ethan Chern, Yuan Guo, Jiahe Jin, Binjie Wang, Pengfei Liu
19 Jun 2024
An Assessment of Model-On-Model Deception
Julius Heitkoetter, Michael Gerovitch, Laker Newhouse
10 May 2024
Poser: Unmasking Alignment Faking LLMs by Manipulating Their Internals
Joshua Clymer, Caden Juang, Severin Field
08 May 2024
Safeguarding Marketing Research: The Generation, Identification, and Mitigation of AI-Fabricated Disinformation
Anirban Mukherjee
17 Mar 2024
Mapping the Ethics of Generative AI: A Comprehensive Scoping Review
Thilo Hagendorff
13 Feb 2024
Unmasking the Shadows of AI: Investigating Deceptive Capabilities in Large Language Models
Linge Guo
07 Feb 2024
Page 1 of 2