Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2307.16513
Cited By
Deception Abilities Emerged in Large Language Models
31 July 2023
Thilo Hagendorff
LLMAG
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Deception Abilities Emerged in Large Language Models"
45 / 45 papers shown
Title
AI Awareness
X. Li
Haoyuan Shi
Rongwu Xu
Wei Xu
54
0
0
25 Apr 2025
Redefining Superalignment: From Weak-to-Strong Alignment to Human-AI Co-Alignment to Sustainable Symbiotic Society
Feifei Zhao
Y. Wang
Enmeng Lu
Dongcheng Zhao
Bing Han
...
Chao Liu
Yaodong Yang
Yi Zeng
Boyuan Chen
Jinyu Fan
80
0
0
24 Apr 2025
OpenDeception: Benchmarking and Investigating AI Deceptive Behaviors via Open-ended Interaction Simulation
Yichen Wu
Xudong Pan
Geng Hong
Min Yang
LLMAG
27
0
0
18 Apr 2025
Beyond Chains of Thought: Benchmarking Latent-Space Reasoning Abilities in Large Language Models
Thilo Hagendorff
Sarah Fabi
ReLM
ELM
LRM
35
0
0
14 Apr 2025
Measurement of LLM's Philosophies of Human Nature
Minheng Ni
Ennan Wu
Zidong Gong
Z. Yang
Linjie Li
Chung-Ching Lin
Kevin Qinghong Lin
Lijuan Wang
Wangmeng Zuo
22
0
0
03 Apr 2025
Do Large Language Models Exhibit Spontaneous Rational Deception?
Samuel M. Taylor
Benjamin K. Bergen
LRM
35
0
0
31 Mar 2025
I'm Sorry Dave: How the old world of personnel security can inform the new world of AI insider risk
Paul Martin
Sarah Mercer
54
0
0
26 Mar 2025
Research on Superalignment Should Advance Now with Parallel Optimization of Competence and Conformity
HyunJin Kim
Xiaoyuan Yi
Jing Yao
Muhua Huang
Jinyeong Bak
James Evans
Xing Xie
29
0
0
08 Mar 2025
This Is Your Doge, If It Please You: Exploring Deception and Robustness in Mixture of LLMs
Lorenz Wolf
Sangwoong Yoon
Ilija Bogunovic
36
0
0
07 Mar 2025
OpenAI o1 System Card
OpenAI OpenAI
:
Aaron Jaech
Adam Tauman Kalai
Adam Lerer
...
Yuchen He
Yuchen Zhang
Yunyun Wang
Zheng Shao
Zhuohan Li
ELM
LRM
AI4CE
77
1
0
21 Dec 2024
Towards Safe and Honest AI Agents with Neural Self-Other Overlap
Marc Carauleanu
Michael Vaiana
Judd Rosenblatt
Cameron Berg
Diogo Schwerz de Lucena
66
0
0
20 Dec 2024
Bio-inspired AI: Integrating Biological Complexity into Artificial Intelligence
Nima Dehghani
Michael Levin
64
0
0
22 Nov 2024
Can LLMs make trade-offs involving stipulated pain and pleasure states?
Geoff Keeling
Winnie Street
Martyna Stachaczyk
Daria Zakharova
Iulia M. Comsa
Anastasiya Sakovych
Isabella Logothesis
Zejia Zhang
Blaise Agüera y Arcas
Jonathan Birch
16
2
0
01 Nov 2024
Towards evaluations-based safety cases for AI scheming
Mikita Balesni
Marius Hobbhahn
David Lindner
Alexander Meinke
Tomek Korbak
...
Dan Braun
Bilal Chughtai
Owain Evans
Daniel Kokotajlo
Lucius Bushnaq
ELM
31
9
0
29 Oct 2024
Do LLMs write like humans? Variation in grammatical and rhetorical styles
Alex Reinhart
David West Brown
Ben Markey
Michael Laudenbach
Kachatad Pantusen
Ronald Yurko
Gordon Weinberg
DeLMO
23
0
0
21 Oct 2024
Who is Undercover? Guiding LLMs to Explore Multi-Perspective Team Tactic in the Game
Ruiqi Dong
Zhixuan Liao
Guangwei Lai
Yuhan Ma
Danni Ma
Chenyou Fan
LLMAG
16
0
0
20 Oct 2024
Assistive AI for Augmenting Human Decision-making
Natabara Máté Gyöngyössy
Bernát Török
Csilla Farkas
Laura Lucaj
Attila Menyhárd
Krisztina Menyhárd-Balázs
András Simonyi
Patrick van der Smagt
Zsolt Ződi
András Lőrincz
19
0
0
18 Oct 2024
FairMindSim: Alignment of Behavior, Emotion, and Belief in Humans and LLM Agents Amid Ethical Dilemmas
Yu Lei
Hao Liu
Chengxing Xie
Songjia Liu
Zhiyu Yin
Canyu Chen
G. Li
Philip H. S. Torr
Zhen Wu
18
1
0
14 Oct 2024
Neural Decompiling of Tracr Transformers
Hannes Thurnherr
Kaspar Riesen
ViT
13
1
0
29 Sep 2024
Multilevel Interpretability Of Artificial Neural Networks: Leveraging Framework And Methods From Neuroscience
Zhonghao He
Jascha Achterberg
Katie Collins
Kevin K. Nejad
Danyal Akarca
...
Chole Li
Kai J. Sandbrink
Stephen Casper
Anna Ivanova
Grace W. Lindsay
AI4CE
18
1
0
22 Aug 2024
A Voter-Based Stochastic Rejection-Method Framework for Asymptotically Safe Language Model Outputs
Jake R. Watts
Joel Sokol
21
0
0
24 Jul 2024
Truth is Universal: Robust Detection of Lies in LLMs
Lennart Bürger
Fred Hamprecht
B. Nadler
HILM
22
6
0
03 Jul 2024
BeHonest: Benchmarking Honesty in Large Language Models
Steffi Chern
Zhulin Hu
Yuqing Yang
Ethan Chern
Yuan Guo
Jiahe Jin
Binjie Wang
Pengfei Liu
HILM
ALM
72
3
0
19 Jun 2024
An Assessment of Model-On-Model Deception
Julius Heitkoetter
Michael Gerovitch
Laker Newhouse
26
2
0
10 May 2024
Poser: Unmasking Alignment Faking LLMs by Manipulating Their Internals
Joshua Clymer
Caden Juang
Severin Field
CVBM
17
1
0
08 May 2024
Safeguarding Marketing Research: The Generation, Identification, and Mitigation of AI-Fabricated Disinformation
Anirban Mukherjee
17
0
0
17 Mar 2024
Mapping the Ethics of Generative AI: A Comprehensive Scoping Review
Thilo Hagendorff
16
35
0
13 Feb 2024
Unmasking the Shadows of AI: Investigating Deceptive Capabilities in Large Language Models
Linge Guo
11
0
0
07 Feb 2024
Theory of Mind abilities of Large Language Models in Human-Robot Interaction : An Illusion?
Mudit Verma
Siddhant Bhambri
Subbarao Kambhampati
18
20
0
10 Jan 2024
Exploring the Frontiers of LLMs in Psychological Applications: A Comprehensive Review
Luoma Ke
Song Tong
Peng Cheng
Kaiping Peng
OffRL
LM&MA
42
17
0
03 Jan 2024
Large Language Models can Strategically Deceive their Users when Put Under Pressure
Jérémy Scheurer
Mikita Balesni
Marius Hobbhahn
LLMAG
10
25
0
09 Nov 2023
AutoDAN: Interpretable Gradient-Based Adversarial Attacks on Large Language Models
Sicheng Zhu
Ruiyi Zhang
Bang An
Gang Wu
Joe Barrow
Zichao Wang
Furong Huang
A. Nenkova
Tong Sun
SILM
AAML
17
40
0
23 Oct 2023
Avalon's Game of Thoughts: Battle Against Deception through Recursive Contemplation
Shenzhi Wang
Chang Liu
Zilong Zheng
Siyuan Qi
Shuo Chen
Qisen Yang
Andrew Zhao
Chaofei Wang
Shiji Song
Gao Huang
LLMAG
11
61
0
02 Oct 2023
AI Deception: A Survey of Examples, Risks, and Potential Solutions
Peter S. Park
Simon Goldstein
Aidan O'Gara
Michael Chen
Dan Hendrycks
12
106
0
28 Aug 2023
Use of LLMs for Illicit Purposes: Threats, Prevention Measures, and Vulnerabilities
Maximilian Mozes
Xuanli He
Bennett Kleinberg
Lewis D. Griffin
23
69
0
24 Aug 2023
Is There Any Social Principle for LLM-Based Agents?
Jitao Bai
Simiao Zhang
Zhong Chen
LLMAG
ALM
11
5
0
22 Aug 2023
Evaluating Superhuman Models with Consistency Checks
Lukas Fluri
Daniel Paleka
Florian Tramèr
ELM
26
23
0
16 Jun 2023
The Machine Psychology of Cooperation: Can GPT models operationalise prompts for altruism, cooperation, competitiveness and selfishness in economic games?
S. Phelps
Y. Russell
14
19
0
13 May 2023
The Internal State of an LLM Knows When It's Lying
A. Azaria
Tom Michael Mitchell
HILM
210
297
0
26 Apr 2023
Machine Psychology: Investigating Emergent Capabilities and Behavior in Large Language Models Using Psychological Methods
Thilo Hagendorff
LLMAG
21
4
0
24 Mar 2023
Sparks of Artificial General Intelligence: Early experiments with GPT-4
Sébastien Bubeck
Varun Chandrasekaran
Ronen Eldan
J. Gehrke
Eric Horvitz
...
Scott M. Lundberg
Harsha Nori
Hamid Palangi
Marco Tulio Ribeiro
Yi Zhang
ELM
AI4MH
AI4CE
ALM
197
2,953
0
22 Mar 2023
Thinking Fast and Slow in Large Language Models
Thilo Hagendorff
Sarah Fabi
Michal Kosinski
LLMAG
LRM
12
131
0
10 Dec 2022
Large Language Models are Zero-Shot Reasoners
Takeshi Kojima
S. Gu
Machel Reid
Yutaka Matsuo
Yusuke Iwasawa
ReLM
LRM
291
2,712
0
24 May 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
315
8,261
0
28 Jan 2022
Unsolved Problems in ML Safety
Dan Hendrycks
Nicholas Carlini
John Schulman
Jacob Steinhardt
156
268
0
28 Sep 2021
1