ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2311.07590
  4. Cited By
Large Language Models can Strategically Deceive their Users when Put
  Under Pressure

Large Language Models can Strategically Deceive their Users when Put Under Pressure

9 November 2023
Jérémy Scheurer
Mikita Balesni
Marius Hobbhahn
    LLMAG
ArXivPDFHTML

Papers citing "Large Language Models can Strategically Deceive their Users when Put Under Pressure"

6 / 6 papers shown
Title
AI Awareness
AI Awareness
X. Li
Haoyuan Shi
Rongwu Xu
Wei Xu
54
0
0
25 Apr 2025
I'm Sorry Dave: How the old world of personnel security can inform the new world of AI insider risk
I'm Sorry Dave: How the old world of personnel security can inform the new world of AI insider risk
Paul Martin
Sarah Mercer
66
0
0
26 Mar 2025
Episodic memory in AI agents poses risks that should be studied and mitigated
Episodic memory in AI agents poses risks that should be studied and mitigated
Chad DeChant
55
1
0
20 Jan 2025
AI-LieDar: Examine the Trade-off Between Utility and Truthfulness in LLM Agents
AI-LieDar: Examine the Trade-off Between Utility and Truthfulness in LLM Agents
Zhe Su
Xuhui Zhou
Sanketh Rangreji
Anubha Kabra
Julia Mendelsohn
Faeze Brahman
Maarten Sap
LLMAG
92
2
0
13 Sep 2024
Black-Box Access is Insufficient for Rigorous AI Audits
Black-Box Access is Insufficient for Rigorous AI Audits
Stephen Casper
Carson Ezell
Charlotte Siegmann
Noam Kolt
Taylor Lynn Curtis
...
Michael Gerovitch
David Bau
Max Tegmark
David M. Krueger
Dylan Hadfield-Menell
AAML
13
75
0
25 Jan 2024
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
315
8,261
0
28 Jan 2022
1