Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2402.07510
Cited By
Secret Collusion among Generative AI Agents: Multi-Agent Deception via Steganography
12 February 2024
S. Motwani
Mikhail Baranchuk
Martin Strohmeier
Vijay Bolina
Philip H. S. Torr
Lewis Hammond
Christian Schroeder de Witt
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Secret Collusion among Generative AI Agents: Multi-Agent Deception via Steganography"
7 / 7 papers shown
Title
The Steganographic Potentials of Language Models
Artem Karpov
Tinuade Adeleke
Seong Hah Cho
Natalia Perez-Campanero
20
0
0
06 May 2025
Exploiting Fine-Grained Skip Behaviors for Micro-Video Recommendation
Sanghyuck Lee
Sangkeun Park
Jaesung Lee
48
0
0
04 Apr 2025
MONA: Myopic Optimization with Non-myopic Approval Can Mitigate Multi-step Reward Hacking
Sebastian Farquhar
Vikrant Varma
David Lindner
David Elson
Caleb Biddulph
Ian Goodfellow
Rohin Shah
82
1
0
22 Jan 2025
Unelicitable Backdoors in Language Models via Cryptographic Transformer Circuits
Andis Draguns
Andrew Gritsevskiy
S. Motwani
Charlie Rogers-Smith
Jeffrey Ladish
Christian Schroeder de Witt
40
2
0
03 Jun 2024
Preventing Language Models From Hiding Their Reasoning
Fabien Roger
Ryan Greenblatt
LRM
18
16
0
27 Oct 2023
Generative Agents: Interactive Simulacra of Human Behavior
J. Park
Joseph C. O'Brien
Carrie J. Cai
Meredith Ringel Morris
Percy Liang
Michael S. Bernstein
LM&Ro
AI4CE
215
1,701
0
07 Apr 2023
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
303
11,730
0
04 Mar 2022
1