2311.07587
Cited By
Frontier Language Models are not Robust to Adversarial Arithmetic, or "What do I need to say so you agree 2+2=5?"
8 November 2023
C. D. Freeman
Laura J. Culp
Aaron T Parisi
Maxwell Bileschi
Gamaleldin F. Elsayed
Alex Rizkowsky
Isabelle Simpson
A. Alemi
Azade Nova
Ben Adlam
Bernd Bohnet
Gaurav Mishra
Hanie Sedghi
Igor Mordatch
Izzeddin Gur
Jaehoon Lee
JD Co-Reyes
Jeffrey Pennington
Kelvin Xu
Kevin Swersky
Kshiteej Mahajan
Lechao Xiao
Rosanne Liu
Simon Kornblith
Noah Constant
Peter J. Liu
Roman Novak
Yundi Qian
Noah Fiedel
Jascha Narain Sohl-Dickstein
AAML
Papers citing
"Frontier Language Models are not Robust to Adversarial Arithmetic, or "What do I need to say so you agree 2+2=5?"
2 / 2 papers shown
Title
Towards Understanding Sycophancy in Language Models
Mrinank Sharma
Meg Tong
Tomasz Korbak
D. Duvenaud
Amanda Askell
...
Oliver Rausch
Nicholas Schiefer
Da Yan
Miranda Zhang
Ethan Perez
209
178
0
20 Oct 2023
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
Deep Ganguli
Liane Lovitt
John Kernion
Amanda Askell
Yuntao Bai
...
Nicholas Joseph
Sam McCandlish
C. Olah
Jared Kaplan
Jack Clark
218
441
0
23 Aug 2022