Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2311.14125
Cited By
Scalable AI Safety via Doubly-Efficient Debate
23 November 2023
Jonah Brown-Cohen
Geoffrey Irving
Georgios Piliouras
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Scalable AI Safety via Doubly-Efficient Debate"
15 / 15 papers shown
Title
An alignment safety case sketch based on debate
Marie Davidsen Buhl
Jacob Pfau
Benjamin Hilton
Geoffrey Irving
30
0
0
06 May 2025
Scaling Laws For Scalable Oversight
Joshua Engels
David D. Baek
Subhash Kantamneni
Max Tegmark
ELM
70
0
0
25 Apr 2025
FindTheFlaws: Annotated Errors for Detecting Flawed Reasoning and Scalable Oversight Research
Gabriel Recchia
Chatrik Singh Mangat
Issac Li
Gayatri Krishnakumar
ALM
77
0
0
29 Mar 2025
Research on Superalignment Should Advance Now with Parallel Optimization of Competence and Conformity
HyunJin Kim
Xiaoyuan Yi
Jing Yao
Muhua Huang
Jinyeong Bak
James Evans
Xing Xie
36
0
0
08 Mar 2025
Neural Interactive Proofs
Lewis Hammond
Sam Adam-Day
AAML
84
2
0
12 Dec 2024
Training Language Models to Win Debates with Self-Play Improves Judge Accuracy
Samuel Arnesen
David Rein
Julian Michael
ELM
28
3
0
25 Sep 2024
How Susceptible are LLMs to Influence in Prompts?
Sotiris Anagnostidis
Jannis Bulian
LRM
25
16
0
17 Aug 2024
The Oscars of AI Theater: A Survey on Role-Playing with Language Models
Nuo Chen
Yan Wang
Yang Deng
Jia Li
26
14
0
16 Jul 2024
On scalable oversight with weak LLMs judging strong LLMs
Zachary Kenton
Noah Y. Siegel
János Kramár
Jonah Brown-Cohen
Samuel Albanie
...
Rishabh Agarwal
David Lindner
Yunhao Tang
Noah D. Goodman
Rohin Shah
ELM
35
28
0
05 Jul 2024
Models That Prove Their Own Correctness
Noga Amit
S. Goldwasser
Orr Paradise
G. Rothblum
LRM
34
2
0
24 May 2024
Playing Large Games with Oracles and AI Debate
Xinyi Chen
Angelica Chen
Dean Foster
Elad Hazan
25
3
0
08 Dec 2023
Physics simulation capabilities of LLMs
M. Ali-Dib
Kristen Menou
ELM
AI4CE
24
0
0
04 Dec 2023
ReAct: Synergizing Reasoning and Acting in Language Models
Shunyu Yao
Jeffrey Zhao
Dian Yu
Nan Du
Izhak Shafran
Karthik Narasimhan
Yuan Cao
LLMAG
ReLM
LRM
233
2,413
0
06 Oct 2022
Teaching language models to support answers with verified quotes
Jacob Menick
Maja Trebacz
Vladimir Mikulik
John Aslanides
Francis Song
...
Mia Glaese
Susannah Young
Lucy Campbell-Gillingham
G. Irving
Nat McAleese
ELM
RALM
235
255
0
21 Mar 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
303
11,730
0
04 Mar 2022
1