Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2312.07413
Cited By
AI capabilities can be significantly improved without expensive retraining
12 December 2023
Tom Davidson
Jean-Stanislas Denain
Pablo Villalobos
Guillem Bas
OffRL
VLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"AI capabilities can be significantly improved without expensive retraining"
20 / 20 papers shown
Title
Position: Ensuring mutual privacy is necessary for effective external evaluation of proprietary AI systems
Ben Bucknall
Robert F. Trager
Michael A. Osborne
80
0
0
03 Mar 2025
Measuring AI agent autonomy: Towards a scalable approach with code inspection
Peter Cihon
Merlin Stein
Gagan Bansal
Sam Manning
Kevin Xu
26
0
0
21 Feb 2025
What AI evaluations for preventing catastrophic risks can and cannot do
Peter Barnett
Lisa Thiergart
ELM
76
2
0
26 Nov 2024
Declare and Justify: Explicit assumptions in AI evaluations are necessary for effective regulation
Peter Barnett
Lisa Thiergart
ELM
65
2
0
19 Nov 2024
Safety case template for frontier AI: A cyber inability argument
Arthur Goemans
Marie Davidsen Buhl
Jonas Schuett
Tomek Korbak
Jessica Wang
Benjamin Hilton
Geoffrey Irving
56
15
0
12 Nov 2024
Towards evaluations-based safety cases for AI scheming
Mikita Balesni
Marius Hobbhahn
David Lindner
Alexander Meinke
Tomek Korbak
...
Dan Braun
Bilal Chughtai
Owain Evans
Daniel Kokotajlo
Lucius Bushnaq
ELM
34
9
0
29 Oct 2024
The Role of Governments in Increasing Interconnected Post-Deployment Monitoring of AI
Merlin Stein
Jamie Bernardi
Connor Dunlop
31
6
0
07 Oct 2024
On the Limitations of Compute Thresholds as a Governance Strategy
Sara Hooker
39
14
0
08 Jul 2024
Emergence of Hidden Capabilities: Exploring Learning Dynamics in Concept Space
Core Francisco Park
Maya Okawa
Andrew Lee
Ekdeep Singh Lubana
Hidenori Tanaka
50
6
0
27 Jun 2024
Stress-Testing Capability Elicitation With Password-Locked Models
Ryan Greenblatt
Fabien Roger
Dmitrii Krasheninnikov
David M. Krueger
30
12
0
29 May 2024
Algorithmic progress in language models
Anson Ho
T. Besiroglu
Ege Erdil
David Owen
Robi Rahman
Zifan Carl Guo
David Atkinson
Neil Thompson
J. Sevilla
24
16
0
09 Mar 2024
Black-Box Access is Insufficient for Rigorous AI Audits
Stephen Casper
Carson Ezell
Charlotte Siegmann
Noam Kolt
Taylor Lynn Curtis
...
Michael Gerovitch
David Bau
Max Tegmark
David M. Krueger
Dylan Hadfield-Menell
AAML
13
75
0
25 Jan 2024
Visibility into AI Agents
Alan Chan
Carson Ezell
Max Kaufmann
K. Wei
Lewis Hammond
...
Nitarshan Rajkumar
David M. Krueger
Noam Kolt
Lennart Heim
Markus Anderljung
13
29
0
23 Jan 2024
Hazards from Increasingly Accessible Fine-Tuning of Downloadable Foundation Models
Alan Chan
Ben Bucknall
Herbie Bradley
David M. Krueger
8
6
0
22 Dec 2023
Managing extreme AI risks amid rapid progress
Yoshua Bengio
Geoffrey Hinton
Andrew Yao
Dawn Song
Pieter Abbeel
...
Philip H. S. Torr
Stuart J. Russell
Daniel Kahneman
J. Brauner
Sören Mindermann
18
63
0
26 Oct 2023
Emergent autonomous scientific research capabilities of large language models
Daniil A. Boiko
R. MacKnight
Gabe Gomes
ELM
LM&Ro
AI4CE
LLMAG
101
115
0
11 Apr 2023
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
315
8,261
0
28 Jan 2022
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao
Stella Biderman
Sid Black
Laurence Golding
Travis Hoppe
...
Horace He
Anish Thite
Noa Nabeshima
Shawn Presser
Connor Leahy
AIMat
242
1,977
0
31 Dec 2020
Measuring the Algorithmic Efficiency of Neural Networks
Danny Hernandez
Tom B. Brown
218
94
0
08 May 2020
1