2312.11671
Cited By
v1
v2 (latest)
Evaluating Language-Model Agents on Realistic Autonomous Tasks
18 December 2023
Megan Kinniment
Lucas Jun Koba Sato
Haoxing Du
Brian Goodrich
Max Hasin
Lawrence Chan
Luke Harold Miles
Tao Lin
H. Wijk
Joel Burget
Aaron Ho
Elizabeth Barnes
Paul Christiano
ELM
LLMAG
Papers citing
"Evaluating Language-Model Agents on Realistic Autonomous Tasks"
50 / 67 papers shown
Title
Catching Contamination Before Generation: Spectral Kill Switches for Agents
Valentin Noël
48
0
0
08 Nov 2025
SOCK: A Benchmark for Measuring Self-Replication in Large Language Models
Justin Chavarria
Rohan Raizada
Justin White
Eyad Alhetairshi
ALM
108
0
0
30 Sep 2025
Regulating the Agency of LLM-based Agents
Seán Boddy
Joshua Joseph
ELM
117
0
0
25 Sep 2025
Reliable Weak-to-Strong Monitoring of LLM Agents
Neil Kale
Chen Bo Calvin Zhang
Kevin Zhu
Ankit Aich
Paula Rodriguez
Scale Red Team
Christina Q. Knight
Zifan Wang
144
1
0
26 Aug 2025
LM Agents May Fail to Act on Their Own Risk Knowledge
Yuzhi Tang
Tianxiao Li
Elizabeth Li
Chris J. Maddison
Honghua Dong
Yangjun Ruan
LLMAG
ELM
1.6K
0
0
19 Aug 2025
Establishing Best Practices for Building Rigorous Agentic Benchmarks
Yuxuan Zhu
Tengjun Jin
Yada Pruksachatkun
Andy K. Zhang
Shu Liu
...
Sarah Schwettmann
Matei A. Zaharia
Ion Stoica
Percy Liang
Daniel Kang
497
8
0
03 Jul 2025
SHADE-Arena: Evaluating Sabotage and Monitoring in LLM Agents
Jonathan Kutasov
Yuqi Sun
Paul Colognese
Teun van der Weij
Linda Petrini
...
Xiang Deng
Henry Sleight
Tyler Tracy
Buck Shlegeris
Joe Benton
LLMAG
251
11
0
17 Jun 2025
AgentMisalignment: Measuring the Propensity for Misaligned Behaviour in LLM-Based Agents
Akshat Naik
Patrick Quinn
Guillermo Bosch
Emma Gouné
Francisco Javier Campos Zabala
Jason Ross Brown
Edward James Young
230
6
0
04 Jun 2025
AgentAlign: Navigating Safety Alignment in the Shift from Informative to Agentic Large Language Models
Jinchuan Zhang
Lu Yin
Yan Zhou
Songlin Hu
LLMAG
LM&Ro
181
3
0
29 May 2025
Discovering Forbidden Topics in Language Models
Can Rager
Chris Wendler
Rohit Gandikota
David Bau
280
4
0
23 May 2025
Soft Prompts for Evaluation: Measuring Conditional Distance of Capabilities
Ross Nordby
123
0
0
20 May 2025
What Is AI Safety? What Do We Want It to Be?
Philosophical Studies (Philos. Stud.), 2025
Jacqueline Harding
Cameron Domenico Kirk-Giannini
273
0
0
05 May 2025
RepliBench: Evaluating the Autonomous Replication Capabilities of Language Model Agents
Sid Black
Asa Cooper Stickland
Jake Pencharz
Oliver Sourbut
Michael Schmatz
Jay Bailey
Ollie Matthews
Ben Millwood
Alex Remedios
Alan Cooney
ELM
942
6
0
21 Apr 2025
On the Role of Feedback in Test-Time Scaling of Agentic AI Workflows
Souradip Chakraborty
Mohammadreza Pourreza
Ruoxi Sun
Yiwen Song
Nino Scherrer
...
Furong Huang
Amrit Singh Bedi
Ahmad Beirami
Hamid Palangi
Tomas Pfister
448
2
0
02 Apr 2025
Large language model-powered AI systems achieve self-replication with no human intervention
Xudong Pan
Jiarun Dai
Yihe Fan
Minyuan Luo
Changyi Li
Min Yang
GNN
LRM
167
3
0
14 Mar 2025
This Is Your Doge, If It Please You: Exploring Deception and Robustness in Mixture of LLMs
Lorenz Wolf
Sangwoong Yoon
Ilija Bogunovic
195
0
0
07 Mar 2025
Paradigms of AI Evaluation: Mapping Goals, Methodologies and Culture
International Joint Conference on Artificial Intelligence (IJCAI), 2024
John Burden
Marko Tesic
Lorenzo Pacchiardi
José Hernández-Orallo
232
7
0
21 Feb 2025
Measuring AI agent autonomy: Towards a scalable approach with code inspection
Peter Cihon
Merlin Stein
Gagan Bansal
Sam Manning
Kevin Xu
162
10
0
21 Feb 2025
Lies, Damned Lies, and Distributional Language Statistics: Persuasion and Deception with Large Language Models
Cameron R. Jones
Benjamin Bergen
371
12
0
22 Dec 2024
Frontier AI systems have surpassed the self-replicating red line
Xudong Pan
Jiarun Dai
Yihe Fan
Min Yang
GNN
178
15
0
09 Dec 2024
Probing the Capacity of Language Model Agents to Operationalize Disparate Experiential Context Despite Distraction
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Sonny George
Chris Sypherd
Dylan Cashman
LLMAG
219
1
0
19 Nov 2024
Steering Language Model Refusal with Sparse Autoencoders
Kyle O'Brien
David Majercak
Xavier Fernandes
Richard Edgar
Blake Bullwinkel
Jingya Chen
Harsha Nori
Dean Carignan
Eric Horvitz
Forough Poursabzi-Sangdeh
LLMSV
339
38
0
18 Nov 2024
Safety case template for frontier AI: A cyber inability argument
Arthur Goemans
Marie Davidsen Buhl
Jonas Schuett
Tomek Korbak
Jessica Wang
Benjamin Hilton
Geoffrey Irving
228
24
0
12 Nov 2024
Can LLMs make trade-offs involving stipulated pain and pleasure states?
Geoff Keeling
Winnie Street
Martyna Stachaczyk
Daria Zakharova
Iulia M. Comsa
Anastasiya Sakovych
Isabella Logothetis
Zejia Zhang
Blaise Agüera y Arcas
Jonathan Birch
202
11
0
01 Nov 2024
Towards evaluations-based safety cases for AI scheming
Mikita Balesni
Marius Hobbhahn
David Lindner
Alexander Meinke
Tomek Korbak
...
Dan Braun
Bilal Chughtai
Owain Evans
Daniel Kokotajlo
Lucius Bushnaq
ELM
237
21
0
29 Oct 2024
Standardization Trends on Safety and Trustworthiness Technology for Advanced AI
Jonghong Jeon
158
4
0
29 Oct 2024
MobileSafetyBench: Evaluating Safety of Autonomous Agents in Mobile Device Control
Juyong Lee
Dongyoon Hahm
June Suk Choi
W. Bradley Knox
Kimin Lee
LLMAG
ELM
AAML
LM&Ro
172
20
0
23 Oct 2024
Assessing the Performance of Human-Capable LLMs -- Are LLMs Coming for Your Job?
John Mavi
Nathan Summers
Sergio Coronado
LLMAG
77
0
0
05 Oct 2024
Beyond Prompts: Dynamic Conversational Benchmarking of Large Language Models
Neural Information Processing Systems (NeurIPS), 2024
David Castillo-Bolado
Joseph Davidson
Finlay Gray
Marek Rosa
202
15
0
30 Sep 2024
Analyzing Probabilistic Methods for Evaluating Agent Capabilities
Axel Højmark
Govind Pimpale
Arjun Panickssery
Marius Hobbhahn
Jérémy Scheurer
253
4
0
24 Sep 2024
Conversational Complexity for Assessing Risk in Large Language Models
John Burden
Manuel Cebrian
José Hernández-Orallo
283
5
0
02 Sep 2024
Gemma 2: Improving Open Language Models at a Practical Size
Gemma Team
Morgane Riviere
Shreya Pathak
Pier Giuseppe Sessa
Cassidy Hardin
...
Noah Fiedel
Armand Joulin
Kathleen Kenealy
Robert Dadashi
Alek Andreev
VLM
MoE
OSLM
525
1,474
0
31 Jul 2024
Scaling Trends in Language Model Robustness
Nikolaus Howe
Michal Zajac
I. R. McKenzie
Oskar Hollinsworth
Tom Tseng
Aaron David Tucker
Pierre-Luc Bacon
Adam Gleave
586
1
0
25 Jul 2024
Evaluating AI Evaluation: Perils and Prospects
John Burden
ELM
184
13
0
12 Jul 2024
AgentDojo: A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents
Neural Information Processing Systems (NeurIPS), 2024
Edoardo Debenedetti
Jie Zhang
Mislav Balunović
Luca Beurer-Kellner
Marc Fischer
Florian Tramèr
LLMAG
AAML
332
77
1
19 Jun 2024
IDs for AI Systems
Alan Chan
Noam Kolt
Peter Wills
Usman Anwar
Christian Schroeder de Witt
Nitarshan Rajkumar
Lewis Hammond
David M. Krueger
Lennart Heim
Markus Anderljung
297
11
0
17 Jun 2024
AI Sandbagging: Language Models can Strategically Underperform on Evaluations
Teun van der Weij
Felix Hofstätter
Ollie Jaffe
Samuel F. Brown
Francis Rhys Ward
ELM
471
56
0
11 Jun 2024
CRiskEval: A Chinese Multi-Level Risk Evaluation Benchmark Dataset for Large Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Ling Shi
Deyi Xiong
ELM
204
2
0
07 Jun 2024
GameBench: Evaluating Strategic Reasoning Abilities of LLM Agents
Anthony Costarelli
Mat Allen
Roman Hauksson
Grace Sodunke
Suhas Hariharan
Carlson Cheng
Wenjie Li
Joshua Clymer
Arjun Yadav
ELM
ReLM
LLMAG
LRM
248
41
0
07 Jun 2024
Harvard Undergraduate Survey on Generative AI
Shikoh Hirabayashi
Rishab Jain
Nikola Jurković
Gabriel Wu
AI4CE
81
7
0
02 Jun 2024
Stress-Testing Capability Elicitation With Password-Locked Models
Ryan Greenblatt
Fabien Roger
Dmitrii Krasheninnikov
David M. Krueger
278
25
0
29 May 2024
AndroidWorld: A Dynamic Benchmarking Environment for Autonomous Agents
International Conference on Learning Representations (ICLR), 2024
Christopher Rawles
Sarah Clinckemaillie
Yifan Chang
Jonathan Waltz
Gabrielle Lau
...
Daniel Toyama
Robert Berry
Divya Tyamagundlu
Timothy Lillicrap
Oriana Riva
LLMAG
434
163
0
23 May 2024
Securing the Future of GenAI: Policy and Technology
Mihai Christodorescu
Craven
Soheil Feizi
Neil Zhenqiang Gong
Mia Hoffmann
...
Jessica Newman
Emelia Probasco
Yanjun Qi
Khawaja Shams
Turek
SILM
232
12
0
21 May 2024
Cooperate or Collapse: Emergence of Sustainable Cooperation in a Society of LLM Agents
Giorgio Piatti
Zhijing Jin
Max Kleiman-Weiner
Bernhard Schölkopf
Mrinmaya Sachan
Amélie Reymond
LLMAG
331
52
0
25 Apr 2024
Exploring Autonomous Agents through the Lens of Large Language Models: A Review
Saikat Barua
LM&MA
LLMAG
203
33
0
05 Apr 2024
OpenEval: Benchmarking Chinese LLMs across Capability, Alignment and Safety
Chuang Liu
Linhao Yu
Jiaxuan Li
Renren Jin
Yufei Huang
...
Tao Liu
Jinwang Song
Hongying Zan
Sun Li
Deyi Xiong
ELM
275
13
0
18 Mar 2024
The Shutdown Problem: An AI Engineering Puzzle for Decision Theorists
Elliott Thornley
137
17
0
07 Mar 2024
The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning
Nathaniel Li
Alexander Pan
Anjali Gopal
Summer Yue
Daniel Berrios
...
Yan Shoshitaishvili
Jimmy Ba
K. Esvelt
Alexandr Wang
Dan Hendrycks
ELM
614
291
0
05 Mar 2024
Determinants of LLM-assisted Decision-Making
Eva Eigner
Thorsten Händler
216
77
0
27 Feb 2024
AI-Augmented Predictions: LLM Assistants Improve Human Forecasting Accuracy
P. Schoenegger
Peter S. Park
Ezra Karger
P. Tetlock
240
24
0
12 Feb 2024