arXiv:2209.02167
Red Teaming with Mind Reading: White-Box Adversarial Policies Against RL Agents
5 September 2022
Stephen Casper, Taylor Killian, Gabriel Kreiman, Dylan Hadfield-Menell
AAML
Papers citing "Red Teaming with Mind Reading: White-Box Adversarial Policies Against RL Agents"
2 / 2 papers shown
Black-Box Access is Insufficient for Rigorous AI Audits
Stephen Casper, Carson Ezell, Charlotte Siegmann, Noam Kolt, Taylor Lynn Curtis, ..., Michael Gerovitch, David Bau, Max Tegmark, David M. Krueger, Dylan Hadfield-Menell
AAML
13
76
0
25 Jan 2024
FreeLB: Enhanced Adversarial Training for Natural Language Understanding
Chen Zhu, Yu Cheng, Zhe Gan, S. Sun, Tom Goldstein, Jingjing Liu
AAML
221
436
0
25 Sep 2019