2307.00787
Cited By
Evaluating Shutdown Avoidance of Language Models in Textual Scenarios
3 July 2023
Teun van der Weij
Simon Lermen
Leon Lang
LLMAG
Papers citing
"Evaluating Shutdown Avoidance of Language Models in Textual Scenarios"
5 / 5 papers shown
Title
Exploring Advanced Methodologies in Security Evaluation for LLMs
Junming Huang
Jiawei Zhang
Qi Wang
Weihong Han
Yanchun Zhang
40
0
0
28 Feb 2024
Exploring the Robustness of Model-Graded Evaluations and Automated Interpretability
Simon Lermen
Ondřej Kvapil
ELM
AAML
18
3
0
26 Nov 2023
Large Language Models can Strategically Deceive their Users when Put Under Pressure
Jérémy Scheurer
Mikita Balesni
Marius Hobbhahn
LLMAG
23
48
0
09 Nov 2023
Sparks of Artificial General Intelligence: Early experiments with GPT-4
Sébastien Bubeck
Varun Chandrasekaran
Ronen Eldan
J. Gehrke
Eric Horvitz
...
Scott M. Lundberg
Harsha Nori
Hamid Palangi
Marco Tulio Ribeiro
Yi Zhang
ELM
AI4MH
AI4CE
ALM
254
2,232
0
22 Mar 2023
Fine-Tuning Language Models from Human Preferences
Daniel M. Ziegler
Nisan Stiennon
Jeff Wu
Tom B. Brown
Alec Radford
Dario Amodei
Paul Christiano
G. Irving
ALM
275
1,587
0
18 Sep 2019