Characterizing and Evaluating the Reliability of LLMs against Jailbreak Attacks
arXiv 2408.09326, 18 August 2024
Kexin Chen, Yi Liu, Dongxia Wang, Jiaying Chen, Wenhai Wang
Papers citing "Characterizing and Evaluating the Reliability of LLMs against Jailbreak Attacks" (3 of 3 shown)
Will releasing the weights of future large language models grant widespread access to pandemic agents?
Anjali Gopal, Nathan Helm-Burger, Lenni Justen, Emily H. Soice, Tiffany Tzeng, Geetha Jeyapragasan, Simon Grimm, Benjamin Mueller, K. Esvelt
25 Oct 2023
Training language models to follow instructions with human feedback
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe
OSLM, ALM
04 Mar 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, F. Xia, Ed H. Chi, Quoc Le, Denny Zhou
LM&Ro, LRM, AI4CE, ReLM
28 Jan 2022