Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2407.03045
Cited By
JailbreakHunter: A Visual Analytics Approach for Jailbreak Prompts Discovery from Large-Scale Human-LLM Conversational Datasets
3 July 2024
Zhihua Jin
Shiyi Liu
Haotian Li
Xun Zhao
Huamin Qu
Re-assign community
ArXiv
PDF
HTML
Papers citing
"JailbreakHunter: A Visual Analytics Approach for Jailbreak Prompts Discovery from Large-Scale Human-LLM Conversational Datasets"
4 / 4 papers shown
Title
LLM Comparator: Visual Analytics for Side-by-Side Evaluation of Large Language Models
Minsuk Kahng
Ian Tenney
Mahima Pushkarna
Michael Xieyang Liu
James Wexler
Emily Reif
Krystal Kallarackal
Minsuk Chang
Michael Terry
Lucas Dixon
46
21
0
16 Feb 2024
Improving alignment of dialogue agents via targeted human judgements
Amelia Glaese
Nat McAleese
Maja Trkebacz
John Aslanides
Vlad Firoiu
...
John F. J. Mellor
Demis Hassabis
Koray Kavukcuoglu
Lisa Anne Hendricks
G. Irving
ALM
AAML
225
500
0
28 Sep 2022
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
Deep Ganguli
Liane Lovitt
John Kernion
Amanda Askell
Yuntao Bai
...
Nicholas Joseph
Sam McCandlish
C. Olah
Jared Kaplan
Jack Clark
218
441
0
23 Aug 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
303
11,881
0
04 Mar 2022
1