Characterizing and Evaluating the Reliability of LLMs against Jailbreak Attacks

18 August 2024
Kexin Chen
Yi Liu
Dongxia Wang
Jiaying Chen
Wenhai Wang
ArXiv | PDF | HTML

Papers citing "Characterizing and Evaluating the Reliability of LLMs against Jailbreak Attacks"

3 / 3 papers shown
"Will releasing the weights of future large language models grant widespread access to pandemic agents?"
Anjali Gopal, Nathan Helm-Burger, Lenni Justen, Emily H. Soice, Tiffany Tzeng, Geetha Jeyapragasan, Simon Grimm, Benjamin Mueller, K. Esvelt
25 Oct 2023

"Training language models to follow instructions with human feedback"
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe
Tags: OSLM, ALM
04 Mar 2022

"Chain-of-Thought Prompting Elicits Reasoning in Large Language Models"
Jason W. Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, F. Xia, Ed H. Chi, Quoc Le, Denny Zhou
Tags: LM&Ro, LRM, AI4CE, ReLM
28 Jan 2022