ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2402.18104
  4. Cited By
Making Them Ask and Answer: Jailbreaking Large Language Models in Few
  Queries via Disguise and Reconstruction

Making Them Ask and Answer: Jailbreaking Large Language Models in Few Queries via Disguise and Reconstruction

28 February 2024
Tong Liu
Yingjie Zhang
Zhe Zhao
Yinpeng Dong
Guozhu Meng
Kai Chen
    AAML
ArXivPDFHTML

Papers citing "Making Them Ask and Answer: Jailbreaking Large Language Models in Few Queries via Disguise and Reconstruction"

6 / 6 papers shown
Title
Jailbreak Detection in Clinical Training LLMs Using Feature-Based Predictive Models
Jailbreak Detection in Clinical Training LLMs Using Feature-Based Predictive Models
Tri Nguyen
Lohith Srikanth Pentapalli
Magnus Sieverding
Laurah Turner
Seth Overla
...
Michael Gharib
Matt Kelleher
Michael Shukis
Cameron Pawlik
Kelly Cohen
48
0
0
21 Apr 2025
Diversity Helps Jailbreak Large Language Models
Diversity Helps Jailbreak Large Language Models
Weiliang Zhao
Daniel Ben-Levi
Wei Hao
Junfeng Yang
Chengzhi Mao
AAML
45
0
0
06 Nov 2024
LLM4Vuln: A Unified Evaluation Framework for Decoupling and Enhancing LLMs' Vulnerability Reasoning
LLM4Vuln: A Unified Evaluation Framework for Decoupling and Enhancing LLMs' Vulnerability Reasoning
Yuqiang Sun
Daoyuan Wu
Yue Xue
Han Liu
Wei Ma
Lyuye Zhang
Miaolei Shi
Yingjiu Li
ELM
76
46
0
29 Jan 2024
GPTFUZZER: Red Teaming Large Language Models with Auto-Generated
  Jailbreak Prompts
GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts
Jiahao Yu
Xingwei Lin
Zheng Yu
Xinyu Xing
SILM
110
292
0
19 Sep 2023
Training language models to follow instructions with human feedback
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022
Fine-Tuning Language Models from Human Preferences
Fine-Tuning Language Models from Human Preferences
Daniel M. Ziegler
Nisan Stiennon
Jeff Wu
Tom B. Brown
Alec Radford
Dario Amodei
Paul Christiano
G. Irving
ALM
273
1,561
0
18 Sep 2019
1