ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2402.16459
  4. Cited By
Defending LLMs against Jailbreaking Attacks via Backtranslation

Defending LLMs against Jailbreaking Attacks via Backtranslation

26 February 2024
Yihan Wang
Zhouxing Shi
Andrew Bai
Cho-Jui Hsieh
    AAML
ArXivPDFHTML

Papers citing "Defending LLMs against Jailbreaking Attacks via Backtranslation"

26 / 26 papers shown
Title
POISONCRAFT: Practical Poisoning of Retrieval-Augmented Generation for Large Language Models
POISONCRAFT: Practical Poisoning of Retrieval-Augmented Generation for Large Language Models
Yangguang Shao
Xinjie Lin
Haozheng Luo
Chengshang Hou
G. Xiong
J. Yu
Junzheng Shi
SILM
37
0
0
10 May 2025
Cannot See the Forest for the Trees: Invoking Heuristics and Biases to Elicit Irrational Choices of LLMs
Cannot See the Forest for the Trees: Invoking Heuristics and Biases to Elicit Irrational Choices of LLMs
Haoming Yang
Ke Ma
X. Jia
Yingfei Sun
Qianqian Xu
Q. Huang
AAML
68
0
0
03 May 2025
LLM Security: Vulnerabilities, Attacks, Defenses, and Countermeasures
LLM Security: Vulnerabilities, Attacks, Defenses, and Countermeasures
Francisco Aguilera-Martínez
Fernando Berzal
PILM
48
0
0
02 May 2025
DETAM: Defending LLMs Against Jailbreak Attacks via Targeted Attention Modification
DETAM: Defending LLMs Against Jailbreak Attacks via Targeted Attention Modification
Yu Li
Han Jiang
Zhihua Wei
AAML
29
0
0
18 Apr 2025
EmoAgent: Assessing and Safeguarding Human-AI Interaction for Mental Health Safety
EmoAgent: Assessing and Safeguarding Human-AI Interaction for Mental Health Safety
Jiahao Qiu
Yinghui He
Xinzhe Juan
Y. Wang
Y. Liu
Zixin Yao
Yue Wu
Xun Jiang
L. Yang
Mengdi Wang
AI4MH
65
0
0
13 Apr 2025
Align in Depth: Defending Jailbreak Attacks via Progressive Answer Detoxification
Yingjie Zhang
Tong Liu
Zhe Zhao
Guozhu Meng
Kai Chen
AAML
49
1
0
14 Mar 2025
Single-pass Detection of Jailbreaking Input in Large Language Models
Single-pass Detection of Jailbreaking Input in Large Language Models
Leyla Naz Candogan
Yongtao Wu
Elias Abad Rocamora
Grigorios G. Chrysos
V. Cevher
AAML
45
0
0
24 Feb 2025
Commercial LLM Agents Are Already Vulnerable to Simple Yet Dangerous Attacks
Commercial LLM Agents Are Already Vulnerable to Simple Yet Dangerous Attacks
Ang Li
Yin Zhou
Vethavikashini Chithrra Raghuram
Tom Goldstein
Micah Goldblum
AAML
71
7
0
12 Feb 2025
SATA: A Paradigm for LLM Jailbreak via Simple Assistive Task Linkage
SATA: A Paradigm for LLM Jailbreak via Simple Assistive Task Linkage
Xiaoning Dong
Wenbo Hu
Wei Xu
Tianxing He
67
0
0
19 Dec 2024
Improved Large Language Model Jailbreak Detection via Pretrained
  Embeddings
Improved Large Language Model Jailbreak Detection via Pretrained Embeddings
Erick Galinkin
Martin Sablotny
68
0
0
02 Dec 2024
Quantized Delta Weight Is Safety Keeper
Quantized Delta Weight Is Safety Keeper
Yule Liu
Zhen Sun
Xinlei He
Xinyi Huang
80
2
0
29 Nov 2024
SQL Injection Jailbreak: A Structural Disaster of Large Language Models
SQL Injection Jailbreak: A Structural Disaster of Large Language Models
Jiawei Zhao
Kejiang Chen
W. Zhang
Nenghai Yu
AAML
38
0
0
03 Nov 2024
VLMGuard: Defending VLMs against Malicious Prompts via Unlabeled Data
VLMGuard: Defending VLMs against Malicious Prompts via Unlabeled Data
Xuefeng Du
Reshmi Ghosh
Robert Sim
Ahmed Salem
Vitor Carvalho
Emily Lawton
Yixuan Li
Jack W. Stokes
VLM
AAML
32
5
0
01 Oct 2024
Mission Impossible: A Statistical Perspective on Jailbreaking LLMs
Mission Impossible: A Statistical Perspective on Jailbreaking LLMs
Jingtong Su
Mingyu Lee
SangKeun Lee
30
7
0
02 Aug 2024
Can LLMs be Fooled? Investigating Vulnerabilities in LLMs
Can LLMs be Fooled? Investigating Vulnerabilities in LLMs
Sara Abdali
Jia He
C. Barberan
Richard Anarfi
29
7
0
30 Jul 2024
Operationalizing a Threat Model for Red-Teaming Large Language Models
  (LLMs)
Operationalizing a Threat Model for Red-Teaming Large Language Models (LLMs)
Apurv Verma
Satyapriya Krishna
Sebastian Gehrmann
Madhavan Seshadri
Anu Pradhan
Tom Ault
Leslie Barrett
David Rabinowitz
John Doucette
Nhathai Phan
47
8
0
20 Jul 2024
Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs
Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs
Zhao Xu
Fan Liu
Hao Liu
AAML
35
7
0
13 Jun 2024
Merging Improves Self-Critique Against Jailbreak Attacks
Merging Improves Self-Critique Against Jailbreak Attacks
Victor Gallego
AAML
MoMe
34
3
0
11 Jun 2024
Adversarial Tuning: Defending Against Jailbreak Attacks for LLMs
Adversarial Tuning: Defending Against Jailbreak Attacks for LLMs
Fan Liu
Zhao Xu
Hao Liu
AAML
43
9
0
07 Jun 2024
Improved Few-Shot Jailbreaking Can Circumvent Aligned Language Models
  and Their Defenses
Improved Few-Shot Jailbreaking Can Circumvent Aligned Language Models and Their Defenses
Xiaosen Zheng
Tianyu Pang
Chao Du
Qian Liu
Jing Jiang
Min-Bin Lin
AAML
60
28
0
03 Jun 2024
OR-Bench: An Over-Refusal Benchmark for Large Language Models
OR-Bench: An Over-Refusal Benchmark for Large Language Models
Justin Cui
Wei-Lin Chiang
Ion Stoica
Cho-Jui Hsieh
ALM
38
32
0
31 May 2024
GenFighter: A Generative and Evolutive Textual Attack Removal
GenFighter: A Generative and Evolutive Textual Attack Removal
Md Athikul Islam
Edoardo Serra
Sushil Jajodia
AAML
14
0
0
17 Apr 2024
AutoDefense: Multi-Agent LLM Defense against Jailbreak Attacks
AutoDefense: Multi-Agent LLM Defense against Jailbreak Attacks
Yifan Zeng
Yiran Wu
Xiao Zhang
Huazheng Wang
Qingyun Wu
LLMAG
AAML
35
57
0
02 Mar 2024
When "Competency" in Reasoning Opens the Door to Vulnerability: Jailbreaking LLMs via Novel Complex Ciphers
When "Competency" in Reasoning Opens the Door to Vulnerability: Jailbreaking LLMs via Novel Complex Ciphers
Divij Handa
Advait Chirmule
Bimal Gajera
Chitta Baral
Chitta Baral
42
18
0
16 Feb 2024
Intention Analysis Makes LLMs A Good Jailbreak Defender
Intention Analysis Makes LLMs A Good Jailbreak Defender
Yuqi Zhang
Liang Ding
Lefei Zhang
Dacheng Tao
LLMSV
17
15
0
12 Jan 2024
GPTFUZZER: Red Teaming Large Language Models with Auto-Generated
  Jailbreak Prompts
GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts
Jiahao Yu
Xingwei Lin
Zheng Yu
Xinyu Xing
SILM
110
292
0
19 Sep 2023
1