JBShield: Defending Large Language Models from Jailbreak Attacks through Activated Concept Analysis and Manipulation

11 February 2025
Shenyi Zhang, Yuchen Zhai, Keyan Guo, Hongxin Hu, Shengnan Guo, Zheng Fang, Lingchen Zhao, Chao Shen, Cong Wang, Qian Wang
AAML

Papers citing "JBShield: Defending Large Language Models from Jailbreak Attacks through Activated Concept Analysis and Manipulation"

19 papers

From static to adaptive: immune memory-based jailbreak detection for large language models
Jun Leng, Litian Zhang, Xi Zhang, Ruihan Hu, Zhuting Fang, Xi Zhang
AAML · 03 Dec 2025

Can LLMs Threaten Human Survival? Benchmarking Potential Existential Threats from LLMs via Prefix Completion
Yu Cui, Yifei Liu, Hang Fu, Sicheng Pan, Haibin Zhang, Cong Zuo, Licheng Wang
24 Nov 2025

ForgeDAN: An Evolutionary Framework for Jailbreaking Aligned Large Language Models
Siyang Cheng, Gaotian Liu, Rui Mei, Yilin Wang, Kejia Zhang, Kaishuo Wei, Yuqi Yu, Weiping Wen, Xiaojie Wu, Junhua Liu
17 Nov 2025

Evolve the Method, Not the Prompts: Evolutionary Synthesis of Jailbreak Attacks on LLMs
Yunhao Chen, Xin Wang, Juncheng Li, Yixu Wang, Jie Li, Yan Teng, Yingchun Wang, Xingjun Ma
AAML · 16 Nov 2025

AlignTree: Efficient Defense Against LLM Jailbreak Attacks
Gil Goren, Shahar Katz, Lior Wolf
AAML · 15 Nov 2025

Sentra-Guard: A Multilingual Human-AI Framework for Real-Time Defense Against Adversarial LLM Jailbreaks
Md. Mehedi Hasan, Ziaur Rahman, Rafid Mostafiz, Md. Abir Hossain
AAML · 26 Oct 2025

Online Learning Defense against Iterative Jailbreak Attacks via Prompt Optimization
Masahiro Kaneko, Zeerak Talat, Timothy Baldwin
AAML · 19 Oct 2025

Bits Leaked per Query: Information-Theoretic Bounds on Adversarial Attacks against LLMs
Masahiro Kaneko, Timothy Baldwin
AAML · 19 Oct 2025

SASER: Stego attacks on open-source LLMs
Ming Tan, Wei Li, Hu Tao, Hailong Ma, Aodi Liu, Qian Chen, Zilong Wang
AAML · 12 Oct 2025

HFuzzer: Testing Large Language Models for Package Hallucinations via Phrase-based Fuzzing
Yukai Zhao, Menghan Wu, Xing Hu, Xin Xia
HILM · 28 Sep 2025

Dual-Space Smoothness for Robust and Balanced LLM Unlearning
Han Yan, Zheyuan Liu, Meng Jiang
MU · AAML · 27 Sep 2025

Evaluating the Robustness of Retrieval-Augmented Generation to Adversarial Evidence in the Health Domain
Shakiba Amirshahi, Amin Bigdeli, Charles L. A. Clarke, Amira Ghenai
AAML · 04 Sep 2025

Step-by-step Instructions and a Simple Tabular Output Format Improve the Dependency Parsing Accuracy of LLMs
Hiroshi Matsuda, Chunpeng Ma, Masayuki Asahara
11 Jun 2025

JailbreaksOverTime: Detecting Jailbreak Attacks Under Distribution Shift
Julien Piet, Xiao Huang, Dennis Jacob, Annabella Chow, Maha Alrashed, Geng Zhao, Zhanhao Hu, Chawin Sitawarin, Basel Alomair, David Wagner
AAML · 28 Apr 2025

AdaSteer: Your Aligned LLM is Inherently an Adaptive Jailbreak Defender
Weixiang Zhao, Jiahe Guo, Yulin Hu, Yang Deng, An Zhang, ..., Xinyang Han, Yanyan Zhao, Bing Qin, Tat-Seng Chua, Ting Liu
LLMSV · AAML · 13 Apr 2025

Beyond Prompts: Space-Time Decoupling Control-Plane Jailbreaks in LLM Structured Output
Shuoming Zhang, Jiacheng Zhao, Ruiyuan Xu, ..., Yuan Wen, Chunwei Xia, Zheng Wang, Xiaobing Feng, Huimin Cui
AAML · 31 Mar 2025

SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal
Tinghao Xie, Xiangyu Qi, Yi Zeng, Yangsibo Huang, Udari Madhushani Sehwag, ..., Bo Li, Kai Li, Danqi Chen, Peter Henderson, Prateek Mittal
ALM · ELM · 20 Jun 2024

OR-Bench: An Over-Refusal Benchmark for Large Language Models
Justin Cui, Wei-Lin Chiang, Ion Stoica, Cho-Jui Hsieh
ALM · 31 May 2024

Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks (International Conference on Learning Representations (ICLR), 2024)
Maksym Andriushchenko, Francesco Croce, Nicolas Flammarion
AAML · 02 Apr 2024