ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2403.07865
  4. Cited By
CodeAttack: Revealing Safety Generalization Challenges of Large Language
  Models via Code Completion

CodeAttack: Revealing Safety Generalization Challenges of Large Language Models via Code Completion

12 March 2024
Qibing Ren
Chang Gao
Jing Shao
Junchi Yan
Xin Tan
Wai Lam
Lizhuang Ma
    ALM
    ELM
    AAML
ArXivPDFHTML

Papers citing "CodeAttack: Revealing Safety Generalization Challenges of Large Language Models via Code Completion"

6 / 6 papers shown
Title
Foot-In-The-Door: A Multi-turn Jailbreak for LLMs
Foot-In-The-Door: A Multi-turn Jailbreak for LLMs
Zixuan Weng
Xiaolong Jin
Jinyuan Jia
X. Zhang
AAML
38
0
0
27 Feb 2025
Making Them a Malicious Database: Exploiting Query Code to Jailbreak Aligned Large Language Models
Making Them a Malicious Database: Exploiting Query Code to Jailbreak Aligned Large Language Models
Qingsong Zou
Jingyu Xiao
Qing Li
Zhi Yan
Y. Wang
Li Xu
Wenxuan Wang
Kuofeng Gao
Ruoyu Li
Yong-jia Jiang
AAML
76
0
0
21 Feb 2025
SQL Injection Jailbreak: A Structural Disaster of Large Language Models
SQL Injection Jailbreak: A Structural Disaster of Large Language Models
Jiawei Zhao
Kejiang Chen
W. Zhang
Nenghai Yu
AAML
36
0
0
03 Nov 2024
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors,
  and Lessons Learned
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
Deep Ganguli
Liane Lovitt
John Kernion
Amanda Askell
Yuntao Bai
...
Nicholas Joseph
Sam McCandlish
C. Olah
Jared Kaplan
Jack Clark
213
327
0
23 Aug 2022
Training language models to follow instructions with human feedback
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022
Gradient-based Adversarial Attacks against Text Transformers
Gradient-based Adversarial Attacks against Text Transformers
Chuan Guo
Alexandre Sablayrolles
Hervé Jégou
Douwe Kiela
SILM
93
225
0
15 Apr 2021
1