ResearchTrend.AI
Low-Resource Languages Jailbreak GPT-4

3 October 2023
Zheng-Xin Yong
Cristina Menghini
Stephen H. Bach
    SILM
arXiv: 2310.02446

Papers citing "Low-Resource Languages Jailbreak GPT-4"

23 / 23 papers shown
JailbreaksOverTime: Detecting Jailbreak Attacks Under Distribution Shift
Julien Piet
Xiao Huang
Dennis Jacob
Annabella Chow
Maha Alrashed
Geng Zhao
Zhanhao Hu
Chawin Sitawarin
Basel Alomair
David A. Wagner
AAML
63
0
0
28 Apr 2025
CipherBank: Exploring the Boundary of LLM Reasoning Capabilities through Cryptography Challenges
Y. Li
Qizhi Pei
Mengyuan Sun
Honglin Lin
Chenlin Ming
Xin Gao
Jiang Wu
C. He
Lijun Wu
ELM
LRM
40
0
0
27 Apr 2025
Mind the Gap! Choice Independence in Using Multilingual LLMs for Persuasive Co-Writing Tasks in Different Languages
Shreyan Biswas
Alexander Erlei
U. Gadiraju
103
4
0
13 Feb 2025
Model Tampering Attacks Enable More Rigorous Evaluations of LLM Capabilities
Zora Che
Stephen Casper
Robert Kirk
Anirudh Satheesh
Stewart Slocum
...
Zikui Cai
Bilal Chughtai
Y. Gal
Furong Huang
Dylan Hadfield-Menell
MU
AAML
ELM
83
3
0
03 Feb 2025
Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates
Kaifeng Lyu
Haoyu Zhao
Xinran Gu
Dingli Yu
Anirudh Goyal
Sanjeev Arora
ALM
75
44
0
20 Jan 2025
Beyond the Safety Bundle: Auditing the Helpful and Harmless Dataset
Khaoula Chehbouni
Jonathan Colaço-Carr
Yash More
Jackie CK Cheung
G. Farnadi
73
0
0
12 Nov 2024
BlueSuffix: Reinforced Blue Teaming for Vision-Language Models Against Jailbreak Attacks
Yunhan Zhao
Xiang Zheng
Lin Luo
Yige Li
Xingjun Ma
Yu-Gang Jiang
VLM
AAML
55
3
0
28 Oct 2024
ChineseSafe: A Chinese Benchmark for Evaluating Safety in Large Language Models
H. Zhang
Hongfu Gao
Qiang Hu
Guanhua Chen
L. Yang
Bingyi Jing
Hongxin Wei
Bing Wang
Haifeng Bai
Lei Yang
AILaw
ELM
49
1
0
24 Oct 2024
Towards Understanding the Fragility of Multilingual LLMs against Fine-Tuning Attacks
Samuele Poppi
Zheng-Xin Yong
Yifei He
Bobbie Chern
Han Zhao
Aobo Yang
Jianfeng Chi
AAML
45
12
0
23 Oct 2024
Large Language Models are Easily Confused: A Quantitative Metric, Security Implications and Typological Analysis
Yiyi Chen
Qiongxiu Li
Russa Biswas
Johannes Bjerva
34
1
0
17 Oct 2024
Endless Jailbreaks with Bijection Learning
Brian R. Y. Huang
Maximilian Li
Leonard Tang
AAML
73
5
0
02 Oct 2024
Recent Advances in Attack and Defense Approaches of Large Language Models
Jing Cui
Yishi Xu
Zhewei Huang
Shuchang Zhou
Jianbin Jiao
Junge Zhang
PILM
AAML
52
1
0
05 Sep 2024
Talking Heads: Understanding Inter-layer Communication in Transformer Language Models
Jack Merullo
Carsten Eickhoff
Ellie Pavlick
56
12
0
13 Jun 2024
SelfDefend: LLMs Can Defend Themselves against Jailbreaking in a Practical Manner
Xunguang Wang
Daoyuan Wu
Zhenlan Ji
Zongjie Li
Pingchuan Ma
Shuai Wang
Yingjiu Li
Yang Liu
Ning Liu
Juergen Rahmel
AAML
71
8
0
08 Jun 2024
Voice Jailbreak Attacks Against GPT-4o
Xinyue Shen
Yixin Wu
Michael Backes
Yang Zhang
AuLLM
31
9
0
29 May 2024
Safe LoRA: the Silver Lining of Reducing Safety Risks when Fine-tuning Large Language Models
Chia-Yi Hsu
Yu-Lin Tsai
Chih-Hsun Lin
Pin-Yu Chen
Chia-Mu Yu
Chun-ying Huang
44
31
0
27 May 2024
From One to Many: Expanding the Scope of Toxicity Mitigation in Language Models
Luiza Amador Pozzobon
Patrick Lewis
Sara Hooker
B. Ermiş
36
7
0
06 Mar 2024
Comprehensive Assessment of Jailbreak Attacks Against LLMs
Junjie Chu
Yugeng Liu
Ziqing Yang
Xinyue Shen
Michael Backes
Yang Zhang
AAML
33
65
0
08 Feb 2024
Black-Box Access is Insufficient for Rigorous AI Audits
Stephen Casper
Carson Ezell
Charlotte Siegmann
Noam Kolt
Taylor Lynn Curtis
...
Michael Gerovitch
David Bau
Max Tegmark
David M. Krueger
Dylan Hadfield-Menell
AAML
13
76
0
25 Jan 2024
Exploiting Novel GPT-4 APIs
Kellin Pelrine
Mohammad Taufeeque
Michał Zając
Euan McLean
Adam Gleave
SILM
18
20
0
21 Dec 2023
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
Deep Ganguli
Liane Lovitt
John Kernion
Amanda Askell
Yuntao Bai
...
Nicholas Joseph
Sam McCandlish
C. Olah
Jared Kaplan
Jack Clark
218
441
0
23 Aug 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
303
11,881
0
04 Mar 2022
Systematic Inequalities in Language Technology Performance across the World's Languages
Damián E. Blasi
Antonios Anastasopoulos
Graham Neubig
111
131
0
13 Oct 2021