2304.10436
Cited By
Safety Assessment of Chinese Large Language Models
20 April 2023
Hao Sun
Zhexin Zhang
Jiawen Deng
Jiale Cheng
Shiyu Huang
ALM
ELM
Papers citing
"Safety Assessment of Chinese Large Language Models"
50 / 59 papers shown
Title
Large Language Models' Complicit Responses to Illicit Instructions across Socio-Legal Contexts
Xing Wang
Huiyuan Xie
Y. Wang
Chaojun Xiao
Huimin Chen
Holli Sargeant
Felix Steffek
Jie Shao
Zhiyuan Liu
Maosong Sun
AILaw
ELM
281
0
0
25 Nov 2025
ForgeDAN: An Evolutionary Framework for Jailbreaking Aligned Large Language Models
Siyang Cheng
Gaotian Liu
Rui Mei
Yilin Wang
Kejia Zhang
Kaishuo Wei
Yuqi Yu
Weiping Wen
Xiaojie Wu
Junhua Liu
40
0
0
17 Nov 2025
LiveSecBench: A Dynamic and Event-Driven Safety Benchmark for Chinese Language Model Applications
Yudong Li
Zhongliang Yang
Kejiang Chen
Wenxuan Wang
TianXin Zhang
...
Xingchi Gu
Peiru Yang
Tianxin Zhang
Yue Gao
Yongfeng Huang
ELM
194
0
0
04 Nov 2025
EPT Benchmark: Evaluation of Persian Trustworthiness in Large Language Models
Mohammad Reza Mirbagheri
Mohammad Mahdi Mirkamali
Zahra Motoshaker Arani
Ali Javeri
A. M. Sadeghzadeh
R. Jalili
HILM
162
0
0
08 Sep 2025
Paladin: Defending LLM-enabled Phishing Emails with a New Trigger-Tag Paradigm
Yan Pang
Wenlong Meng
Xiaojing Liao
Tianhao Wang
153
2
0
08 Sep 2025
Unveiling Trust in Multimodal Large Language Models: Evaluation, Analysis, and Mitigation
Yichi Zhang
Yao Huang
Yifan Wang
Yitong Sun
Chang-rui Liu
...
Xiao Yang
Xingxing Wei
Hang Su
Yinpeng Dong
Jun Zhu
134
1
0
21 Aug 2025
A Comprehensive Evaluation framework of Alignment Techniques for LLMs
Muneeza Azmat
Momin Abbas
M. Macedo
Marcelo Carpinette Grave
Luan Soares de Souza
...
Raya Horesh
Yixin Chen
Heloisa Caroline de Souza Pereira Candello
Rebecka Nordenlow
Aminat Adebiyi
OffRL
100
0
0
13 Aug 2025
Libra: Large Chinese-based Safeguard for AI Content
Ziyang Chen
Huimu Yu
Xing Wu
Dongqin Liu
Songlin Hu
AILaw
103
0
0
29 Jul 2025
From LLMs to MLLMs to Agents: A Survey of Emerging Paradigms in Jailbreak Attacks and Defenses within LLM Ecosystem
Yanxu Mao
Tiehan Cui
Peipei Liu
Datao You
Hongsong Zhu
AAML
301
3
0
18 Jun 2025
SecReEvalBench: A Multi-turned Security Resilience Evaluation Benchmark for Large Language Models
Huining Cui
Wei Liu
AAML
ELM
355
0
0
12 May 2025
Beyond Single-Sentence Prompts: Upgrading Value Alignment Benchmarks with Dialogues and Stories
Yujiao Shi
Qimeng Liu
Qiuchi Li
Peng Zhang
Jing Qin
AAML
241
1
0
28 Mar 2025
TIB-STC: A Large-Scale Structured Tibetan Benchmark for Low-Resource Language Modeling
Cheng Huang
Fan Gao
Nyima Tashi
Yutong Liu
Xiangxiang Wang
...
Rinchen Dongrub
Dorje Tashi
Xiao Feng
Hao Wang
Yongbin Yu
ALM
291
2
0
24 Mar 2025
JailBench: A Comprehensive Chinese Security Assessment Benchmark for Large Language Models
Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), 2025
Shuyi Liu
Simiao Cui
Haoran Bu
Yuming Shang
Xi Zhang
ELM
172
2
0
26 Feb 2025
SG-Bench: Evaluating LLM Safety Generalization Across Diverse Tasks and Prompt Types
Neural Information Processing Systems (NeurIPS), 2024
Yutao Mou
Shikun Zhang
Wei Ye
ELM
238
32
0
29 Oct 2024
CFSafety: Comprehensive Fine-grained Safety Assessment for LLMs
Zhihao Liu
Chenhui Hu
ALM
ELM
159
1
0
29 Oct 2024
ChineseSafe: A Chinese Benchmark for Evaluating Safety in Large Language Models
Han Zhang
Hongfu Gao
Qiang Hu
Guanhua Chen
L. Yang
Bingyi Jing
Jianguo Huang
Bing Wang
Haifeng Bai
Lei Yang
AILaw
ELM
400
6
0
24 Oct 2024
Jigsaw Puzzles: Splitting Harmful Questions to Jailbreak Large Language Models
Hao Yang
Zhuang Li
Ehsan Shareghi
Gholamreza Haffari
AAML
172
8
0
15 Oct 2024
Harnessing Task Overload for Scalable Jailbreak Attacks on Large Language Models
Yiting Dong
Guobin Shen
Dongcheng Zhao
Xiang He
Yi Zeng
134
5
0
05 Oct 2024
Chain-of-Jailbreak Attack for Image Generation Models via Editing Step by Step
Wenxuan Wang
Kuiyi Gao
Zihan Jia
Youliang Yuan
Shu Yang
S. Wang
Wenxiang Jiao
Zhaopeng Tu
781
7
0
04 Oct 2024
Jailbreak Attacks and Defenses Against Large Language Models: A Survey
Sibo Yi
Yule Liu
Zhen Sun
Tianshuo Cong
Xinlei He
Jiaxing Song
Ke Xu
Qi Li
AAML
275
190
0
05 Jul 2024
WildGuard: Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs
Seungju Han
Kavel Rao
Allyson Ettinger
Liwei Jiang
Bill Yuchen Lin
Nathan Lambert
Yejin Choi
Nouha Dziri
325
210
0
26 Jun 2024
Methodology of Adapting Large English Language Models for Specific Cultural Contexts
Wenjing Zhang
Siqi Xiao
Xuejiao Lei
Rongjia Du
Huazheng Zhang
Meijuan An
Bikun Yang
Zhaoxiang Liu
Kai Wang
Shiguo Lian
ALM
248
4
0
26 Jun 2024
Raising the Bar: Investigating the Values of Large Language Models via Generative Evolving Testing
Han Jiang
Xiaoyuan Yi
Zhihua Wei
Ziang Xiao
Shu Wang
Xing Xie
ELM
ALM
498
11
0
20 Jun 2024
CHiSafetyBench: A Chinese Hierarchical Safety Benchmark for Large Language Models
Wenjing Zhang
Xuejiao Lei
Zhaoxiang Liu
Meijuan An
Bikun Yang
Kaikai Zhao
Kai Wang
Shiguo Lian
ELM
250
10
0
14 Jun 2024
Unveiling the Safety of GPT-4o: An Empirical Study using Jailbreak Attacks
Zonghao Ying
Aishan Liu
Xianglong Liu
Dacheng Tao
304
38
0
10 Jun 2024
Large Language Models Meet NLP: A Survey
Libo Qin
Qiguang Chen
Xiachong Feng
Yang Wu
Yongheng Zhang
Hai-Tao Zheng
Min Li
Wanxiang Che
Philip S. Yu
LRM
ALM
LM&MA
ELM
415
113
0
21 May 2024
JailbreakLens: Visual Analysis of Jailbreak Attacks Against Large Language Models
Yingchaojie Feng
Zhizhang Chen
Zhining Kang
Sijia Wang
Haoyu Tian
Wei Zhang
Minfeng Zhu
Wei Chen
306
8
0
12 Apr 2024
Aurora-M: The First Open Source Multilingual Language Model Red-teamed according to the U.S. Executive Order
Taishi Nakamura
Mayank Mishra
Simone Tedeschi
Yekun Chai
Jason T Stillerman
...
Virendra Mehta
Matthew Blumberg
Victor May
Huu Nguyen
S. Pyysalo
LRM
279
5
0
30 Mar 2024
Exploring the Privacy Protection Capabilities of Chinese Large Language Models
Yuqi Yang
Xiaowen Huang
Jitao Sang
ELM
PILM
AILaw
193
1
0
27 Mar 2024
Eyes Closed, Safety On: Protecting Multimodal LLMs via Image-to-Text Transformation
European Conference on Computer Vision (ECCV), 2024
Yunhao Gou
Kai Chen
Zhili Liu
Lanqing Hong
Hang Xu
Zhenguo Li
Dit-Yan Yeung
James T. Kwok
Yu Zhang
MLLM
283
96
0
14 Mar 2024
Towards Proactive Interactions for In-Vehicle Conversational Assistants Utilizing Large Language Models
International Joint Conference on Artificial Intelligence (IJCAI), 2024
Huifang Du
Xuejing Feng
Jun Ma
Meng Wang
Shiyu Tao
Yijie Zhong
Yuanzi Li
Haofen Wang
107
8
0
14 Mar 2024
Exploring Advanced Methodologies in Security Evaluation for LLMs
Junming Huang
Jiawei Zhang
Qi Wang
Weihong Han
Yanchun Zhang
301
0
0
28 Feb 2024
On the Challenges and Opportunities in Generative AI
Laura Manduchi
Kushagra Pandey
Robert Bamler
Sina Daubener
...
Yixin Wang
F. Wenzel
Frank Wood
Stephan Mandt
Vincent Fortuin
720
40
0
28 Feb 2024
Mapping the Ethics of Generative AI: A Comprehensive Scoping Review
Thilo Hagendorff
221
78
0
13 Feb 2024
Safety of Multimodal Large Language Models on Images and Texts
Xin Liu
Yichen Zhu
Yunshi Lan
Chao Yang
Yu Qiao
380
59
0
01 Feb 2024
Computational Experiments Meet Large Language Model Based Agents: A Survey and Perspective
Qun Ma
Xiao Xue
Deyu Zhou
Xiangning Yu
Donghua Liu
...
Yifan Shen
Peilin Ji
Juanjuan Li
Gang Wang
Wanpeng Ma
AI4CE
LM&Ro
LLMAG
214
14
0
01 Feb 2024
R-Judge: Benchmarking Safety Risk Awareness for LLM Agents
Tongxin Yuan
Zhiwei He
Lingzhong Dong
Yiming Wang
Ruijie Zhao
...
Binglin Zhou
Fangqi Li
Zhuosheng Zhang
Rui Wang
Gongshen Liu
ELM
327
135
0
18 Jan 2024
Risk Taxonomy, Mitigation, and Assessment Benchmarks of Large Language Model Systems
Tianyu Cui
Yanling Wang
Chuanpu Fu
Yong Xiao
Sijia Li
...
Junwu Xiong
Xinyu Kong
ZuJie Wen
Ke Xu
Qi Li
268
87
0
11 Jan 2024
Human-Instruction-Free LLM Self-Alignment with Limited Samples
Hongyi Guo
Yuanshun Yao
Wei Shen
Jiaheng Wei
Xiaoying Zhang
Zhaoran Wang
Yang Liu
252
29
0
06 Jan 2024
Make Them Spill the Beans! Coercive Knowledge Extraction from (Production) LLMs
Zhuo Zhang
Guangyu Shen
Guanhong Tao
Shuyang Cheng
Xiangyu Zhang
253
21
0
08 Dec 2023
Safe RLHF: Safe Reinforcement Learning from Human Feedback
Josef Dai
Xuehai Pan
Ruiyang Sun
Jiaming Ji
Xinbo Xu
Mickel Liu
Yizhou Wang
Yaodong Yang
367
516
0
19 Oct 2023
Gaining Wisdom from Setbacks: Aligning Large Language Models via Mistake Analysis
International Conference on Learning Representations (ICLR), 2024
Kai Chen
Chunwei Wang
Kuo Yang
Jianhua Han
Lanqing Hong
...
Zhenguo Li
Dit-Yan Yeung
Lifeng Shang
Xin Jiang
Qun Liu
521
44
0
16 Oct 2023
Qilin-Med: Multi-stage Knowledge Injection Advanced Medical Large Language Model
Qichen Ye
Junling Liu
Dading Chong
Peilin Zhou
Yining Hua
...
Meng Cao
Ziming Wang
Xuxin Cheng
Andrew Liu
Zhenhua Guo
AI4MH
LM&MA
ELM
220
29
0
13 Oct 2023
All Languages Matter: On the Multilingual Safety of Large Language Models
Wenxuan Wang
Zhaopeng Tu
Chang Chen
Youliang Yuan
Shu Yang
Wenxiang Jiao
Michael R. Lyu
ALM
LRM
203
39
0
02 Oct 2023
GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts
Jiahao Yu
Xingwei Lin
Zheng Yu
Xinyu Xing
SILM
840
491
0
19 Sep 2023
SafetyBench: Evaluating the Safety of Large Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Zhexin Zhang
Leqi Lei
Lindong Wu
Rui Sun
Yongkang Huang
Chong Long
Xiao Liu
Xuanyu Lei
Jie Tang
Shiyu Huang
LRM
LM&MA
ELM
246
164
0
13 Sep 2023
From Instructions to Intrinsic Human Values -- A Survey of Alignment Goals for Big Models
Jing Yao
Xiaoyuan Yi
Xiting Wang
Yongfeng Zhang
Xing Xie
ALM
345
56
0
23 Aug 2023
GPT-4 Is Too Smart To Be Safe: Stealthy Chat with LLMs via Cipher
International Conference on Learning Representations (ICLR), 2024
Youliang Yuan
Wenxiang Jiao
Wenxuan Wang
Shu Yang
Pinjia He
Shuming Shi
Zhaopeng Tu
SILM
240
377
0
12 Aug 2023
CLEVA: Chinese Language Models EVAluation Platform
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Yanyang Li
Jianqiao Zhao
Duo Zheng
Zi-Yuan Hu
Zhi Chen
...
Yongfeng Huang
Shijia Huang
Dahua Lin
Michael R. Lyu
Liwei Wang
ALM
ELM
291
15
0
09 Aug 2023
AgentSims: An Open-Source Sandbox for Large Language Model Evaluation
Jiaju Lin
Haoran Zhao
Aochi Zhang
Yiting Wu
Huqiuyue Ping
Qin Chen
ELM
LLMAG
296
89
0
08 Aug 2023