arXiv:2304.10436
Safety Assessment of Chinese Large Language Models
20 April 2023
Hao Sun
Zhexin Zhang
Jiawen Deng
Jiale Cheng
Shiyu Huang
ALM
ELM
Papers citing "Safety Assessment of Chinese Large Language Models" (50 of 55 shown)
Large Language Models' Complicit Responses to Illicit Instructions across Socio-Legal Contexts
Xing Wang
Huiyuan Xie
Y. Wang
Chaojun Xiao
Huimin Chen
Holli Sargeant
Felix Steffek
Jie Shao
Zhiyuan Liu
Maosong Sun
AILaw
ELM
269
0
0
25 Nov 2025
ForgeDAN: An Evolutionary Framework for Jailbreaking Aligned Large Language Models
Siyang Cheng
Gaotian Liu
Rui Mei
Yilin Wang
Kejia Zhang
Kaishuo Wei
Yuqi Yu
Weiping Wen
Xiaojie Wu
Junhua Liu
40
0
0
17 Nov 2025
Paladin: Defending LLM-enabled Phishing Emails with a New Trigger-Tag Paradigm
Yan Pang
Wenlong Meng
Xiaojing Liao
Tianhao Wang
153
2
0
08 Sep 2025
Unveiling Trust in Multimodal Large Language Models: Evaluation, Analysis, and Mitigation
Yichi Zhang
Yao Huang
Yifan Wang
Yitong Sun
Chang-rui Liu
...
Xiao Yang
Xingxing Wei
Hang Su
Yinpeng Dong
Jun Zhu
122
1
0
21 Aug 2025
A Comprehensive Evaluation framework of Alignment Techniques for LLMs
Muneeza Azmat
Momin Abbas
M. Macedo
Marcelo Carpinette Grave
Luan Soares de Souza
...
Raya Horesh
Yixin Chen
Heloisa Caroline de Souza Pereira Candello
Rebecka Nordenlow
Aminat Adebiyi
OffRL
88
0
0
13 Aug 2025
Libra: Large Chinese-based Safeguard for AI Content
Ziyang Chen
Huimu Yu
Xing Wu
Dongqin Liu
Songlin Hu
AILaw
99
0
0
29 Jul 2025
From LLMs to MLLMs to Agents: A Survey of Emerging Paradigms in Jailbreak Attacks and Defenses within LLM Ecosystem
Yanxu Mao
Tiehan Cui
Peipei Liu
Datao You
Hongsong Zhu
AAML
285
3
0
18 Jun 2025
SecReEvalBench: A Multi-turned Security Resilience Evaluation Benchmark for Large Language Models
Huining Cui
Wei Liu
AAML
ELM
343
0
0
12 May 2025
Beyond Single-Sentence Prompts: Upgrading Value Alignment Benchmarks with Dialogues and Stories
Yujiao Shi
Qimeng Liu
Qiuchi Li
Peng Zhang
Jing Qin
AAML
233
1
0
28 Mar 2025
JailBench: A Comprehensive Chinese Security Assessment Benchmark for Large Language Models
Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), 2025
Shuyi Liu
Simiao Cui
Haoran Bu
Yuming Shang
Xi Zhang
ELM
168
2
0
26 Feb 2025
SG-Bench: Evaluating LLM Safety Generalization Across Diverse Tasks and Prompt Types
Neural Information Processing Systems (NeurIPS), 2024
Yutao Mou
Shikun Zhang
Wei Ye
ELM
238
32
0
29 Oct 2024
CFSafety: Comprehensive Fine-grained Safety Assessment for LLMs
Zhihao Liu
Chenhui Hu
ALM
ELM
159
1
0
29 Oct 2024
ChineseSafe: A Chinese Benchmark for Evaluating Safety in Large Language Models
Han Zhang
Hongfu Gao
Qiang Hu
Guanhua Chen
L. Yang
Bingyi Jing
Jianguo Huang
Bing Wang
Haifeng Bai
Lei Yang
AILaw
ELM
384
6
0
24 Oct 2024
Jigsaw Puzzles: Splitting Harmful Questions to Jailbreak Large Language Models
Hao Yang
Zhuang Li
Ehsan Shareghi
Gholamreza Haffari
AAML
168
8
0
15 Oct 2024
Harnessing Task Overload for Scalable Jailbreak Attacks on Large Language Models
Yiting Dong
Guobin Shen
Dongcheng Zhao
Xiang He
Yi Zeng
114
5
0
05 Oct 2024
Chain-of-Jailbreak Attack for Image Generation Models via Editing Step by Step
Wenxuan Wang
Kuiyi Gao
Zihan Jia
Youliang Yuan
Shu Yang
S. Wang
Wenxiang Jiao
Zhaopeng Tu
761
7
0
04 Oct 2024
Jailbreak Attacks and Defenses Against Large Language Models: A Survey
Sibo Yi
Yule Liu
Zhen Sun
Tianshuo Cong
Xinlei He
Jiaxing Song
Ke Xu
Qi Li
AAML
271
189
0
05 Jul 2024
WildGuard: Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs
Seungju Han
Kavel Rao
Allyson Ettinger
Liwei Jiang
Bill Yuchen Lin
Nathan Lambert
Yejin Choi
Nouha Dziri
293
207
0
26 Jun 2024
Methodology of Adapting Large English Language Models for Specific Cultural Contexts
Wenjing Zhang
Siqi Xiao
Xuejiao Lei
Rongjia Du
Huazheng Zhang
Meijuan An
Bikun Yang
Zhaoxiang Liu
Kai Wang
Shiguo Lian
ALM
244
4
0
26 Jun 2024
Raising the Bar: Investigating the Values of Large Language Models via Generative Evolving Testing
Han Jiang
Xiaoyuan Yi
Zhihua Wei
Ziang Xiao
Shu Wang
Xing Xie
ELM
ALM
442
11
0
20 Jun 2024
CHiSafetyBench: A Chinese Hierarchical Safety Benchmark for Large Language Models
Wenjing Zhang
Xuejiao Lei
Zhaoxiang Liu
Meijuan An
Bikun Yang
Kaikai Zhao
Kai Wang
Shiguo Lian
ELM
242
10
0
14 Jun 2024
Unveiling the Safety of GPT-4o: An Empirical Study using Jailbreak Attacks
Zonghao Ying
Aishan Liu
Xianglong Liu
Dacheng Tao
296
38
0
10 Jun 2024
Large Language Models Meet NLP: A Survey
Libo Qin
Qiguang Chen
Xiachong Feng
Yang Wu
Yongheng Zhang
Hai-Tao Zheng
Min Li
Wanxiang Che
Philip S. Yu
LRM
ALM
LM&MA
ELM
407
113
0
21 May 2024
JailbreakLens: Visual Analysis of Jailbreak Attacks Against Large Language Models
Yingchaojie Feng
Zhizhang Chen
Zhining Kang
Sijia Wang
Haoyu Tian
Wei Zhang
Minfeng Zhu
Wei Chen
306
8
0
12 Apr 2024
Aurora-M: The First Open Source Multilingual Language Model Red-teamed according to the U.S. Executive Order
Taishi Nakamura
Mayank Mishra
Simone Tedeschi
Yekun Chai
Jason T Stillerman
...
Virendra Mehta
Matthew Blumberg
Victor May
Huu Nguyen
S. Pyysalo
LRM
271
5
0
30 Mar 2024
Eyes Closed, Safety On: Protecting Multimodal LLMs via Image-to-Text Transformation
European Conference on Computer Vision (ECCV), 2024
Yunhao Gou
Kai Chen
Zhili Liu
Lanqing Hong
Hang Xu
Zhenguo Li
Dit-Yan Yeung
James T. Kwok
Yu Zhang
MLLM
267
95
0
14 Mar 2024
Towards Proactive Interactions for In-Vehicle Conversational Assistants Utilizing Large Language Models
International Joint Conference on Artificial Intelligence (IJCAI), 2024
Huifang Du
Xuejing Feng
Jun Ma
Meng Wang
Shiyu Tao
Yijie Zhong
Yuanzi Li
Haofen Wang
103
7
0
14 Mar 2024
Exploring Advanced Methodologies in Security Evaluation for LLMs
Junming Huang
Jiawei Zhang
Qi Wang
Weihong Han
Yanchun Zhang
281
0
0
28 Feb 2024
On the Challenges and Opportunities in Generative AI
Laura Manduchi
Kushagra Pandey
Robert Bamler
Sina Däubener
...
Yixin Wang
F. Wenzel
Frank Wood
Stephan Mandt
Vincent Fortuin
696
40
0
28 Feb 2024
Mapping the Ethics of Generative AI: A Comprehensive Scoping Review
Thilo Hagendorff
217
78
0
13 Feb 2024
Safety of Multimodal Large Language Models on Images and Texts
Xin Liu
Yichen Zhu
Yunshi Lan
Chao Yang
Yu Qiao
356
58
0
01 Feb 2024
Computational Experiments Meet Large Language Model Based Agents: A Survey and Perspective
Qun Ma
Xiao Xue
Deyu Zhou
Xiangning Yu
Donghua Liu
...
Yifan Shen
Peilin Ji
Juanjuan Li
Gang Wang
Wanpeng Ma
AI4CE
LM&Ro
LLMAG
202
14
0
01 Feb 2024
R-Judge: Benchmarking Safety Risk Awareness for LLM Agents
Tongxin Yuan
Zhiwei He
Lingzhong Dong
Yiming Wang
Ruijie Zhao
...
Binglin Zhou
Fangqi Li
Zhuosheng Zhang
Rui Wang
Gongshen Liu
ELM
319
134
0
18 Jan 2024
Risk Taxonomy, Mitigation, and Assessment Benchmarks of Large Language Model Systems
Tianyu Cui
Yanling Wang
Chuanpu Fu
Yong Xiao
Sijia Li
...
Junwu Xiong
Xinyu Kong
ZuJie Wen
Ke Xu
Qi Li
264
97
0
11 Jan 2024
Human-Instruction-Free LLM Self-Alignment with Limited Samples
Hongyi Guo
Yuanshun Yao
Wei Shen
Jiaheng Wei
Xiaoying Zhang
Zhaoran Wang
Yang Liu
244
29
0
06 Jan 2024
Make Them Spill the Beans! Coercive Knowledge Extraction from (Production) LLMs
Zhuo Zhang
Guangyu Shen
Guanhong Tao
Shuyang Cheng
Xiangyu Zhang
245
21
0
08 Dec 2023
Safe RLHF: Safe Reinforcement Learning from Human Feedback
Josef Dai
Xuehai Pan
Ruiyang Sun
Jiaming Ji
Xinbo Xu
Mickel Liu
Yizhou Wang
Yaodong Yang
351
514
0
19 Oct 2023
Gaining Wisdom from Setbacks: Aligning Large Language Models via Mistake Analysis
International Conference on Learning Representations (ICLR), 2024
Kai Chen
Chunwei Wang
Kuo Yang
Jianhua Han
Lanqing Hong
...
Zhenguo Li
Dit-Yan Yeung
Lifeng Shang
Xin Jiang
Qun Liu
501
44
0
16 Oct 2023
Qilin-Med: Multi-stage Knowledge Injection Advanced Medical Large Language Model
Qichen Ye
Junling Liu
Dading Chong
Peilin Zhou
Yining Hua
...
Meng Cao
Ziming Wang
Xuxin Cheng
Andrew Liu
Zhenhua Guo
AI4MH
LM&MA
ELM
208
29
0
13 Oct 2023
All Languages Matter: On the Multilingual Safety of Large Language Models
Wenxuan Wang
Zhaopeng Tu
Chang Chen
Youliang Yuan
Shu Yang
Wenxiang Jiao
Michael R. Lyu
ALM
LRM
179
39
0
02 Oct 2023
GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts
Jiahao Yu
Xingwei Lin
Zheng Yu
Xinyu Xing
SILM
812
486
0
19 Sep 2023
SafetyBench: Evaluating the Safety of Large Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Zhexin Zhang
Leqi Lei
Lindong Wu
Rui Sun
Yongkang Huang
Chong Long
Xiao Liu
Xuanyu Lei
Jie Tang
Shiyu Huang
LRM
LM&MA
ELM
234
161
0
13 Sep 2023
From Instructions to Intrinsic Human Values -- A Survey of Alignment Goals for Big Models
Jing Yao
Xiaoyuan Yi
Xiting Wang
Yongfeng Zhang
Xing Xie
ALM
337
56
0
23 Aug 2023
GPT-4 Is Too Smart To Be Safe: Stealthy Chat with LLMs via Cipher
International Conference on Learning Representations (ICLR), 2024
Youliang Yuan
Wenxiang Jiao
Wenxuan Wang
Shu Yang
Pinjia He
Shuming Shi
Zhaopeng Tu
SILM
236
376
0
12 Aug 2023
CLEVA: Chinese Language Models EVAluation Platform
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Yanyang Li
Jianqiao Zhao
Duo Zheng
Zi-Yuan Hu
Zhi Chen
...
Yongfeng Huang
Shijia Huang
Dahua Lin
Michael R. Lyu
Liwei Wang
ALM
ELM
279
15
0
09 Aug 2023
AgentSims: An Open-Source Sandbox for Large Language Model Evaluation
Jiaju Lin
Haoran Zhao
Aochi Zhang
Yiting Wu
Huqiuyue Ping
Qin Chen
ELM
LLMAG
280
89
0
08 Aug 2023
RoCar: A Relationship Network-based Evaluation Method to Large Language Models
Ming Wang
Wenfang Wu
Chongyun Gao
Daling Wang
Shi Feng
Yifei Zhang
64
0
0
29 Jul 2023
MediaGPT : A Large Language Model For Chinese Media
Zhonghao Wang
Zijia Lu
Boshen Jin
Haiying Deng
LM&MA
201
1
0
20 Jul 2023
CValues: Measuring the Values of Chinese Large Language Models from Safety to Responsibility
Guohai Xu
Jiayi Liu
Mingshi Yan
Haotian Xu
Jinghui Si
...
Rong Zhang
Ji Zhang
Chao Peng
Feiyan Huang
Jingren Zhou
ALM
ELM
196
95
0
19 Jul 2023
BeaverTails: Towards Improved Safety Alignment of LLM via a Human-Preference Dataset
Neural Information Processing Systems (NeurIPS), 2023
Jiaming Ji
Mickel Liu
Juntao Dai
Xuehai Pan
Chi Zhang
Ce Bian
Chi Zhang
Ruiyang Sun
Yizhou Wang
Yaodong Yang
ALM
299
692
0
10 Jul 2023