2307.09705
Cited By
CValues: Measuring the Values of Chinese Large Language Models from Safety to Responsibility
19 July 2023
Guohai Xu
Jiayi Liu
Mingshi Yan
Haotian Xu
Jinghui Si
Zhuoran Zhou
Peng Yi
Xing Gao
Jitao Sang
Rong Zhang
Ji Zhang
Chao Peng
Feiyan Huang
Jingren Zhou
ALM
ELM
Papers citing "CValues: Measuring the Values of Chinese Large Language Models from Safety to Responsibility"
50 / 63 papers shown
Large Language Models' Complicit Responses to Illicit Instructions across Socio-Legal Contexts
Xing Wang
Huiyuan Xie
Y. Wang
Chaojun Xiao
Huimin Chen
Holli Sargeant
Felix Steffek
Jie Shao
Zhiyuan Liu
Maosong Sun
AILaw
ELM
327
0
0
25 Nov 2025
Adversarial Attack-Defense Co-Evolution for LLM Safety Alignment via Tree-Group Dual-Aware Search and Optimization
Xurui Li
Kaisong Song
Rui Zhu
Pin-Yu Chen
Haixu Tang
AAML
449
1
0
24 Nov 2025
LiveSecBench: A Dynamic and Event-Driven Safety Benchmark for Chinese Language Model Applications
Yudong Li
Zhongliang Yang
Kejiang Chen
Wenxuan Wang
TianXin Zhang
...
Xingchi Gu
Peiru Yang
Tianxin Zhang
Yue Gao
Yongfeng Huang
ELM
239
0
0
04 Nov 2025
EPT Benchmark: Evaluation of Persian Trustworthiness in Large Language Models
Mohammad Reza Mirbagheri
Mohammad Mahdi Mirkamali
Zahra Motoshaker Arani
Ali Javeri
A. M. Sadeghzadeh
R. Jalili
HILM
194
0
0
08 Sep 2025
Multi-level Value Alignment in Agentic AI Systems: Survey and Perspectives
Wei Zeng
Hengshu Zhu
Chuan Qin
Han Wu
Yihang Cheng
...
Xiaowei Jin
Yinuo Shen
Zhenxing Wang
Feimin Zhong
Hui Xiong
AI4TS
434
0
0
11 Jun 2025
MDIT-Bench: Evaluating the Dual-Implicit Toxicity in Large Multimodal Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Bohan Jin
Shuhan Qi
Kehai Chen
Xinyi Guo
Xuan Wang
207
1
0
22 May 2025
From Surveys to Narratives: Rethinking Cultural Value Adaptation in LLMs
Muhammad Farid Adilazuarda
Chen Cecilia Liu
Iryna Gurevych
Alham Fikri Aji
470
0
0
22 May 2025
SecReEvalBench: A Multi-turned Security Resilience Evaluation Benchmark for Large Language Models
Huining Cui
Wei Liu
AAML
ELM
406
0
0
12 May 2025
A Survey of Foundation Model-Powered Recommender Systems: From Feature-Based, Generative to Agentic Paradigms
Chengkai Huang
Hongtao Huang
Tong Yu
Kaige Xie
Junda Wu
Shuai Zhang
Julian McAuley
Dietmar Jannach
Lina Yao
LRM
AI4CE
302
7
0
23 Apr 2025
Beyond Single-Sentence Prompts: Upgrading Value Alignment Benchmarks with Dialogues and Stories
Yujiao Shi
Qimeng Liu
Qiuchi Li
Peng Zhang
Jing Qin
AAML
265
1
0
28 Mar 2025
Every FLOP Counts: Scaling a 300B Mixture-of-Experts LING LLM without Premium GPUs
Ling Team
B. Zeng
Chenyu Huang
Chao Zhang
Changxin Tian
...
Zhaoxin Huan
Zujie Wen
Zhenhang Sun
Zhuoxuan Du
Z. He
MoE
ALM
408
13
0
07 Mar 2025
Building Safe GenAI Applications: An End-to-End Overview of Red Teaming for Large Language Models
Alberto Purpura
Sahil Wadhwa
Jesse Zymet
Akshay Gupta
Andy Luo
Melissa Kazemi Rad
Swapnil Shinde
Mohammad Sorower
AAML
997
5
0
03 Mar 2025
JailBench: A Comprehensive Chinese Security Assessment Benchmark for Large Language Models
Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), 2025
Shuyi Liu
Simiao Cui
Haoran Bu
Yuming Shang
Xi Zhang
ELM
200
2
0
26 Feb 2025
CHBench: A Chinese Dataset for Evaluating Health in Large Language Models
Chenlu Guo
Nuo Xu
Yi-Ju Chang
Yuan Wu
AI4MH
LM&MA
294
2
0
24 Feb 2025
Be a Multitude to Itself: A Prompt Evolution Framework for Red Teaming
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025
Rui Li
Peiyi Wang
Jingyuan Ma
Di Zhang
Lei Sha
LLMAG
319
0
0
22 Feb 2025
Audio Is the Achilles' Heel: Red Teaming Audio Large Multimodal Models
Hao Yang
Zhuang Li
Ehsan Shareghi
Gholamreza Haffari
AAML
236
16
0
31 Oct 2024
Jigsaw Puzzles: Splitting Harmful Questions to Jailbreak Large Language Models
Hao Yang
Zhuang Li
Ehsan Shareghi
Gholamreza Haffari
AAML
212
9
0
15 Oct 2024
FlipGuard: Defending Preference Alignment against Update Regression with Constrained Optimization
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Mingye Zhu
Yi Liu
Quan Wang
Junbo Guo
Zhendong Mao
175
2
0
01 Oct 2024
Edu-Values: Towards Evaluating the Chinese Education Values of Large Language Models
The Web Conference (WWW), 2024
Peiyi Zhang
Yazhou Zhang
Bo Wang
Lu Rong
Jing Qin
AI4Ed
ELM
369
6
0
19 Sep 2024
Can Large Language Models Understand Symbolic Graphics Programs?
International Conference on Learning Representations (ICLR), 2024
Zeju Qiu
Weiyang Liu
Haiwen Feng
Zhen Liu
Tim Z. Xiao
Katherine M. Collins
J. Tenenbaum
Adrian Weller
Michael J. Black
Bernhard Schölkopf
601
28
0
15 Aug 2024
Know Your Limits: A Survey of Abstention in Large Language Models
Bingbing Wen
Jihan Yao
Shangbin Feng
Chenjun Xu
Yulia Tsvetkov
Bill Howe
Lucy Lu Wang
519
5
0
25 Jul 2024
SAFETY-J: Evaluating Safety with Critique
Yixiu Liu
Yuxiang Zheng
Shijie Xia
Jiajun Li
Yi Tu
Chaoling Song
Pengfei Liu
ELM
206
2
0
24 Jul 2024
CLAVE: An Adaptive Framework for Evaluating Values of LLM Generated Responses
Jing Yao
Xiaoyuan Yi
Xing Xie
ELM
ALM
286
22
0
15 Jul 2024
YuLan: An Open-source Large Language Model
Yutao Zhu
Kun Zhou
Kelong Mao
Wentong Chen
Yiding Sun
...
Wenbing Huang
Ze-Feng Gao
Yueguo Chen
Weizheng Lu
Ji-Rong Wen
ALM
ELM
156
2
0
28 Jun 2024
Towards Probing Speech-Specific Risks in Large Multimodal Models: A Taxonomy, Benchmark, and Insights
Hao Yang
Zhuang Li
Ehsan Shareghi
Gholamreza Haffari
155
1
0
25 Jun 2024
From LLMs to MLLMs: Exploring the Landscape of Multimodal Jailbreaking
Siyuan Wang
Zhuohan Long
Zhihao Fan
Zhongyu Wei
220
21
0
21 Jun 2024
Evaluating Implicit Bias in Large Language Models by Attacking From a Psychometric Perspective
Yuchen Wen
Keping Bi
Wei Chen
Jiafeng Guo
Xueqi Cheng
520
6
0
20 Jun 2024
CHiSafetyBench: A Chinese Hierarchical Safety Benchmark for Large Language Models
Wenjing Zhang
Xuejiao Lei
Zhaoxiang Liu
Meijuan An
Bikun Yang
Kaikai Zhao
Kai Wang
Shiguo Lian
ELM
279
10
0
14 Jun 2024
A Survey of Useful LLM Evaluation
Ji-Lun Peng
Sijia Cheng
Egil Diau
Yung-Yu Shih
Po-Heng Chen
Yen-Ting Lin
Yun-Nung Chen
LLMAG
ELM
289
32
0
03 Jun 2024
CulturePark: Boosting Cross-cultural Understanding in Large Language Models
Cheng-rong Li
Damien Teney
Linyi Yang
Qingsong Wen
Xing Xie
Yongfeng Zhang
209
17
0
24 May 2024
ALI-Agent: Assessing LLMs' Alignment with Human Values via Agent-based Evaluation
Neural Information Processing Systems (NeurIPS), 2024
Jingnan Zheng
Han Wang
An Zhang
Tai D. Nguyen
Jun Sun
Tat-Seng Chua
LLMAG
357
39
0
23 May 2024
CIVICS: Building a Dataset for Examining Culturally-Informed Values in Large Language Models
Giada Pistilli
Alina Leidinger
Yacine Jernite
Atoosa Kasirzadeh
A. Luccioni
Margaret Mitchell
310
6
0
22 May 2024
Reuse Your Rewards: Reward Model Transfer for Zero-Shot Cross-Lingual Alignment
Zhaofeng Wu
Ananth Balashankar
Yoon Kim
Jacob Eisenstein
Ahmad Beirami
250
25
0
18 Apr 2024
SafetyPrompts: a Systematic Review of Open Datasets for Evaluating and Improving Large Language Model Safety
Paul Röttger
Fabio Pernisi
Bertie Vidgen
Dirk Hovy
ELM
KELM
364
66
0
08 Apr 2024
Dialectical Alignment: Resolving the Tension of 3H and Security Threats of LLMs
Shu Yang
Jiayuan Su
Han Jiang
Mengdi Li
Keyuan Cheng
Muhammad Asif Ali
Lijie Hu
Haiyan Zhao
278
8
0
30 Mar 2024
COIG-CQIA: Quality is All You Need for Chinese Instruction Fine-tuning
Yuelin Bai
Xinrun Du
Yiming Liang
Yonggang Jin
Ziqiang Liu
...
Chenghua Lin
Jie Fu
Min Yang
Shiwen Ni
Ge Zhang
ALM
160
51
0
26 Mar 2024
AraTrust: An Evaluation of Trustworthiness for LLMs in Arabic
International Conference on Computational Linguistics (COLING), 2024
Emad A. Alghamdi
Reem I. Masoud
Deema Alnuhait
Afnan Y. Alomairi
Ahmed Ashraf
Mohamed Zaytoon
259
13
0
14 Mar 2024
NewsBench: A Systematic Evaluation Framework for Assessing Editorial Capabilities of Large Language Models in Chinese Journalism
Miao Li
Ming-Bin Chen
Shichao Song
Shengbin Hou
Pengyu Wang
...
Zhiyu Li
Feiyu Xiong
Keming Mao
Cheng Peng
Yi Luo
ELM
144
6
0
29 Feb 2024
Exploring Advanced Methodologies in Security Evaluation for LLMs
Junming Huang
Jiawei Zhang
Qi Wang
Weihong Han
Yanchun Zhang
332
0
0
28 Feb 2024
Political Compass or Spinning Arrow? Towards More Meaningful Evaluations for Values and Opinions in Large Language Models
Paul Röttger
Valentin Hofmann
Valentina Pyatkin
Musashi Hinck
Hannah Rose Kirk
Hinrich Schütze
Dirk Hovy
ELM
272
126
0
26 Feb 2024
ShieldLM: Empowering LLMs as Aligned, Customizable and Explainable Safety Detectors
Zhexin Zhang
Yida Lu
Jingyuan Ma
Di Zhang
Rui Li
...
Hao Sun
Lei Sha
Zhifang Sui
Hongning Wang
Shiyu Huang
126
47
0
26 Feb 2024
ROSE Doesn't Do That: Boosting the Safety of Instruction-Tuned Large Language Models with Reverse Prompt Contrastive Decoding
Qihuang Zhong
Liang Ding
Juhua Liu
Bo Du
Dacheng Tao
LM&MA
270
30
0
19 Feb 2024
Enhancing Role-playing Systems through Aggressive Queries: Evaluation and Improvement
Yihong Tang
Jiao Ou
Che Liu
Fuzheng Zhang
Chen Zhang
Kun Gai
270
6
0
16 Feb 2024
CultureLLM: Incorporating Cultural Differences into Large Language Models
Cheng-rong Li
Mengzhou Chen
Yongfeng Zhang
Sunayana Sitaram
Xing Xie
VLM
313
48
0
09 Feb 2024
SALAD-Bench: A Hierarchical and Comprehensive Safety Benchmark for Large Language Models
Lijun Li
Bowen Dong
Ruohui Wang
Xuhao Hu
Wangmeng Zuo
Dahua Lin
Yu Qiao
Jing Shao
ELM
326
170
0
07 Feb 2024
Risk Taxonomy, Mitigation, and Assessment Benchmarks of Large Language Model Systems
Tianyu Cui
Yanling Wang
Chuanpu Fu
Yong Xiao
Sijia Li
...
Junwu Xiong
Xinyu Kong
ZuJie Wen
Ke Xu
Qi Li
316
99
0
11 Jan 2024
MetaAID 2.5: A Secure Framework for Developing Metaverse Applications via Large Language Models
Hongyin Zhu
212
7
0
22 Dec 2023
The Good, The Bad, and Why: Unveiling Emotions in Generative AI
Cheng-rong Li
Yongfeng Zhang
Yixuan Zhang
Lingyao Li
Xinyi Wang
Wenxin Hou
Jianxun Lian
Fang Luo
Qiang Yang
Xing Xie
LLMAG
459
22
0
18 Dec 2023
CDEval: A Benchmark for Measuring the Cultural Dimensions of Large Language Models
Yuhang Wang
Yanxu Zhu
Chao Kong
Shuyu Wei
Xiaoyuan Yi
Xing Xie
Jitao Sang
ALM
VLM
ELM
168
16
0
28 Nov 2023
Value FULCRA: Mapping Large Language Models to the Multidimensional Spectrum of Basic Human Values
North American Chapter of the Association for Computational Linguistics (NAACL), 2023
Jing Yao
Xiaoyuan Yi
Xiting Wang
Yifan Gong
Xing Xie
299
40
0
15 Nov 2023