ResearchTrend.AI: Cited By, arXiv 2305.11391
A Survey of Safety and Trustworthiness of Large Language Models through the Lens of Verification and Validation

19 May 2023
Xiaowei Huang
Wenjie Ruan
Wei Huang
Gao Jin
Yizhen Dong
Changshun Wu
Saddek Bensalem
Ronghui Mu
Yi Qi
Xingyu Zhao
Kaiwen Cai
Yanghao Zhang
Sihao Wu
Peipei Xu
Dengyu Wu
André Freitas
Mustafa A. Mustafa
    ALM

Papers citing "A Survey of Safety and Trustworthiness of Large Language Models through the Lens of Verification and Validation"

Showing 50 of 78 citing papers.
Exploring the Role of Large Language Models in Cybersecurity: A Systematic Survey
Shuang Tian
Tao Zhang
J. Liu
Jiacheng Wang
Xuangou Wu
...
Ruichen Zhang
W. Zhang
Zhenhui Yuan
Shiwen Mao
Dong In Kim
48
0
0
22 Apr 2025
Utilizing Jailbreak Probability to Attack and Safeguard Multimodal LLMs
Wenzhuo Xu
Zhipeng Wei
Xiongtao Sun
Deyue Zhang
Dongdong Yang
Quanchen Zou
X. Zhang
AAML
47
0
0
10 Mar 2025
LLM-Safety Evaluations Lack Robustness
Tim Beyer
Sophie Xhonneux
Simon Geisler
Gauthier Gidel
Leo Schwinn
Stephan Günnemann
ALM
ELM
90
0
0
04 Mar 2025
Reducing Large Language Model Safety Risks in Women's Health using Semantic Entropy
Jahan C. Penny-Dimri
Magdalena Bachmann
William Cooke
Sam Mathewlynn
Samuel Dockree
John Tolladay
Jannik Kossen
Lin Li
Y. Gal
Gabriel Davis Jones
27
0
0
01 Mar 2025
Paradigms of AI Evaluation: Mapping Goals, Methodologies and Culture
John Burden
Marko Tesic
Lorenzo Pacchiardi
José Hernández Orallo
27
0
0
21 Feb 2025
TEOChat: A Large Vision-Language Assistant for Temporal Earth Observation Data
Jeremy Irvin
Emily Ruoyu Liu
Joyce Chuyi Chen
Ines Dormoy
Jinyoung Kim
Samar Khanna
Zhuo Zheng
Stefano Ermon
MLLM
VLM
48
4
0
28 Jan 2025
AIDBench: A benchmark for evaluating the authorship identification capability of large language models
Zichen Wen
Dadi Guo
Huishuai Zhang
62
0
0
20 Nov 2024
Multimodal large language model for wheat breeding: a new exploration of smart breeding
Guofeng Yang
Yu Li
Yong He
Zhenjiang Zhou
Lingzhen Ye
Hui Fang
Yiqi Luo
Xuping Feng
64
2
0
20 Nov 2024
Large Language Model Supply Chain: Open Problems From the Security Perspective
Q. Hu
Xiaofei Xie
Sen Chen
Lei Ma
ELM
39
0
0
03 Nov 2024
Standardization Trends on Safety and Trustworthiness Technology for Advanced AI
Jonghong Jeon
29
2
0
29 Oct 2024
Causal Abstraction in Model Interpretability: A Compact Survey
Yihao Zhang
21
0
0
26 Oct 2024
ChineseSafe: A Chinese Benchmark for Evaluating Safety in Large Language Models
H. Zhang
Hongfu Gao
Qiang Hu
Guanhua Chen
L. Yang
Bingyi Jing
Hongxin Wei
Bing Wang
Haifeng Bai
Lei Yang
AILaw
ELM
43
1
0
24 Oct 2024
Characterizing and Evaluating the Reliability of LLMs against Jailbreak Attacks
Kexin Chen
Yi Liu
Dongxia Wang
Jiaying Chen
Wenhai Wang
34
1
0
18 Aug 2024
Blockchain for Large Language Model Security and Safety: A Holistic Survey
Caleb Geren
Amanda Board
Gaby G. Dagher
Tim Andersen
Jun Zhuang
44
5
0
26 Jul 2024
Building a Domain-specific Guardrail Model in Production
Mohammad Niknazar
Paul V Haley
Latha Ramanan
Sang T. Truong
Daricia Wilkinson
...
Robert Smith
Aditya Vempaty
Nick Haber
Sanmi Koyejo
Sharad Sundararajan
20
0
0
24 Jul 2024
LLM-Generated Tips Rival Expert-Created Tips in Helping Students Answer Quantum-Computing Questions
L. Krupp
Jonas Bley
Isacco Gobbi
Alexander Geng
Sabine Müller
...
Artur Widera
Herwig Ott
P. Lukowicz
Jakob Karolus
Maximilian Kiefer-Emmanouilidis
22
3
0
24 Jul 2024
AI Safety in Generative AI Large Language Models: A Survey
Jaymari Chua
Yun Yvonna Li
Shiyi Yang
Chen Wang
Lina Yao
LM&MA
29
12
0
06 Jul 2024
MOSSBench: Is Your Multimodal Language Model Oversensitive to Safe Queries?
Xirui Li
Hengguang Zhou
Ruochen Wang
Tianyi Zhou
Minhao Cheng
Cho-Jui Hsieh
27
4
0
22 Jun 2024
CHiSafetyBench: A Chinese Hierarchical Safety Benchmark for Large Language Models
Wenjing Zhang
Xuejiao Lei
Zhaoxiang Liu
Meijuan An
Bikun Yang
Kaikai Zhao
Kai Wang
Shiguo Lian
ELM
26
7
0
14 Jun 2024
The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models
Seungone Kim
Juyoung Suk
Ji Yong Cho
Shayne Longpre
Chaeeun Kim
...
Sean Welleck
Graham Neubig
Moontae Lee
Kyungjae Lee
Minjoon Seo
ELM
ALM
LM&MA
80
28
0
09 Jun 2024
Safeguarding Large Language Models: A Survey
Yi Dong
Ronghui Mu
Yanghao Zhang
Siqi Sun
Tianle Zhang
...
Yi Qi
Jinwei Hu
Jie Meng
Saddek Bensalem
Xiaowei Huang
OffRL
KELM
AILaw
27
17
0
03 Jun 2024
Towards Trustworthy AI: A Review of Ethical and Robust Large Language Models
Meftahul Ferdaus
Mahdi Abdelguerfi
Elias Ioup
Kendall N. Niles
Ken Pathak
Steve Sloan
26
10
0
01 Jun 2024
Annotation-Efficient Preference Optimization for Language Model Alignment
Yuu Jinnai
Ukyo Honda
33
0
0
22 May 2024
Online Safety Analysis for LLMs: a Benchmark, an Assessment, and a Path Forward
Xuan Xie
Jiayang Song
Zhehua Zhou
Yuheng Huang
Da Song
Lei Ma
OffRL
27
6
0
12 Apr 2024
Unbridled Icarus: A Survey of the Potential Perils of Image Inputs in Multimodal Large Language Model Security
Yihe Fan
Yuxin Cao
Ziyu Zhao
Ziyao Liu
Shaofeng Li
27
11
0
08 Apr 2024
Zero-shot Safety Prediction for Autonomous Robots with Foundation World Models
Zhenjiang Mao
Siqi Dai
Yuang Geng
Ivan Ruchkin
22
3
0
30 Mar 2024
Assessment of Multimodal Large Language Models in Alignment with Human Values
Zhelun Shi
Zhipin Wang
Hongxing Fan
Zaibin Zhang
Lijun Li
Yongting Zhang
Zhen-fei Yin
Lu Sheng
Yu Qiao
Jing Shao
27
14
0
26 Mar 2024
Boosting Adversarial Training via Fisher-Rao Norm-based Regularization
Xiangyu Yin
Wenjie Ruan
AAML
16
0
0
26 Mar 2024
Exploring Advanced Methodologies in Security Evaluation for LLMs
Junming Huang
Jiawei Zhang
Qi Wang
Weihong Han
Yanchun Zhang
27
0
0
28 Feb 2024
Towards Fairness-Aware Adversarial Learning
Yanghao Zhang
Tianle Zhang
Ronghui Mu
Xiaowei Huang
Wenjie Ruan
16
4
0
27 Feb 2024
Smaug: Fixing Failure Modes of Preference Optimisation with DPO-Positive
Arka Pal
Deep Karkhanis
Samuel Dooley
Manley Roberts
Siddartha Naidu
Colin White
OSLM
23
48
0
20 Feb 2024
Comprehensive Assessment of Jailbreak Attacks Against LLMs
Junjie Chu
Yugeng Liu
Ziqing Yang
Xinyue Shen
Michael Backes
Yang Zhang
AAML
20
65
0
08 Feb 2024
Building Guardrails for Large Language Models
Yizhen Dong
Ronghui Mu
Gao Jin
Yi Qi
Jinwei Hu
Xingyu Zhao
Jie Meng
Wenjie Ruan
Xiaowei Huang
OffRL
57
23
0
02 Feb 2024
R-Judge: Benchmarking Safety Risk Awareness for LLM Agents
Tongxin Yuan
Zhiwei He
Lingzhong Dong
Yiming Wang
Ruijie Zhao
...
Binglin Zhou
Fangqi Li
Zhuosheng Zhang
Rui Wang
Gongshen Liu
ELM
21
58
0
18 Jan 2024
Risk Taxonomy, Mitigation, and Assessment Benchmarks of Large Language Model Systems
Tianyu Cui
Yanling Wang
Chuanpu Fu
Yong Xiao
Sijia Li
...
Junwu Xiong
Xinyu Kong
Zujie Wen
Ke Xu
Qi Li
50
22
0
11 Jan 2024
A Comprehensive Study of Knowledge Editing for Large Language Models
Ningyu Zhang
Yunzhi Yao
Bo Tian
Peng Wang
Shumin Deng
...
Lei Liang
Zhiqiang Zhang
Xiao-Jun Zhu
Jun Zhou
Huajun Chen
KELM
21
76
0
02 Jan 2024
MetaAID 2.5: A Secure Framework for Developing Metaverse Applications via Large Language Models
Hongyin Zhu
18
6
0
22 Dec 2023
A Comprehensive Survey of Attack Techniques, Implementation, and Mitigation Strategies in Large Language Models
Aysan Esmradi
Daniel Wankit Yip
C. Chan
AAML
12
11
0
18 Dec 2023
User Modeling in the Era of Large Language Models: Current Research and Future Directions
Zhaoxuan Tan
Meng-Long Jiang
11
8
0
11 Dec 2023
METAL: Metamorphic Testing Framework for Analyzing Large-Language Model Qualities
Sangwon Hyun
Mingyu Guo
Muhammad Ali Babar
12
0
0
11 Dec 2023
Trustworthy Large Models in Vision: A Survey
Ziyan Guo
Li Xu
Jun Liu
MU
56
0
0
16 Nov 2023
LUNA: A Model-Based Universal Analysis Framework for Large Language Models
Da Song
Xuan Xie
Jiayang Song
Derui Zhu
Yuheng Huang
Felix Juefei Xu
Lei Ma
ALM
14
1
0
22 Oct 2023
Unsupervised Pretraining for Fact Verification by Language Model Distillation
A. Bazaga
Pietro Lió
Bo Dai
HILM
17
2
0
28 Sep 2023
Use of LLMs for Illicit Purposes: Threats, Prevention Measures, and Vulnerabilities
Maximilian Mozes
Xuanli He
Bennett Kleinberg
Lewis D. Griffin
25
69
0
24 Aug 2023
Synergistic Integration of Large Language Models and Cognitive Architectures for Robust AI: An Exploratory Analysis
Oscar J. Romero
John Zimmerman
Aaron Steinfeld
A. Tomasic
LLMAG
LM&Ro
13
15
0
18 Aug 2023
Evaluating the Instruction-Following Robustness of Large Language Models to Prompt Injection
Zekun Li
Baolin Peng
Pengcheng He
Xifeng Yan
ELM
SILM
AAML
20
22
0
17 Aug 2023
What, Indeed, is an Achievable Provable Guarantee for Learning-Enabled Safety Critical Systems
Saddek Bensalem
Chih-Hong Cheng
Wei Huang
Xiaowei Huang
Changshun Wu
Xingyu Zhao
AAML
6
6
0
20 Jul 2023
Look Before You Leap: An Exploratory Study of Uncertainty Measurement for Large Language Models
Yuheng Huang
Jiayang Song
Zhijie Wang
Shengming Zhao
Huaming Chen
Felix Juefei-Xu
Lei Ma
20
34
0
16 Jul 2023
Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation
Jiawei Liu
Chun Xia
Yuyao Wang
Lingming Zhang
ELM
ALM
163
388
0
02 May 2023
LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions
Minghao Wu
Abdul Waheed
Chiyu Zhang
Muhammad Abdul-Mageed
Alham Fikri Aji
ALM
118
115
0
27 Apr 2023