How Trustworthy are Open-Source LLMs? An Assessment under Malicious Demonstrations Shows their Vulnerabilities
Lingbo Mo, Boshi Wang, Muhao Chen, Huan Sun
15 November 2023 · arXiv:2311.09447
Papers citing "How Trustworthy are Open-Source LLMs? An Assessment under Malicious Demonstrations Shows their Vulnerabilities"
23 / 23 papers shown

I'm Sorry Dave: How the old world of personnel security can inform the new world of AI insider risk
Paul Martin, Sarah Mercer
69 · 0 · 0 · 26 Mar 2025

Building Safe GenAI Applications: An End-to-End Overview of Red Teaming for Large Language Models
Alberto Purpura, Sahil Wadhwa, Jesse Zymet, Akshay Gupta, Andy Luo, Melissa Kazemi Rad, Swapnil Shinde, Mohammad Sorower
AAML
76 · 0 · 0 · 03 Mar 2025

The Impact of Inference Acceleration on Bias of LLMs
Elisabeth Kirsten, Ivan Habernal, Vedant Nanda, Muhammad Bilal Zafar
33 · 0 · 0 · 20 Feb 2025

iServe: An Intent-based Serving System for LLMs
Dimitrios Liakopoulos, Tianrui Hu, Prasoon Sinha, N. Yadwadkar
VLM
71 · 0 · 0 · 08 Jan 2025

SG-Bench: Evaluating LLM Safety Generalization Across Diverse Tasks and Prompt Types
Yutao Mou, Shikun Zhang, Wei Ye
ELM
33 · 5 · 0 · 29 Oct 2024

LabSafety Bench: Benchmarking LLMs on Safety Issues in Scientific Labs
Yujun Zhou, Jingdong Yang, Kehan Guo, Pin-Yu Chen, Tian Gao, ..., Werner Geyer, Nuno Moniz, Nitesh V Chawla, Xiangliang Zhang
33 · 4 · 0 · 18 Oct 2024

Large Language Models can Achieve Social Balance
Pedro Cisneros-Velarde
37 · 1 · 0 · 05 Oct 2024

AI Delegates with a Dual Focus: Ensuring Privacy and Strategic Self-Disclosure
Xi Chen, Zhiyang Zhang, Fangkai Yang, Xiaoting Qin, Chao Du, ..., Hangxin Liu, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang, Qi Zhang
23 · 1 · 0 · 26 Sep 2024

FIRST: Teach A Reliable Large Language Model Through Efficient Trustworthy Distillation
Kashun Shum, Minrui Xu, Jianshu Zhang, Zixin Chen, Shizhe Diao, Hanze Dong, Jipeng Zhang, Muhammad Omer Raza
19 · 3 · 0 · 22 Aug 2024

JailbreakZoo: Survey, Landscapes, and Horizons in Jailbreaking Large Language and Vision-Language Models
Haibo Jin, Leyang Hu, Xinuo Li, Peiyan Zhang, Chonghan Chen, Jun Zhuang, Haohan Wang
PILM
36 · 26 · 0 · 26 Jun 2024

ALI-Agent: Assessing LLMs' Alignment with Human Values via Agent-based Evaluation
Jingnan Zheng, Han Wang, An Zhang, Tai D. Nguyen, Jun Sun, Tat-Seng Chua
LLMAG
33 · 13 · 0 · 23 May 2024

AmpleGCG: Learning a Universal and Transferable Generative Model of Adversarial Suffixes for Jailbreaking Both Open and Closed LLMs
Zeyi Liao, Huan Sun
AAML
39 · 72 · 0 · 11 Apr 2024

Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression
Junyuan Hong, Jinhao Duan, Chenhui Zhang, Zhangheng Li, Chulin Xie, ..., B. Kailkhura, Dan Hendrycks, Dawn Song, Zhangyang Wang, Bo-wen Li
34 · 24 · 0 · 18 Mar 2024

AraTrust: An Evaluation of Trustworthiness for LLMs in Arabic
Emad A. Alghamdi, Reem I. Masoud, Deema Alnuhait, Afnan Y. Alomairi, Ahmed Ashraf, Mohamed Zaytoon
27 · 4 · 0 · 14 Mar 2024

ERBench: An Entity-Relationship based Automatically Verifiable Hallucination Benchmark for Large Language Models
Jio Oh, Soyeon Kim, Junseok Seo, Jindong Wang, Ruochen Xu, Xing Xie, Steven Euijong Whang
36 · 1 · 0 · 08 Mar 2024

InSaAF: Incorporating Safety through Accuracy and Fairness | Are LLMs ready for the Indian Legal Domain?
Yogesh Tripathi, Raghav Donakanti, Sahil Girhepuje, Ishan Kavathekar, Bhaskara Hanuma Vedula, Gokul S Krishnan, Shreya Goyal, Anmol Goel, Balaraman Ravindran, Ponnurangam Kumaraguru
ALM, AILaw, ELM
9 · 1 · 0 · 16 Feb 2024

A Trembling House of Cards? Mapping Adversarial Attacks against Language Agents
Lingbo Mo, Zeyi Liao, Boyuan Zheng, Yu-Chuan Su, Chaowei Xiao, Huan Sun
AAML, LLMAG
36 · 14 · 0 · 15 Feb 2024

Soft Prompt Threats: Attacking Safety Alignment and Unlearning in Open-Source LLMs through the Embedding Space
Leo Schwinn, David Dobre, Sophie Xhonneux, Gauthier Gidel, Stephan Günnemann
AAML
44 · 36 · 0 · 14 Feb 2024

Asymmetric Bias in Text-to-Image Generation with Adversarial Attacks
Haz Sameen Shahgir, Xianghao Kong, Greg Ver Steeg, Yue Dong
8 · 5 · 0 · 22 Dec 2023

Hijacking Large Language Models via Adversarial In-Context Learning
Yao Qiang, Xiangyu Zhou, Dongxiao Zhu
30 · 32 · 0 · 16 Nov 2023

Instruction Tuning with Human Curriculum
Bruce W. Lee, Hyunsoo Cho, Kang Min Yoo
30 · 3 · 0 · 14 Oct 2023

We're Afraid Language Models Aren't Modeling Ambiguity
Alisa Liu, Zhaofeng Wu, Julian Michael, Alane Suhr, Peter West, Alexander Koller, Swabha Swayamdipta, Noah A. Smith, Yejin Choi
63 · 87 · 0 · 27 Apr 2023

Training language models to follow instructions with human feedback
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe
OSLM, ALM
301 · 11,730 · 0 · 04 Mar 2022