Cited By
Safety Alignment in NLP Tasks: Weakly Aligned Summarization as an In-Context Attack
12 December 2023 · arXiv: 2312.06924
Yu Fu, Yufei Li, Wen Xiao, Cong Liu, Yue Dong
AAML
Papers citing "Safety Alignment in NLP Tasks: Weakly Aligned Summarization as an In-Context Attack" (3 of 3 papers shown)
Survey of Vulnerabilities in Large Language Models Revealed by Adversarial Attacks
Erfan Shayegani, Md Abdullah Al Mamun, Yu Fu, Pedram Zaree, Yue Dong, Nael B. Abu-Ghazaleh
AAML · 16 Oct 2023
Poisoning Language Models During Instruction Tuning
Alexander Wan, Eric Wallace, Sheng Shen, Dan Klein
SILM · 01 May 2023
Training language models to follow instructions with human feedback
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe
OSLM, ALM · 04 Mar 2022