arXiv: 2210.07700
Language Generation Models Can Cause Harm: So What Can We Do About It? An Actionable Survey
14 October 2022
Sachin Kumar, Vidhisha Balachandran, Lucille Njoo, Antonios Anastasopoulos, Yulia Tsvetkov
ELM

Papers citing "Language Generation Models Can Cause Harm: So What Can We Do About It? An Actionable Survey" (29 papers shown)

SAGE: A Generic Framework for LLM Safety Evaluation
Madhur Jindal, Hari Shrawgi, Parag Agrawal, Sandipan Dandapat
ELM · 47 · 0 · 0 · 28 Apr 2025

Can LLMs Rank the Harmfulness of Smaller LLMs? We are Not There Yet
Berk Atil, Vipul Gupta, Sarkar Snigdha Sarathi Das, R. Passonneau
59 · 0 · 0 · 07 Feb 2025

Post-hoc Study of Climate Microtargeting on Social Media Ads with LLMs: Thematic Insights and Fairness Evaluation
Tunazzina Islam, Dan Goldwasser
28 · 1 · 0 · 07 Oct 2024

Measuring Human Contribution in AI-Assisted Content Generation
Yueqi Xie, Tao Qi, Jingwei Yi, Ryan Whalen, Junming Huang, Qian Ding, Yu Xie, Xing Xie, Fangzhao Wu
22 · 1 · 0 · 27 Aug 2024

Teaching LLMs to Abstain across Languages via Multilingual Feedback
Shangbin Feng, Weijia Shi, Yike Wang, Wenxuan Ding, Orevaoghene Ahia, Shuyue Stella Li, Vidhisha Balachandran, Sunayana Sitaram, Yulia Tsvetkov
46 · 4 · 0 · 22 Jun 2024

Taxonomy and Analysis of Sensitive User Queries in Generative AI Search
Hwiyeol Jo, Taiwoo Park, Nayoung Choi, Changbong Kim, Ohjoon Kwon, ..., Kyoungho Shin, Sun Suk Lim, Kyungmi Kim, Jihye Lee, Sun Kim
50 · 0 · 0 · 05 Apr 2024

A Survey on Fairness in Large Language Models
Yingji Li, Mengnan Du, Rui Song, Xin Wang, Ying Wang
ALM · 14 · 59 · 0 · 20 Aug 2023

Correcting Diverse Factual Errors in Abstractive Summarization via Post-Editing and Language Model Infilling
Vidhisha Balachandran, Hannaneh Hajishirzi, William W. Cohen, Yulia Tsvetkov
HILM KELM · 64 · 45 · 0 · 22 Oct 2022

Knowledge Unlearning for Mitigating Privacy Risks in Language Models
Joel Jang, Dongkeun Yoon, Sohee Yang, Sungmin Cha, Moontae Lee, Lajanugen Logeswaran, Minjoon Seo
KELM PILM MU · 139 · 110 · 0 · 04 Oct 2022

Data Feedback Loops: Model-driven Amplification of Dataset Biases
Rohan Taori, Tatsunori B. Hashimoto
61 · 42 · 0 · 08 Sep 2022

Pile of Law: Learning Responsible Data Filtering from the Law and a 256GB Open-Source Legal Dataset
Peter Henderson, M. Krass, Lucia Zheng, Neel Guha, Christopher D. Manning, Dan Jurafsky, Daniel E. Ho
AILaw ELM · 121 · 94 · 0 · 01 Jul 2022

Training language models to follow instructions with human feedback
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe
OSLM ALM · 301 · 11,730 · 0 · 04 Mar 2022

A Systematic Study of Bias Amplification
Melissa Hall, Laurens van der Maaten, Laura Gustafson, Maxwell Jones, Aaron B. Adcock
80 · 69 · 0 · 27 Jan 2022

Relational Memory Augmented Language Models
Qi Liu, Dani Yogatama, Phil Blunsom
KELM RALM · 61 · 27 · 0 · 24 Jan 2022

Fast Model Editing at Scale
E. Mitchell, Charles Lin, Antoine Bosselut, Chelsea Finn, Christopher D. Manning
KELM · 217 · 254 · 0 · 21 Oct 2021

Multitask Prompted Training Enables Zero-Shot Task Generalization
Victor Sanh, Albert Webson, Colin Raffel, Stephen H. Bach, Lintang Sutawika, ..., T. Bers, Stella Biderman, Leo Gao, Thomas Wolf, Alexander M. Rush
LRM · 203 · 1,651 · 0 · 15 Oct 2021

Assisting the Human Fact-Checkers: Detecting All Previously Fact-Checked Claims in a Document
Shaden Shaar, Nikola Georgiev, Firoj Alam, Giovanni Da San Martino, Aisha Mohamed, Preslav Nakov
HILM · 60 · 20 · 0 · 14 Sep 2021

Efficient Nearest Neighbor Language Models
Junxian He, Graham Neubig, Taylor Berg-Kirkpatrick
RALM · 182 · 103 · 0 · 09 Sep 2021

Deduplicating Training Data Makes Language Models Better
Katherine Lee, Daphne Ippolito, A. Nystrom, Chiyuan Zhang, Douglas Eck, Chris Callison-Burch, Nicholas Carlini
SyDa · 234 · 447 · 0 · 14 Jul 2021

Understanding Factuality in Abstractive Summarization with FRANK: A Benchmark for Factuality Metrics
Artidoro Pagnoni, Vidhisha Balachandran, Yulia Tsvetkov
HILM · 210 · 265 · 0 · 27 Apr 2021

ToxCCIn: Toxic Content Classification with Interpretability
Tong Xiang, Sean MacAvaney, Eugene Yang, Nazli Goharian
73 · 14 · 0 · 01 Mar 2021

Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP
Timo Schick, Sahana Udupa, Hinrich Schütze
248 · 374 · 0 · 28 Feb 2021

Extracting Training Data from Large Language Models
Nicholas Carlini, Florian Tramèr, Eric Wallace, Matthew Jagielski, Ariel Herbert-Voss, ..., Tom B. Brown, D. Song, Ulfar Erlingsson, Alina Oprea, Colin Raffel
MLAU SILM · 261 · 1,386 · 0 · 14 Dec 2020

Constrained Abstractive Summarization: Preserving Factual Consistency with Constrained Generation
Yuning Mao, Xiang Ren, Heng Ji, Jiawei Han
HILM · 113 · 38 · 0 · 24 Oct 2020

Factual Error Correction for Abstractive Summarization Models
Mengyao Cao, Yue Dong, Jiapeng Wu, Jackie C.K. Cheung
HILM KELM · 167 · 139 · 0 · 17 Oct 2020

Formalizing Trust in Artificial Intelligence: Prerequisites, Causes and Goals of Human Trust in AI
Alon Jacovi, Ana Marasović, Tim Miller, Yoav Goldberg
236 · 417 · 0 · 15 Oct 2020

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro
MoE · 243 · 1,791 · 0 · 17 Sep 2019

Are We Modeling the Task or the Annotator? An Investigation of Annotator Bias in Natural Language Understanding Datasets
Mor Geva, Yoav Goldberg, Jonathan Berant
228 · 306 · 0 · 21 Aug 2019

GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman
ELM · 294 · 6,003 · 0 · 20 Apr 2018
