RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models

24 September 2020
Samuel Gehman
Suchin Gururangan
Maarten Sap
Yejin Choi
Noah A. Smith

Papers citing "RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models"

50 / 201 papers shown
Long-form analogies generated by chatGPT lack human-like psycholinguistic properties
S. M. Seals
V. Shalin
16
10
0
07 Jun 2023
Fine-Grained Human Feedback Gives Better Rewards for Language Model Training
Zeqiu Wu
Yushi Hu
Weijia Shi
Nouha Dziri
Alane Suhr
Prithviraj Ammanabrolu
Noah A. Smith
Mari Ostendorf
Hannaneh Hajishirzi
ALM
30
303
0
02 Jun 2023
CFL: Causally Fair Language Models Through Token-level Attribute Controlled Generation
Rahul Madhavan
Rishabh Garg
Kahini Wadhawan
S. Mehta
21
5
0
01 Jun 2023
Thought Cloning: Learning to Think while Acting by Imitating Human Thinking
Shengran Hu
Jeff Clune
LM&Ro
OffRL
LRM
AI4CE
35
27
0
01 Jun 2023
An Invariant Learning Characterization of Controlled Text Generation
Carolina Zheng
Claudia Shi
Keyon Vafa
Amir Feder
David M. Blei
OOD
24
8
0
31 May 2023
The False Promise of Imitating Proprietary LLMs
Arnav Gudibande
Eric Wallace
Charles Burton Snell
Xinyang Geng
Hao Liu
Pieter Abbeel
Sergey Levine
Dawn Song
ALM
27
196
0
25 May 2023
BiasX: "Thinking Slow" in Toxic Content Moderation with Explanations of Implied Social Biases
Yiming Zhang
Sravani Nanduri
Liwei Jiang
Tongshuang Wu
Maarten Sap
39
7
0
23 May 2023
This Prompt is Measuring <MASK>: Evaluating Bias Evaluation in Language Models
Seraphina Goldfarb-Tarrant
Eddie L. Ungless
Esma Balkir
Su Lin Blodgett
29
9
0
22 May 2023
Data Redaction from Conditional Generative Models
Zhifeng Kong
Kamalika Chaudhuri
KELM
16
7
0
18 May 2023
DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining
Sang Michael Xie
Hieu H. Pham
Xuanyi Dong
Nan Du
Hanxiao Liu
Yifeng Lu
Percy Liang
Quoc V. Le
Tengyu Ma
Adams Wei Yu
MoMe
MoE
31
174
0
17 May 2023
PaLM 2 Technical Report
Rohan Anil
Andrew M. Dai
Orhan Firat
Melvin Johnson
Dmitry Lepikhin
...
Ce Zheng
Wei Zhou
Denny Zhou
Slav Petrov
Yonghui Wu
ReLM
LRM
74
1,142
0
17 May 2023
Language Model Tokenizers Introduce Unfairness Between Languages
Aleksandar Petrov
Emanuele La Malfa
Philip H. S. Torr
Adel Bibi
16
96
0
17 May 2023
RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment
Hanze Dong
Wei Xiong
Deepanshu Goyal
Yihan Zhang
Winnie Chow
Rui Pan
Shizhe Diao
Jipeng Zhang
Kashun Shum
Tong Zhang
ALM
16
401
0
13 Apr 2023
Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark
Alexander Pan
Chan Jun Shern
Andy Zou
Nathaniel Li
Steven Basart
Thomas Woodside
Jonathan Ng
Hanlin Zhang
Scott Emmons
Dan Hendrycks
24
126
0
06 Apr 2023
Large language models effectively leverage document-level context for literary translation, but critical errors persist
Marzena Karpinska
Mohit Iyyer
31
81
0
06 Apr 2023
Assessing Language Model Deployment with Risk Cards
Leon Derczynski
Hannah Rose Kirk
Vidhisha Balachandran
Sachin Kumar
Yulia Tsvetkov
M. Leiser
Saif Mohammad
20
42
0
31 Mar 2023
Training Language Models with Language Feedback at Scale
Jérémy Scheurer
Jon Ander Campos
Tomasz Korbak
Jun Shern Chan
Angelica Chen
Kyunghyun Cho
Ethan Perez
ALM
36
101
0
28 Mar 2023
Koala: An Index for Quantifying Overlaps with Pre-training Corpora
Thuy-Trang Vu
Xuanli He
Gholamreza Haffari
Ehsan Shareghi
CLL
21
12
0
26 Mar 2023
Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection
Kai Greshake
Sahar Abdelnabi
Shailesh Mishra
C. Endres
Thorsten Holz
Mario Fritz
SILM
41
431
0
23 Feb 2023
Bounding the Capabilities of Large Language Models in Open Text Generation with Prompt Constraints
Albert Lu
Hongxin Zhang
Yanzhe Zhang
Xuezhi Wang
Diyi Yang
LRM
24
28
0
17 Feb 2023
Auditing large language models: a three-layered approach
Jakob Mokander
Jonas Schuett
Hannah Rose Kirk
Luciano Floridi
AILaw
MLAU
42
194
0
16 Feb 2023
The Capacity for Moral Self-Correction in Large Language Models
Deep Ganguli
Amanda Askell
Nicholas Schiefer
Thomas I. Liao
Kamilė Lukošiūtė
...
Tom B. Brown
C. Olah
Jack Clark
Sam Bowman
Jared Kaplan
LRM
ReLM
36
158
0
15 Feb 2023
Dr.Spider: A Diagnostic Evaluation Benchmark towards Text-to-SQL Robustness
Shuaichen Chang
J. Wang
Mingwen Dong
Lin Pan
Henghui Zhu
...
William Yang Wang
Zhiguo Wang
Vittorio Castelli
Patrick K. L. Ng
Bing Xiang
OOD
33
34
0
21 Jan 2023
Leveraging Large Language Models to Power Chatbots for Collecting User Self-Reported Data
Jing Wei
Sungdong Kim
Hyunhoon Jung
Young-Ho Kim
25
81
0
14 Jan 2023
Second Thoughts are Best: Learning to Re-Align With Human Values from Text Edits
Ruibo Liu
Chenyan Jia
Ge Zhang
Ziyu Zhuang
Tony X. Liu
Soroush Vosoughi
90
34
0
01 Jan 2023
MAUVE Scores for Generative Models: Theory and Practice
Krishna Pillutla
Lang Liu
John Thickstun
Sean Welleck
Swabha Swayamdipta
Rowan Zellers
Sewoong Oh
Yejin Choi
Zaïd Harchaoui
EGVM
31
21
0
30 Dec 2022
Trustworthy Social Bias Measurement
Rishi Bommasani
Percy Liang
27
10
0
20 Dec 2022
Evaluating Human-Language Model Interaction
Mina Lee
Megha Srivastava
Amelia Hardy
John Thickstun
Esin Durmus
...
Hancheng Cao
Tony Lee
Rishi Bommasani
Michael S. Bernstein
Percy Liang
LM&MA
ALM
46
99
0
19 Dec 2022
I2D2: Inductive Knowledge Distillation with NeuroLogic and Self-Imitation
Chandra Bhagavatula
Jena D. Hwang
Doug Downey
Ronan Le Bras
Ximing Lu
Lianhui Qin
Keisuke Sakaguchi
Swabha Swayamdipta
Peter West
Yejin Choi
21
34
0
19 Dec 2022
DuNST: Dual Noisy Self Training for Semi-Supervised Controllable Text Generation
Yuxi Feng
Xiaoyuan Yi
Xiting Wang
L. Lakshmanan
Xing Xie
DiffM
27
5
0
16 Dec 2022
Editing Models with Task Arithmetic
Gabriel Ilharco
Marco Tulio Ribeiro
Mitchell Wortsman
Suchin Gururangan
Ludwig Schmidt
Hannaneh Hajishirzi
Ali Farhadi
KELM
MoMe
MU
45
424
0
08 Dec 2022
Improving Iterative Text Revision by Learning Where to Edit from Other Revision Tasks
Zae Myung Kim
Wanyu Du
Vipul Raheja
Dhruv Kumar
Dongyeop Kang
15
16
0
02 Dec 2022
Understanding BLOOM: An empirical study on diverse NLP tasks
Parag Dakle
Sai Krishna Rallabandi
Preethi Raghavan
AI4CE
31
3
0
27 Nov 2022
Best-$k$ Search Algorithm for Neural Text Generation
Jiacheng Xu
Caiming Xiong
Silvio Savarese
Yingbo Zhou
30
5
0
22 Nov 2022
Validating Large Language Models with ReLM
Michael Kuchnik
Virginia Smith
George Amvrosiadis
21
27
0
21 Nov 2022
Ignore Previous Prompt: Attack Techniques For Language Models
Fábio Perez
Ian Ribeiro
SILM
28
396
0
17 Nov 2022
The CRINGE Loss: Learning what language not to model
Leonard Adolphs
Tianyu Gao
Jing Xu
Kurt Shuster
Sainbayar Sukhbaatar
Jason Weston
MU
23
34
0
10 Nov 2022
Safe Latent Diffusion: Mitigating Inappropriate Degeneration in Diffusion Models
P. Schramowski
Manuel Brack
Bjorn Deiseroth
Kristian Kersting
37
269
0
09 Nov 2022
Tuning Language Models as Training Data Generators for Augmentation-Enhanced Few-Shot Learning
Yu Meng
Martin Michalski
Jiaxin Huang
Yu Zhang
Tarek F. Abdelzaher
Jiawei Han
VLM
43
46
0
06 Nov 2022
SSD-LM: Semi-autoregressive Simplex-based Diffusion Language Model for Text Generation and Modular Control
Xiaochuang Han
Sachin Kumar
Yulia Tsvetkov
32
79
0
31 Oct 2022
Nearest Neighbor Language Models for Stylistic Controllable Generation
Severino Trotta
Lucie Flek
Charles F Welch
16
4
0
27 Oct 2022
Piloting Copilot and Codex: Hot Temperature, Cold Prompts, or Black Magic?
Jean-Baptiste Döderlein
M. Acher
D. Khelladi
B. Combemale
34
33
0
26 Oct 2022
SentBS: Sentence-level Beam Search for Controllable Summarization
Chenhui Shen
Liying Cheng
Lidong Bing
Yang You
Luo Si
33
11
0
26 Oct 2022
NeuroCounterfactuals: Beyond Minimal-Edit Counterfactuals for Richer Data Augmentation
Phillip Howard
Gadi Singer
Vasudev Lal
Yejin Choi
Swabha Swayamdipta
CML
52
25
0
22 Oct 2022
Scaling Instruction-Finetuned Language Models
Hyung Won Chung
Le Hou
Shayne Longpre
Barret Zoph
Yi Tay
...
Jacob Devlin
Adam Roberts
Denny Zhou
Quoc V. Le
Jason W. Wei
ReLM
LRM
60
2,987
0
20 Oct 2022
Attribution and Obfuscation of Neural Text Authorship: A Data Mining Perspective
Adaku Uchendu
Thai Le
Dongwon Lee
DeLMO
24
40
0
19 Oct 2022
Deep Bidirectional Language-Knowledge Graph Pretraining
Michihiro Yasunaga
Antoine Bosselut
Hongyu Ren
Xikun Zhang
Christopher D. Manning
Percy Liang
J. Leskovec
20
193
0
17 Oct 2022
Keep Me Updated! Memory Management in Long-term Conversations
Sanghwan Bae
Donghyun Kwak
Soyoung Kang
Min Young Lee
Sungdong Kim
Yuin Jeong
Hyeri Kim
Sang-Woo Lee
W. Park
Nako Sung
38
46
0
17 Oct 2022
Downstream Datasets Make Surprisingly Good Pretraining Corpora
Kundan Krishna
Saurabh Garg
Jeffrey P. Bigham
Zachary Chase Lipton
38
30
0
28 Sep 2022
Will It Blend? Mixing Training Paradigms & Prompting for Argument Quality Prediction
Michiel van der Meer
Myrthe Reuver
Urja Khurana
Lea Krause
Selene Báez Santamaría
40
14
0
19 Sep 2022