Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2106.10328
Cited By
Process for Adapting Language Models to Society (PALMS) with Values-Targeted Datasets
18 June 2021
Irene Solaiman
Christy Dennison
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Process for Adapting Language Models to Society (PALMS) with Values-Targeted Datasets"
48 / 148 papers shown
Title
Overwriting Pretrained Bias with Finetuning Data
Angelina Wang
Olga Russakovsky
21
29
0
10 Mar 2023
Pretraining Language Models with Human Preferences
Tomasz Korbak
Kejian Shi
Angelica Chen
Rasika Bhalerao
C. L. Buckley
Jason Phang
Sam Bowman
Ethan Perez
ALM
SyDa
36
206
0
16 Feb 2023
Aligning Language Models with Preferences through f-divergence Minimization
Dongyoung Go
Tomasz Korbak
Germán Kruszewski
Jos Rozen
Nahyeon Ryu
Marc Dymetman
24
68
0
16 Feb 2023
The Capacity for Moral Self-Correction in Large Language Models
Deep Ganguli
Amanda Askell
Nicholas Schiefer
Thomas I. Liao
Kamil.e Lukovsiut.e
...
Tom B. Brown
C. Olah
Jack Clark
Sam Bowman
Jared Kaplan
LRM
ReLM
45
158
0
15 Feb 2023
Adding Instructions during Pretraining: Effective Way of Controlling Toxicity in Language Models
Shrimai Prabhumoye
M. Patwary
M. Shoeybi
Bryan Catanzaro
LM&MA
30
19
0
14 Feb 2023
The Gradient of Generative AI Release: Methods and Considerations
Irene Solaiman
22
98
0
05 Feb 2023
Using In-Context Learning to Improve Dialogue Safety
Nicholas Meade
Spandana Gella
Devamanyu Hazarika
Prakhar Gupta
Di Jin
Siva Reddy
Yang Liu
Dilek Z. Hakkani-Tür
25
38
0
02 Feb 2023
Red teaming ChatGPT via Jailbreaking: Bias, Robustness, Reliability and Toxicity
Terry Yue Zhuo
Yujin Huang
Chunyang Chen
Zhenchang Xing
SILM
28
102
0
30 Jan 2023
On Second Thought, Let's Not Think Step by Step! Bias and Toxicity in Zero-Shot Reasoning
Omar Shaikh
Hongxin Zhang
William B. Held
Michael S. Bernstein
Diyi Yang
ReLM
LRM
27
181
0
15 Dec 2022
Constitutional AI: Harmlessness from AI Feedback
Yuntao Bai
Saurav Kadavath
Sandipan Kundu
Amanda Askell
John Kernion
...
Dario Amodei
Nicholas Joseph
Sam McCandlish
Tom B. Brown
Jared Kaplan
SyDa
MoMe
62
1,476
0
15 Dec 2022
Speaking Multiple Languages Affects the Moral Bias of Language Models
Katharina Hämmerl
Bjorn Deiseroth
P. Schramowski
Jindrich Libovický
Constantin Rothkopf
Alexander M. Fraser
Kristian Kersting
29
31
0
14 Nov 2022
ADEPT: A DEbiasing PrompT Framework
Ke Yang
Charles Yu
Yi Ren Fung
Manling Li
Heng Ji
19
23
0
10 Nov 2022
Robosourcing Educational Resources -- Leveraging Large Language Models for Learnersourcing
Paul Denny
Sami Sarsa
Arto Hellas
Juho Leinonen
AI4Ed
6
35
0
09 Nov 2022
Mitigating Covertly Unsafe Text within Natural Language Systems
Alex Mei
Anisha Kabir
Sharon Levy
Melanie Subbiah
Emily Allaway
J. Judge
D. Patton
Bruce Bimber
Kathleen McKeown
William Yang Wang
50
13
0
17 Oct 2022
Prompting GPT-3 To Be Reliable
Chenglei Si
Zhe Gan
Zhengyuan Yang
Shuohang Wang
Jianfeng Wang
Jordan L. Boyd-Graber
Lijuan Wang
KELM
LRM
44
278
0
17 Oct 2022
NormSAGE: Multi-Lingual Multi-Cultural Norm Discovery from Conversations On-the-Fly
Yi Ren Fung
Tuhin Chakraborty
Hao Guo
Owen Rambow
Smaranda Muresan
Heng Ji
21
39
0
16 Oct 2022
Enabling Classifiers to Make Judgements Explicitly Aligned with Human Values
Yejin Bang
Tiezheng Yu
Andrea Madotto
Zhaojiang Lin
Mona T. Diab
Pascale Fung
19
13
0
14 Oct 2022
Back to the Future: On Potential Histories in NLP
Zeerak Talat
Anne Lauscher
AI4TS
30
4
0
12 Oct 2022
On the Impossible Safety of Large AI Models
El-Mahdi El-Mhamdi
Sadegh Farhadkhani
R. Guerraoui
Nirupam Gupta
L. Hoang
Rafael Pinot
Sébastien Rouault
John Stephan
30
31
0
30 Sep 2022
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
Deep Ganguli
Liane Lovitt
John Kernion
Amanda Askell
Yuntao Bai
...
Nicholas Joseph
Sam McCandlish
C. Olah
Jared Kaplan
Jack Clark
225
444
0
23 Aug 2022
Deception for Cyber Defence: Challenges and Opportunities
David Liebowitz
Surya Nepal
Kristen Moore
Cody James Christopher
S. Kanhere
David D. Nguyen
Roelien C. Timmer
Michael Longland
Keerth Rathakumar
34
10
0
15 Aug 2022
Few-shot Adaptation Works with UnpredicTable Data
Jun Shern Chan
Michael Pieler
Jonathan Jao
Jérémy Scheurer
Ethan Perez
28
5
0
01 Aug 2022
A Hazard Analysis Framework for Code Synthesis Large Language Models
Heidy Khlaaf
Pamela Mishkin
Joshua Achiam
Gretchen Krueger
Miles Brundage
ELM
17
28
0
25 Jul 2022
Democratizing Ethical Assessment of Natural Language Generation Models
A. Rasekh
Ian W. Eisenberg
ELM
21
1
0
30 Jun 2022
Characteristics of Harmful Text: Towards Rigorous Benchmarking of Language Models
Maribeth Rauh
John F. J. Mellor
J. Uesato
Po-Sen Huang
Johannes Welbl
...
Amelia Glaese
G. Irving
Iason Gabriel
William S. Isaac
Lisa Anne Hendricks
25
49
0
16 Jun 2022
Looking for a Handsome Carpenter! Debiasing GPT-3 Job Advertisements
Conrad Borchers
Dalia Sara Gala
Ben Gilburt
Eduard Oravkin
Wilfried Bounsi
Yuki M. Asano
Hannah Rose Kirk
AI4CE
19
27
0
23 May 2022
Mitigating Toxic Degeneration with Empathetic Data: Exploring the Relationship Between Toxicity and Empathy
Allison Lahnala
Charles F Welch
Béla Neuendorf
Lucie Flek
57
13
0
15 May 2022
AfriWOZ: Corpus for Exploiting Cross-Lingual Transferability for Generation of Dialogues in Low-Resource, African Languages
Tosin P. Adewumi
Mofetoluwa Adeyemi
Aremu Anuoluwapo
Bukola Peters
Happy Buzaaba
...
Phylis Ngigi
Orevaoghene Ahia
Ruqayya Nasir
F. Liwicki
Marcus Liwicki
15
1
0
17 Apr 2022
Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback
Yuntao Bai
Andy Jones
Kamal Ndousse
Amanda Askell
Anna Chen
...
Jack Clark
Sam McCandlish
C. Olah
Benjamin Mann
Jared Kaplan
72
2,318
0
12 Apr 2022
Probing Pre-Trained Language Models for Cross-Cultural Differences in Values
Arnav Arora
Lucie-Aimée Kaffee
Isabelle Augenstein
VLM
25
123
0
25 Mar 2022
Mitigating Gender Bias in Distilled Language Models via Counterfactual Role Reversal
Umang Gupta
Jwala Dhamala
Varun Kumar
Apurv Verma
Yada Pruksachatkun
Satyapriya Krishna
Rahul Gupta
Kai-Wei Chang
Greg Ver Steeg
Aram Galstyan
11
50
0
23 Mar 2022
Challenges and Strategies in Cross-Cultural NLP
Daniel Hershcovich
Stella Frank
Heather Lent
Miryam de Lhoneux
Mostafa Abdou
...
Ruixiang Cui
Constanza Fierro
Katerina Margatina
Phillip Rust
Anders Søgaard
43
163
0
18 Mar 2022
The Ghost in the Machine has an American accent: value conflict in GPT-3
Rebecca Lynn Johnson
Giada Pistilli
Natalia Menédez-González
Leslye Denisse Dias Duran
Enrico Panai
Julija Kalpokienė
D. Bertulfo
11
83
0
15 Mar 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
313
11,915
0
04 Mar 2022
Exploring the Limits of Domain-Adaptive Training for Detoxifying Large-Scale Language Models
Boxin Wang
Wei Ping
Chaowei Xiao
P. Xu
M. Patwary
M. Shoeybi
Bo-wen Li
Anima Anandkumar
Bryan Catanzaro
14
64
0
08 Feb 2022
Red Teaming Language Models with Language Models
Ethan Perez
Saffron Huang
Francis Song
Trevor Cai
Roman Ring
John Aslanides
Amelia Glaese
Nat McAleese
G. Irving
AAML
13
609
0
07 Feb 2022
Text and Code Embeddings by Contrastive Pre-Training
Arvind Neelakantan
Tao Xu
Raul Puri
Alec Radford
Jesse Michael Han
...
Tabarak Khan
Toki Sherbakov
Joanne Jang
Peter Welinder
Lilian Weng
SSL
AI4TS
218
421
0
24 Jan 2022
LaMDA: Language Models for Dialog Applications
R. Thoppilan
Daniel De Freitas
Jamie Hall
Noam M. Shazeer
Apoorv Kulshreshtha
...
Blaise Aguera-Arcas
Claire Cui
M. Croak
Ed H. Chi
Quoc Le
ALM
13
1,557
0
20 Jan 2022
Ethical and social risks of harm from Language Models
Laura Weidinger
John F. J. Mellor
Maribeth Rauh
Conor Griffin
J. Uesato
...
Lisa Anne Hendricks
William S. Isaac
Sean Legassick
G. Irving
Iason Gabriel
PILM
27
975
0
08 Dec 2021
A General Language Assistant as a Laboratory for Alignment
Amanda Askell
Yuntao Bai
Anna Chen
Dawn Drain
Deep Ganguli
...
Tom B. Brown
Jack Clark
Sam McCandlish
C. Olah
Jared Kaplan
ALM
11
714
0
01 Dec 2021
ExT5: Towards Extreme Multi-Task Scaling for Transfer Learning
V. Aribandi
Yi Tay
Tal Schuster
J. Rao
H. Zheng
...
Jianmo Ni
Jai Gupta
Kai Hui
Sebastian Ruder
Donald Metzler
MoE
18
213
0
22 Nov 2021
SynthBio: A Case Study in Human-AI Collaborative Curation of Text Datasets
Ann Yuan
Daphne Ippolito
Vitaly Nikolaev
Chris Callison-Burch
Andy Coenen
Sebastian Gehrmann
SyDa
106
20
0
11 Nov 2021
TruthfulQA: Measuring How Models Mimic Human Falsehoods
Stephanie C. Lin
Jacob Hilton
Owain Evans
HILM
34
1,719
0
08 Sep 2021
Finetuned Language Models Are Zero-Shot Learners
Jason W. Wei
Maarten Bosma
Vincent Zhao
Kelvin Guu
Adams Wei Yu
Brian Lester
Nan Du
Andrew M. Dai
Quoc V. Le
ALM
UQCV
33
3,567
0
03 Sep 2021
Mitigating harm in language models with conditional-likelihood filtration
Helen Ngo
Cooper D. Raterink
J. Araújo
Ivan Zhang
Carol Chen
Adrien Morisot
Nick Frosst
24
41
0
04 Aug 2021
Your fairness may vary: Pretrained language model fairness in toxic text classification
Ioana Baldini
Dennis L. Wei
K. Ramamurthy
Mikhail Yurochkin
Moninder Singh
24
53
0
03 Aug 2021
Anticipating Safety Issues in E2E Conversational AI: Framework and Tooling
Emily Dinan
Gavin Abercrombie
A. S. Bergman
Shannon L. Spruit
Dirk Hovy
Y-Lan Boureau
Verena Rieser
34
105
0
07 Jul 2021
Crossing the Conversational Chasm: A Primer on Natural Language Processing for Multilingual Task-Oriented Dialogue Systems
E. Razumovskaia
Goran Glavavs
Olga Majewska
E. Ponti
Anna Korhonen
Ivan Vulić
18
32
0
17 Apr 2021
Previous
1
2
3