Process for Adapting Language Models to Society (PALMS) with Values-Targeted Datasets

18 June 2021
Irene Solaiman
Christy Dennison

Papers citing "Process for Adapting Language Models to Society (PALMS) with Values-Targeted Datasets"

50 / 148 papers shown
Tackling Bias in Pre-trained Language Models: Current Trends and Under-represented Societies
Vithya Yogarajan
Gillian Dobbie
Te Taka Keegan
R. Neuwirth
ALM
43
11
0
03 Dec 2023
Large Language Models in Law: A Survey
Jinqi Lai
Wensheng Gan
Jiayang Wu
Zhenlian Qi
Philip S. Yu
ELM
AILaw
26
70
0
26 Nov 2023
Unmasking and Improving Data Credibility: A Study with Datasets for Training Harmless Language Models
Zhaowei Zhu
Jialu Wang
Hao Cheng
Yang Liu
21
14
0
19 Nov 2023
AART: AI-Assisted Red-Teaming with Diverse Data Generation for New LLM-powered Applications
Bhaktipriya Radharapu
Kevin Robinson
Lora Aroyo
Preethi Lahoti
18
37
0
14 Nov 2023
Functionality learning through specification instructions
Pedro Henrique Luz de Araujo
Benjamin Roth
ELM
33
0
0
14 Nov 2023
A Wolf in Sheep's Clothing: Generalized Nested Jailbreak Prompts can Fool Large Language Models Easily
Peng Ding
Jun Kuang
Dan Ma
Xuezhi Cao
Yunsen Xian
Jiajun Chen
Shujian Huang
AAML
19
95
0
14 Nov 2023
All Should Be Equal in the Eyes of Language Models: Counterfactually Aware Fair Text Generation
Pragyan Banerjee
Abhinav Java
Surgan Jandial
Simra Shahid
Shaz Furniturewala
Balaji Krishnamurthy
S. Bhatia
33
3
0
09 Nov 2023
Large Human Language Models: A Need and the Challenges
Nikita Soni
H. A. Schwartz
João Sedoc
Niranjan Balasubramanian
ALM
AI4CE
22
10
0
09 Nov 2023
GRASP: A Disagreement Analysis Framework to Assess Group Associations in Perspectives
Vinodkumar Prabhakaran
Christopher Homan
Lora Aroyo
Aida Mostafazadeh Davani
Alicia Parrish
Alex S. Taylor
Mark Díaz
Ding Wang
Greg Serapio-García
34
9
0
09 Nov 2023
SoK: Memorization in General-Purpose Large Language Models
Valentin Hartmann
Anshuman Suri
Vincent Bindschaedler
David E. Evans
Shruti Tople
Robert West
KELM
LLMAG
16
20
0
24 Oct 2023
Identifying and Adapting Transformer-Components Responsible for Gender Bias in an English Language Model
Abhijith Chintam
Rahel Beloch
Willem H. Zuidema
Michael Hanna
Oskar van der Wal
23
16
0
19 Oct 2023
Attack Prompt Generation for Red Teaming and Defending Large Language Models
Boyi Deng
Wenjie Wang
Fuli Feng
Yang Deng
Qifan Wang
Xiangnan He
AAML
25
48
0
19 Oct 2023
Group Preference Optimization: Few-Shot Alignment of Large Language Models
Siyan Zhao
John Dang
Aditya Grover
25
29
0
17 Oct 2023
CoMPosT: Characterizing and Evaluating Caricature in LLM Simulations
Myra Cheng
Tiziano Piccardi
Diyi Yang
LLMAG
16
67
0
17 Oct 2023
The Past, Present and Better Future of Feedback Learning in Large Language Models for Subjective Human Preferences and Values
Hannah Rose Kirk
Andrew M. Bean
Bertie Vidgen
Paul Röttger
Scott A. Hale
ALM
19
41
0
11 Oct 2023
Evaluating and Improving Value Judgments in AI: A Scenario-Based Study on Large Language Models' Depiction of Social Conventions
Jaeyoun You
Bongwon Suh
34
0
0
04 Oct 2023
GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts
Jiahao Yu
Xingwei Lin
Zheng Yu
Xinyu Xing
SILM
115
300
0
19 Sep 2023
Learning Unbiased News Article Representations: A Knowledge-Infused Approach
Sadia Kamal
Jimmy Hartford
Jeremy Willis
A. Bagavathi
11
1
0
12 Sep 2023
Bias and Fairness in Large Language Models: A Survey
Isabel O. Gallegos
Ryan A. Rossi
Joe Barrow
Md Mehrab Tanjim
Sungchul Kim
Franck Dernoncourt
Tong Yu
Ruiyi Zhang
Nesreen Ahmed
AILaw
19
485
0
02 Sep 2023
Let the Models Respond: Interpreting Language Model Detoxification Through the Lens of Prompt Dependence
Daniel Scalena
Gabriele Sarti
Malvina Nissim
Elisabetta Fersini
16
0
0
01 Sep 2023
Identifying and Mitigating the Security Risks of Generative AI
Clark W. Barrett
Bradley L Boyd
Elie Bursztein
Nicholas Carlini
Brad Chen
...
Zulfikar Ramzan
Khawaja Shams
D. Song
Ankur Taly
Diyi Yang
SILM
29
91
0
28 Aug 2023
From Instructions to Intrinsic Human Values -- A Survey of Alignment Goals for Big Models
Jing Yao
Xiaoyuan Yi
Xiting Wang
Jindong Wang
Xing Xie
ALM
19
42
0
23 Aug 2023
Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment
Rishabh Bhardwaj
Soujanya Poria
ELM
17
127
0
18 Aug 2023
CMD: a framework for Context-aware Model self-Detoxification
Zecheng Tang
Keyan Zhou
Juntao Li
Yuyang Ding
Pinzheng Wang
Bowen Yan
Min Zhang
MU
23
5
0
16 Aug 2023
GPT-4 Is Too Smart To Be Safe: Stealthy Chat with LLMs via Cipher
Youliang Yuan
Wenxiang Jiao
Wenxuan Wang
Jen-tse Huang
Pinjia He
Shuming Shi
Zhaopeng Tu
SILM
67
231
0
12 Aug 2023
Self-Alignment with Instruction Backtranslation
Xian Li
Ping Yu
Chunting Zhou
Timo Schick
Omer Levy
Luke Zettlemoyer
Jason Weston
M. Lewis
SyDa
24
123
0
11 Aug 2023
Neural Conversation Models and How to Rein Them in: A Survey of Failures and Fixes
Fabian Galetzka
Anne Beyer
David Schlangen
AI4CE
24
1
0
11 Aug 2023
Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
Stephen Casper
Xander Davies
Claudia Shi
T. Gilbert
Jérémy Scheurer
...
Erdem Biyik
Anca Dragan
David M. Krueger
Dorsa Sadigh
Dylan Hadfield-Menell
ALM
OffRL
44
470
0
27 Jul 2023
Evaluating the Moral Beliefs Encoded in LLMs
Nino Scherrer
Claudia Shi
Amir Feder
David M. Blei
25
116
0
26 Jul 2023
Jailbroken: How Does LLM Safety Training Fail?
Alexander Wei
Nika Haghtalab
Jacob Steinhardt
75
837
0
05 Jul 2023
Scaling Laws Do Not Scale
Fernando Diaz
Michael A. Madaio
23
8
0
05 Jul 2023
Towards Measuring the Representation of Subjective Global Opinions in Language Models
Esin Durmus
Karina Nguyen
Thomas I. Liao
Nicholas Schiefer
Amanda Askell
...
Alex Tamkin
Janel Thamkul
Jared Kaplan
Jack Clark
Deep Ganguli
33
205
0
28 Jun 2023
CBBQ: A Chinese Bias Benchmark Dataset Curated with Human-AI Collaboration for Large Language Models
Yufei Huang
Deyi Xiong
ALM
34
17
0
28 Jun 2023
Intersectionality in Conversational AI Safety: How Bayesian Multilevel Models Help Understand Diverse Perceptions of Safety
Christopher Homan
Greg Serapio-García
Lora Aroyo
Mark Díaz
Alicia Parrish
Vinodkumar Prabhakaran
Alex S. Taylor
Ding Wang
22
9
0
20 Jun 2023
DICES Dataset: Diversity in Conversational AI Evaluation for Safety
Lora Aroyo
Alex S. Taylor
Mark Díaz
Christopher Homan
Alicia Parrish
Greg Serapio-García
Vinodkumar Prabhakaran
Ding Wang
24
33
0
20 Jun 2023
Inverse Scaling: When Bigger Isn't Better
I. R. McKenzie
Alexander Lyzhov
Michael Pieler
Alicia Parrish
Aaron Mueller
...
Yuhui Zhang
Zhengping Zhou
Najoung Kim
Sam Bowman
Ethan Perez
25
126
0
15 Jun 2023
Evaluating the Social Impact of Generative AI Systems in Systems and Society
Irene Solaiman
Zeerak Talat
William Agnew
Lama Ahmad
Dylan K. Baker
...
Marie-Therese Png
Shubham Singh
A. Strait
Lukas Struppek
Arjun Subramonian
ELM
EGVM
31
104
0
09 Jun 2023
I'm Afraid I Can't Do That: Predicting Prompt Refusal in Black-Box Generative Language Models
Max Reuter
William B. Schulze
26
4
0
06 Jun 2023
Exposing Bias in Online Communities through Large-Scale Language Models
Celine Wald
Lukas Pfahler
11
6
0
04 Jun 2023
Preference-grounded Token-level Guidance for Language Model Fine-tuning
Shentao Yang
Shujian Zhang
Congying Xia
Yihao Feng
Caiming Xiong
Mingyuan Zhou
21
23
0
01 Jun 2023
KoSBi: A Dataset for Mitigating Social Bias Risks Towards Safer Large Language Model Application
Hwaran Lee
Seokhee Hong
Joonsuk Park
Takyoung Kim
Gunhee Kim
Jung-Woo Ha
30
28
0
28 May 2023
SQuARe: A Large-Scale Dataset of Sensitive Questions and Acceptable Responses Created Through Human-Machine Collaboration
Hwaran Lee
Seokhee Hong
Joonsuk Park
Takyoung Kim
M. Cha
...
Eun-Ju Lee
Yong Lim
Alice H. Oh
San-hee Park
Jung-Woo Ha
36
16
0
28 May 2023
DADA: Dialect Adaptation via Dynamic Aggregation of Linguistic Rules
Yanchen Liu
William B. Held
Diyi Yang
43
10
0
22 May 2023
ReSeTOX: Re-learning attention weights for toxicity mitigation in machine translation
Javier García Gilabert
Carlos Escolano
Marta R. Costa-jussà
CLL
MU
19
2
0
19 May 2023
Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision
Zhiqing Sun
Yikang Shen
Qinhong Zhou
Hongxin Zhang
Zhenfang Chen
David D. Cox
Yiming Yang
Chuang Gan
SyDa
ALM
25
313
0
04 May 2023
CONSCENDI: A Contrastive and Scenario-Guided Distillation Approach to Guardrail Models for Virtual Assistants
A. Sun
Varun Nair
Elliot Schumacher
Anitha Kannan
27
3
0
27 Apr 2023
Stochastic Parrots Looking for Stochastic Parrots: LLMs are Easy to Fine-Tune and Hard to Detect with other LLMs
Da Silva Gameiro Henrique
Andrei Kucharavy
R. Guerraoui
DeLMO
14
7
0
18 Apr 2023
Can Large Language Models Transform Computational Social Science?
Caleb Ziems
William B. Held
Omar Shaikh
Jiaao Chen
Zhehao Zhang
Diyi Yang
LLMAG
28
286
0
12 Apr 2023
SPDF: Sparse Pre-training and Dense Fine-tuning for Large Language Models
Vithursan Thangarasa
Abhay Gupta
William Marshall
Tianda Li
Kevin Leong
D. DeCoste
Sean Lie
Shreyas Saxena
MoE
AI4CE
16
18
0
18 Mar 2023
Exploring the Relevance of Data Privacy-Enhancing Technologies for AI Governance Use Cases
Emma Bluemke
Tantum Collins
Ben Garfinkel
Andrew Trask
9
10
0
15 Mar 2023