2106.10328
Process for Adapting Language Models to Society (PALMS) with Values-Targeted Datasets
18 June 2021
Irene Solaiman
Christy Dennison
Papers citing
"Process for Adapting Language Models to Society (PALMS) with Values-Targeted Datasets"
50 / 148 papers shown
Title
FairEval: Evaluating Fairness in LLM-Based Recommendations with Personality Awareness
Chandan Kumar Sah
Xiaoli Lian
Tony Xu
Li Zhang
26
0
0
10 Apr 2025
Following the Whispers of Values: Unraveling Neural Mechanisms Behind Value-Oriented Behaviors in LLMs
Ling Hu
Yuemei Xu
Xiaoyang Gu
Letao Han
28
0
0
07 Apr 2025
Alignment for Efficient Tool Calling of Large Language Models
Hongshen Xu
Zihan Wang
Zichen Zhu
Lei Pan
Xingyu Chen
L. Chen
Kai Yu
47
0
0
09 Mar 2025
Societal Alignment Frameworks Can Improve LLM Alignment
Karolina Stańczak
Nicholas Meade
Mehar Bhatia
Hattie Zhou
Konstantin Böttinger
...
Timothy P. Lillicrap
Ana Marasović
Sylvie Delacroix
Gillian K. Hadfield
Siva Reddy
125
0
0
27 Feb 2025
A Systematic Review of Open Datasets Used in Text-to-Image (T2I) Gen AI Model Safety
Rakeen Rouf
Trupti Bavalatti
Osama Ahmed
Dhaval Potdar
Faraz Jawed
EGVM
58
1
0
23 Feb 2025
Practical Principles for AI Cost and Compute Accounting
Stephen Casper
Luke Bailey
Tim Schreier
41
0
0
21 Feb 2025
GuardReasoner: Towards Reasoning-based LLM Safeguards
Yue Liu
Hongcheng Gao
Shengfang Zhai
Jun-Xiong Xia
Tianyi Wu
Zhiwei Xue
Y. Chen
Kenji Kawaguchi
Jiaheng Zhang
Bryan Hooi
AI4TS
LRM
131
13
0
30 Jan 2025
Segmenting Text and Learning Their Rewards for Improved RLHF in Language Model
Yueqin Yin
Shentao Yang
Yujia Xie
Ziyi Yang
Yuting Sun
Hany Awadalla
Weizhu Chen
Mingyuan Zhou
48
0
0
07 Jan 2025
The Future of Intelligent Healthcare: A Systematic Analysis and Discussion on the Integration and Impact of Robots Using Large Language Models for Healthcare
Souren Pashangpour
Goldie Nejat
LM&MA
42
7
0
05 Nov 2024
Large Language Models Still Exhibit Bias in Long Text
Wonje Jeung
Dongjae Jeon
Ashkan Yousefpour
Jonghyun Choi
ALM
29
2
0
23 Oct 2024
ComPO: Community Preferences for Language Model Personalization
Sachin Kumar
Chan Young Park
Yulia Tsvetkov
Noah A. Smith
Hannaneh Hajishirzi
29
5
0
21 Oct 2024
OD-Stega: LLM-Based Near-Imperceptible Steganography via Optimized Distributions
Yu-Shin Huang
Peter Just
Krishna Narayanan
Chao Tian
34
1
0
06 Oct 2024
FlipAttack: Jailbreak LLMs via Flipping
Yue Liu
Xiaoxin He
Miao Xiong
Jinlan Fu
Shumin Deng
Bryan Hooi
AAML
34
12
0
02 Oct 2024
Downstream bias mitigation is all you need
Arkadeep Baksi
Rahul Singh
Tarun Joshi
AI4CE
22
0
0
01 Aug 2024
Virtue Ethics For Ethically Tunable Robotic Assistants
Rajitha Ramanayake
Vivek Nallur
21
0
0
23 Jul 2024
LLMs can be Dangerous Reasoners: Analyzing-based Jailbreak Attack on Large Language Models
Shi Lin
Rongchang Li
Xun Wang
Changting Lin
Wenpeng Xing
Meng Han
55
3
0
23 Jul 2024
Bringing AI Participation Down to Scale: A Comment on Open AIs Democratic Inputs to AI Project
David Moats
Chandrima Ganguly
VLM
38
0
0
16 Jul 2024
The Sociolinguistic Foundations of Language Modeling
Jack Grieve
Sara Bartl
Matteo Fuoli
Jason Grafmiller
Weihang Huang
A. Jawerbaum
Akira Murakami
Marcus Perlman
Dana Roemling
Bodo Winter
33
7
0
12 Jul 2024
Refuse Whenever You Feel Unsafe: Improving Safety in LLMs via Decoupled Refusal Training
Youliang Yuan
Wenxiang Jiao
Wenxuan Wang
Jen-tse Huang
Jiahao Xu
Tian Liang
Pinjia He
Zhaopeng Tu
43
19
0
12 Jul 2024
AI Companions Reduce Loneliness
Julian De Freitas
A. K. Uğuralp
Zeliha Uğuralp
Stefano Puntoni
AI4MH
16
11
0
09 Jul 2024
Fairness and Bias in Multimodal AI: A Survey
Tosin P. Adewumi
Lama Alkhaled
Namrata Gurung
G. V. Boven
Irene Pagliai
48
9
0
27 Jun 2024
Few-shot Personalization of LLMs with Mis-aligned Responses
Jaehyung Kim
Yiming Yang
44
7
0
26 Jun 2024
FrenchToxicityPrompts: a Large Benchmark for Evaluating and Mitigating Toxicity in French Texts
Caroline Brun
Vassilina Nikoulina
36
1
0
25 Jun 2024
FoRAG: Factuality-optimized Retrieval Augmented Generation for Web-enhanced Long-form Question Answering
Tianchi Cai
Zhiwen Tan
Xierui Song
Tao Sun
Jiyan Jiang
Yunqi Xu
Yinger Zhang
Jinjie Gu
27
5
0
19 Jun 2024
Who's asking? User personas and the mechanics of latent misalignment
Asma Ghandeharioun
Ann Yuan
Marius Guerard
Emily Reif
Michael A. Lepori
Lucas Dixon
LLMSV
41
7
0
17 Jun 2024
garak: A Framework for Security Probing Large Language Models
Leon Derczynski
Erick Galinkin
Jeffrey Martin
Subho Majumdar
Nanna Inie
AAML
ELM
38
16
0
16 Jun 2024
Collective Constitutional AI: Aligning a Language Model with Public Input
Saffron Huang
Divya Siddarth
Liane Lovitt
Thomas I. Liao
Esin Durmus
Alex Tamkin
Deep Ganguli
ELM
59
70
0
12 Jun 2024
The Life Cycle of Large Language Models: A Review of Biases in Education
Jinsook Lee
Yann Hicke
Renzhe Yu
Christopher A. Brooks
René F. Kizilcec
AI4Ed
34
1
0
03 Jun 2024
Harnessing Business and Media Insights with Large Language Models
Yujia Bao
Ankit Parag Shah
Neeru Narang
Jonathan Rivers
Rajeev Maksey
...
Gyuhak Kim
Dengpan Yin
Don Hejna
Mo Nomeli
Wei Wei
AIFin
38
2
0
02 Jun 2024
Low-rank finetuning for LLMs: A fairness perspective
Saswat Das
Marco Romanelli
Cuong Tran
Zarreen Reza
B. Kailkhura
Ferdinando Fioretto
40
1
0
28 May 2024
The Mosaic Memory of Large Language Models
Igor Shilov
Matthieu Meeus
Yves-Alexandre de Montjoye
39
3
0
24 May 2024
Tiny Refinements Elicit Resilience: Toward Efficient Prefix-Model Against LLM Red-Teaming
Jiaxu Liu
Xiangyu Yin
Sihao Wu
Jianhong Wang
Meng Fang
Xinping Yi
Xiaowei Huang
32
4
0
21 May 2024
Frontier AI Ethics: Anticipating and Evaluating the Societal Impacts of Generative Agents
Seth Lazar
SILM
29
0
0
10 Apr 2024
Taxonomy and Analysis of Sensitive User Queries in Generative AI Search
Hwiyeol Jo
Taiwoo Park
Nayoung Choi
Changbong Kim
Ohjoon Kwon
...
Kyoungho Shin
Sun Suk Lim
Kyungmi Kim
Jihye Lee
Sun Kim
60
0
0
05 Apr 2024
Measuring Political Bias in Large Language Models: What Is Said and How It Is Said
Yejin Bang
Delong Chen
Nayeon Lee
Pascale Fung
29
25
0
27 Mar 2024
What are human values, and how do we align AI to them?
Oliver Klingefjord
Ryan Lowe
Joe Edelman
36
19
0
27 Mar 2024
Rejection Improves Reliability: Training LLMs to Refuse Unknown Questions Using RL from Knowledge Feedback
Hongshen Xu
Zichen Zhu
Situo Zhang
Da Ma
Shuai Fan
Lu Chen
Kai Yu
HILM
29
32
0
27 Mar 2024
Ensuring Safe and High-Quality Outputs: A Guideline Library Approach for Language Models
Yi Luo
Zheng-Wen Lin
Yuhao Zhang
Jiashuo Sun
Chen Lin
Chengjin Xu
Xiangdong Su
Yelong Shen
Jian Guo
Yeyun Gong
LM&MA
ELM
ALM
AI4TS
24
1
0
18 Mar 2024
Farsight: Fostering Responsible AI Awareness During AI Application Prototyping
Zijie J. Wang
Chinmay Kulkarni
Lauren Wilcox
Michael Terry
Michael A. Madaio
38
43
0
23 Feb 2024
KorNAT: LLM Alignment Benchmark for Korean Social Values and Common Knowledge
Jiyoung Lee
Minwoo Kim
Seungho Kim
Junghwan Kim
Seunghyun Won
Hwaran Lee
Edward Choi
ALM
29
11
0
21 Feb 2024
A Survey on Knowledge Distillation of Large Language Models
Xiaohan Xu
Ming Li
Chongyang Tao
Tao Shen
Reynold Cheng
Jinyang Li
Can Xu
Dacheng Tao
Tianyi Zhou
KELM
VLM
42
100
0
20 Feb 2024
Prompt-Based Bias Calibration for Better Zero/Few-Shot Learning of Language Models
Kang He
Yinghan Long
Kaushik Roy
21
2
0
15 Feb 2024
AI, Meet Human: Learning Paradigms for Hybrid Decision Making Systems
Clara Punzi
Roberto Pellungrini
Mattia Setzu
F. Giannotti
D. Pedreschi
19
5
0
09 Feb 2024
A Roadmap to Pluralistic Alignment
Taylor Sorensen
Jared Moore
Jillian R. Fisher
Mitchell L. Gordon
Niloofar Mireshghallah
...
Liwei Jiang
Ximing Lu
Nouha Dziri
Tim Althoff
Yejin Choi
65
80
0
07 Feb 2024
Measuring Implicit Bias in Explicitly Unbiased Large Language Models
Xuechunzi Bai
Angelina Wang
Ilia Sucholutsky
Thomas L. Griffiths
100
30
0
06 Feb 2024
Nevermind: Instruction Override and Moderation in Large Language Models
Edward Kim
ALM
18
0
0
05 Feb 2024
The RL/LLM Taxonomy Tree: Reviewing Synergies Between Reinforcement Learning and Large Language Models
M. Pternea
Prerna Singh
Abir Chakraborty
Y. Oruganti
M. Milletarí
Sayli Bapat
Kebei Jiang
OffRL
16
7
0
02 Feb 2024
Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling
Pratyush Maini
Skyler Seto
Richard He Bai
David Grangier
Yizhe Zhang
Navdeep Jaitly
SyDa
33
54
0
29 Jan 2024
Fortifying Ethical Boundaries in AI: Advanced Strategies for Enhancing Security in Large Language Models
Yunhong He
Jianling Qiu
Wei Zhang
Zhe Yuan
27
3
0
27 Jan 2024
Risk Taxonomy, Mitigation, and Assessment Benchmarks of Large Language Model Systems
Tianyu Cui
Yanling Wang
Chuanpu Fu
Yong Xiao
Sijia Li
...
Junwu Xiong
Xinyu Kong
Zujie Wen
Ke Xu
Qi Li
55
56
0
11 Jan 2024