ResearchTrend.AI
  • Communities
  • Connect sessions
  • AI calendar
  • Organizations
  • Join Slack
  • Contact Sales
Papers
Communities
Social Events
Terms and Conditions
Pricing
Contact Sales
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2026 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2008.02275
  4. Cited By
Aligning AI With Shared Human Values
v1v2v3v4v5v6 (latest)

Aligning AI With Shared Human Values

5 August 2020
Dan Hendrycks
Collin Burns
Steven Basart
Andrew Critch
Haibin Zhang
Basel Alomair
Jacob Steinhardt
ArXiv (abs)PDFHTML

Papers citing "Aligning AI With Shared Human Values"

50 / 463 papers shown
New Textual Corpora for Serbian Language Modeling
New Textual Corpora for Serbian Language Modeling
Mihailo Škorić
Nikola Janković
158
2
0
15 May 2024
SpeechGuard: Exploring the Adversarial Robustness of Multimodal Large
  Language Models
SpeechGuard: Exploring the Adversarial Robustness of Multimodal Large Language ModelsAnnual Meeting of the Association for Computational Linguistics (ACL), 2024
Raghuveer Peri
Sai Muralidhar Jayanthi
S. Ronanki
Anshu Bhatia
Karel Mundnich
...
Srikanth Vishnubhotla
Daniel Garcia-Romero
S. Srinivasan
Kyu J. Han
Katrin Kirchhoff
AAML
283
7
0
14 May 2024
LMD3: Language Model Data Density Dependence
LMD3: Language Model Data Density Dependence
John Kirchenbauer
Garrett Honke
Gowthami Somepalli
Jonas Geiping
Daphne Ippolito
Katherine Lee
Tom Goldstein
David Andre
232
8
0
10 May 2024
Assessing and Verifying Task Utility in LLM-Powered Applications
Assessing and Verifying Task Utility in LLM-Powered ApplicationsConference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Negar Arabzadeh
Siging Huo
Nikhil Mehta
Qinqyun Wu
Chi Wang
Ahmed Hassan Awadallah
Charles L. A. Clarke
Julia Kiseleva
315
18
0
03 May 2024
Aloe: A Family of Fine-tuned Open Healthcare LLMs
Aloe: A Family of Fine-tuned Open Healthcare LLMs
Ashwin Kumar Gururajan
Enrique Lopez-Cuena
Jordi Bayarri-Planas
Adrián Tormos
Daniel Hinjos
...
Lucia Urcelay-Ganzabal
Marta Gonzalez-Mallo
Sergio Alvarez-Napagao
Eduard Ayguadé-Parra
Ulises Cortés Dario Garcia-Gasulla
ELMLM&MA
311
29
0
03 May 2024
More RLHF, More Trust? On The Impact of Human Preference Alignment On
  Language Model Trustworthiness
More RLHF, More Trust? On The Impact of Human Preference Alignment On Language Model Trustworthiness
Aaron Jiaxun Li
Satyapriya Krishna
Himabindu Lakkaraju
197
2
0
29 Apr 2024
Ethical Reasoning and Moral Value Alignment of LLMs Depend on the
  Language we Prompt them in
Ethical Reasoning and Moral Value Alignment of LLMs Depend on the Language we Prompt them in
Utkarsh Agarwal
Kumar Tanmay
Aditi Khandelwal
Monojit Choudhury
LRM
227
37
0
29 Apr 2024
Continual Learning of Large Language Models: A Comprehensive Survey
Continual Learning of Large Language Models: A Comprehensive Survey
Haizhou Shi
Zihao Xu
Hengyi Wang
Weiyi Qin
Wenyuan Wang
Yibin Wang
Zifeng Wang
Sayna Ebrahimi
Hao Wang
CLLKELMLRM
393
151
0
25 Apr 2024
Cooperate or Collapse: Emergence of Sustainable Cooperation in a Society
  of LLM Agents
Cooperate or Collapse: Emergence of Sustainable Cooperation in a Society of LLM Agents
Giorgio Piatti
Zhijing Jin
Max Kleiman-Weiner
Bernhard Schölkopf
Mrinmaya Sachan
Amélie Reymond
LLMAG
399
55
0
25 Apr 2024
Beyond Human Norms: Unveiling Unique Values of Large Language Models
  through Interdisciplinary Approaches
Beyond Human Norms: Unveiling Unique Values of Large Language Models through Interdisciplinary Approaches
Pablo Biedma
Xiaoyuan Yi
Linus Huang
Maosong Sun
Xing Xie
PILM
350
9
0
19 Apr 2024
AdvisorQA: Towards Helpful and Harmless Advice-seeking Question Answering with Collective Intelligence
AdvisorQA: Towards Helpful and Harmless Advice-seeking Question Answering with Collective Intelligence
Minbeom Kim
Hwanhee Lee
Joonsuk Park
Hwaran Lee
Kyomin Jung
434
7
0
18 Apr 2024
Procedural Dilemma Generation for Evaluating Moral Reasoning in Humans
  and Language Models
Procedural Dilemma Generation for Evaluating Moral Reasoning in Humans and Language Models
Jan-Philipp Fränken
Kanishk Gandhi
Tori Qiu
Ayesha Khawaja
Noah D. Goodman
Tobias Gerstenberg
ELM
298
1
0
17 Apr 2024
HLAT: High-quality Large Language Model Pre-trained on AWS Trainium
HLAT: High-quality Large Language Model Pre-trained on AWS Trainium
Haozheng Fan
Hao Zhou
Guangtai Huang
Parameswaran Raman
Xinwei Fu
Gaurav Gupta
Dhananjay Ram
Yida Wang
Jun Huan
209
12
0
16 Apr 2024
Prepacking: A Simple Method for Fast Prefilling and Increased Throughput
  in Large Language Models
Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models
Siyan Zhao
Daniel Israel
Karen Ullrich
Aditya Grover
KELMVLM
186
9
0
15 Apr 2024
Online Safety Analysis for LLMs: a Benchmark, an Assessment, and a Path
  Forward
Online Safety Analysis for LLMs: a Benchmark, an Assessment, and a Path Forward
Xuan Xie
Yuheng Huang
Zhehua Zhou
Yuheng Huang
Da Song
Lei Ma
OffRL
385
12
0
12 Apr 2024
Scalable Language Model with Generalized Continual Learning
Scalable Language Model with Generalized Continual Learning
Bohao Peng
Zhuotao Tian
Shu Liu
Mingchang Yang
Jiaya Jia
ALMCLLKELM
179
31
0
11 Apr 2024
High-Dimension Human Value Representation in Large Language Models
High-Dimension Human Value Representation in Large Language ModelsNorth American Chapter of the Association for Computational Linguistics (NAACL), 2024
Samuel Cahyawijaya
Delong Chen
Yejin Bang
Leila Khalatbari
Bryan Wilie
Ziwei Ji
Etsuko Ishii
Pascale Fung
619
11
0
11 Apr 2024
Frontier AI Ethics: Anticipating and Evaluating the Societal Impacts of
  Generative Agents
Frontier AI Ethics: Anticipating and Evaluating the Societal Impacts of Generative Agents
Seth Lazar
SILM
185
1
0
10 Apr 2024
CulturalTeaming: AI-Assisted Interactive Red-Teaming for Challenging
  LLMs' (Lack of) Multicultural Knowledge
CulturalTeaming: AI-Assisted Interactive Red-Teaming for Challenging LLMs' (Lack of) Multicultural Knowledge
Yu Ying Chiu
Amirhossein Ajalloeian
Maria Antoniak
Chan Young Park
Shuyue Stella Li
Mehar Bhatia
Sahithya Ravi
Yulia Tsvetkov
Vered Shwartz
Yejin Choi
210
35
0
10 Apr 2024
SafetyPrompts: a Systematic Review of Open Datasets for Evaluating and Improving Large Language Model Safety
SafetyPrompts: a Systematic Review of Open Datasets for Evaluating and Improving Large Language Model Safety
Paul Röttger
Fabio Pernisi
Bertie Vidgen
Dirk Hovy
ELMKELM
364
66
0
08 Apr 2024
Language Models as Critical Thinking Tools: A Case Study of Philosophers
Language Models as Critical Thinking Tools: A Case Study of Philosophers
Andre Ye
Jared Moore
Rose Novick
Amy X. Zhang
KELMELMLRMLLMAG
212
10
0
06 Apr 2024
Conifer: Improving Complex Constrained Instruction-Following Ability of
  Large Language Models
Conifer: Improving Complex Constrained Instruction-Following Ability of Large Language Models
Haoran Sun
Lixin Liu
Junjie Li
Fengyu Wang
Baohua Dong
Ran Lin
Ruohui Huang
198
23
0
03 Apr 2024
NumeroLogic: Number Encoding for Enhanced LLMs' Numerical Reasoning
NumeroLogic: Number Encoding for Enhanced LLMs' Numerical Reasoning
Eli Schwartz
Leshem Choshen
J. Shtok
Sivan Doveh
Leonid Karlinsky
Assaf Arbelle
328
37
0
30 Mar 2024
Contextual Moral Value Alignment Through Context-Based Aggregation
Contextual Moral Value Alignment Through Context-Based Aggregation
Pierre Dognin
Jesus Rios
Ronny Luss
Inkit Padhi
Matthew D Riemer
Miao Liu
P. Sattigeri
Manish Nagireddy
Kush R. Varshney
Djallel Bouneffouf
146
10
0
19 Mar 2024
Enhancing Data Quality in Federated Fine-Tuning of Foundation Models
Enhancing Data Quality in Federated Fine-Tuning of Foundation Models
Wanru Zhao
Yaxin Du
Nicholas D. Lane
Siheng Chen
Yanfeng Wang
216
4
0
07 Mar 2024
The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning
The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning
Nathaniel Li
Alexander Pan
Anjali Gopal
Summer Yue
Daniel Berrios
...
Yan Shoshitaishvili
Jimmy Ba
K. Esvelt
Alexandr Wang
Dan Hendrycks
ELM
751
300
0
05 Mar 2024
Birbal: An efficient 7B instruct-model fine-tuned with curated datasets
Birbal: An efficient 7B instruct-model fine-tuned with curated datasets
Ashvini Jindal
P. Rajpoot
Ankur P. Parikh
145
6
0
04 Mar 2024
Evaluating Quantized Large Language Models
Evaluating Quantized Large Language Models
Shiyao Li
Xuefei Ning
Luning Wang
Tengxuan Liu
Xiangsheng Shi
Shengen Yan
Guohao Dai
Huazhong Yang
Yu Wang
MQ
277
77
0
28 Feb 2024
Exploring Multilingual Concepts of Human Value in Large Language Models:
  Is Value Alignment Consistent, Transferable and Controllable across
  Languages?
Exploring Multilingual Concepts of Human Value in Large Language Models: Is Value Alignment Consistent, Transferable and Controllable across Languages?
Shaoyang Xu
Weilong Dong
Zishan Guo
Xinwei Wu
Deyi Xiong
259
16
0
28 Feb 2024
FairBelief -- Assessing Harmful Beliefs in Language Models
FairBelief -- Assessing Harmful Beliefs in Language Models
Mattia Setzu
Marta Marchiori Manerba
Pasquale Minervini
Debora Nozza
226
0
0
27 Feb 2024
Reasoning in Conversation: Solving Subjective Tasks through Dialogue
  Simulation for Large Language Models
Reasoning in Conversation: Solving Subjective Tasks through Dialogue Simulation for Large Language Models
Xiaolong Wang
Yile Wang
Yuan Zhang
Ziyue Wang
Peng Li
Maosong Sun
Yang Liu
LRM
150
4
0
27 Feb 2024
Language Agents as Optimizable Graphs
Language Agents as Optimizable Graphs
Mingchen Zhuge
Wenyi Wang
Louis Kirsch
Francesco Faccio
Dmitrii Khizbullin
Jürgen Schmidhuber
LLMAG
374
33
0
26 Feb 2024
Political Compass or Spinning Arrow? Towards More Meaningful Evaluations
  for Values and Opinions in Large Language Models
Political Compass or Spinning Arrow? Towards More Meaningful Evaluations for Values and Opinions in Large Language Models
Paul Röttger
Valentin Hofmann
Valentina Pyatkin
Musashi Hinck
Hannah Rose Kirk
Hinrich Schütze
Dirk Hovy
ELM
275
126
0
26 Feb 2024
Eagle: Ethical Dataset Given from Real Interactions
Eagle: Ethical Dataset Given from Real Interactions
Masahiro Kaneko
Danushka Bollegala
Timothy Baldwin
191
4
0
22 Feb 2024
KorNAT: LLM Alignment Benchmark for Korean Social Values and Common
  Knowledge
KorNAT: LLM Alignment Benchmark for Korean Social Values and Common Knowledge
Jiyoung Lee
Minwoo Kim
Seungho Kim
Junghwan Kim
Seunghyun Won
Hwaran Lee
Edward Choi
ALM
547
31
0
21 Feb 2024
Roadmap on Incentive Compatibility for AI Alignment and Governance in Sociotechnical Systems
Roadmap on Incentive Compatibility for AI Alignment and Governance in Sociotechnical Systems
Zhaowei Zhang
Fengshuo Bai
Mingzhi Wang
Haoyang Ye
Chengdong Ma
Yaodong Yang
422
6
0
20 Feb 2024
Enabling Weak LLMs to Judge Response Reliability via Meta Ranking
Enabling Weak LLMs to Judge Response Reliability via Meta Ranking
Zijun Liu
Boqun Kou
Peng Li
Ming Yan
Ji Zhang
Fei Huang
Yang Liu
264
7
0
19 Feb 2024
Uncovering Latent Human Wellbeing in Language Model Embeddings
Uncovering Latent Human Wellbeing in Language Model Embeddings
Pedro Freire
ChengCheng Tan
Adam Gleave
Dan Hendrycks
Scott Emmons
209
1
0
19 Feb 2024
RENOVI: A Benchmark Towards Remediating Norm Violations in
  Socio-Cultural Conversations
RENOVI: A Benchmark Towards Remediating Norm Violations in Socio-Cultural Conversations
Haolan Zhan
Zhuang Li
Xiaoxi Kang
Tao Feng
Yuncheng Hua
...
Linhao Luo
Lay-Ki Soon
Zhaleh Semnani Azad
Ingrid Zukerman
Gholamreza Haffari
250
13
0
17 Feb 2024
Towards better Human-Agent Alignment: Assessing Task Utility in
  LLM-Powered Applications
Towards better Human-Agent Alignment: Assessing Task Utility in LLM-Powered Applications
Negar Arabzadeh
Julia Kiseleva
Qingyun Wu
Chi Wang
Ahmed Hassan Awadallah
Victor C. Dibia
Adam Fourney
Charles L. A. Clarke
LLMAG
262
7
0
14 Feb 2024
A Roadmap to Pluralistic Alignment
A Roadmap to Pluralistic Alignment
Taylor Sorensen
Jared Moore
Jillian R. Fisher
Mitchell L. Gordon
Niloofar Mireshghallah
...
Liwei Jiang
Ximing Lu
Nouha Dziri
Tim Althoff
Yejin Choi
391
150
0
07 Feb 2024
Do Moral Judgment and Reasoning Capability of LLMs Change with Language?
  A Study using the Multilingual Defining Issues Test
Do Moral Judgment and Reasoning Capability of LLMs Change with Language? A Study using the Multilingual Defining Issues Test
Aditi Khandelwal
Utkarsh Agarwal
Kumar Tanmay
Monojit Choudhury
ELMLRM
199
14
0
03 Feb 2024
TrustAgent: Towards Safe and Trustworthy LLM-based Agents through Agent
  Constitution
TrustAgent: Towards Safe and Trustworthy LLM-based Agents through Agent Constitution
Qingfeng Lan
Xianjun Yang
Zelong Li
Cheng Wei
Yongfeng Zhang
LLMAG
441
4
0
02 Feb 2024
Enhancing Ethical Explanations of Large Language Models through
  Iterative Symbolic Refinement
Enhancing Ethical Explanations of Large Language Models through Iterative Symbolic Refinement
Xin Quan
Marco Valentino
Louise A. Dennis
André Freitas
LRM
185
19
0
01 Feb 2024
Morality is Non-Binary: Building a Pluralist Moral Sentence Embedding
  Space using Contrastive Learning
Morality is Non-Binary: Building a Pluralist Moral Sentence Embedding Space using Contrastive Learning
Jeongwoo Park
Enrico Liscio
P. Murukannaiah
AILaw
275
7
0
30 Jan 2024
LongHealth: A Question Answering Benchmark with Long Clinical Documents
LongHealth: A Question Answering Benchmark with Long Clinical Documents
Lisa Christine Adams
Felix Busch
T. Han
Jean-Baptiste Excoffier
Matthieu Ortala
Alexander Loser
Hugo J. W. L. Aerts
Jakob Nikolas Kather
Daniel Truhn
Keno Bressem
ELMLM&MAAI4MH
231
21
0
25 Jan 2024
Towards Socially and Morally Aware RL agent: Reward Design With LLM
Towards Socially and Morally Aware RL agent: Reward Design With LLM
Zhaoyue Wang
240
4
0
23 Jan 2024
Improving Large Language Models via Fine-grained Reinforcement Learning
  with Minimum Editing Constraint
Improving Large Language Models via Fine-grained Reinforcement Learning with Minimum Editing ConstraintAnnual Meeting of the Association for Computational Linguistics (ACL), 2024
Zhipeng Chen
Kun Zhou
Wayne Xin Zhao
Junchen Wan
Fuzheng Zhang
Chen Zhang
Ji-Rong Wen
KELM
342
41
0
11 Jan 2024
Risk Taxonomy, Mitigation, and Assessment Benchmarks of Large Language
  Model Systems
Risk Taxonomy, Mitigation, and Assessment Benchmarks of Large Language Model Systems
Tianyu Cui
Yanling Wang
Chuanpu Fu
Yong Xiao
Sijia Li
...
Junwu Xiong
Xinyu Kong
ZuJie Wen
Ke Xu
Qi Li
319
99
0
11 Jan 2024
Concept Alignment
Concept Alignment
Sunayana Rane
Polyphony J. Bruna
Ilia Sucholutsky
Christopher Kello
Thomas Griffiths
CVBM
171
16
0
09 Jan 2024
Previous
123...1056789
Next