ResearchTrend.AI

Aligning AI With Shared Human Values

5 August 2020
Dan Hendrycks
Collin Burns
Steven Basart
Andrew Critch
J. Li
D. Song
Jacob Steinhardt
arXiv: 2008.02275

Papers citing "Aligning AI With Shared Human Values"

50 / 347 papers shown
Scalable Language Model with Generalized Continual Learning
Bohao Peng
Zhuotao Tian
Shu Liu
Mingchang Yang
Jiaya Jia
ALM
CLL
KELM
24
13
0
11 Apr 2024
High-Dimension Human Value Representation in Large Language Models
Samuel Cahyawijaya
Delong Chen
Yejin Bang
Leila Khalatbari
Bryan Wilie
Ziwei Ji
Etsuko Ishii
Pascale Fung
63
5
0
11 Apr 2024
Frontier AI Ethics: Anticipating and Evaluating the Societal Impacts of Generative Agents
Seth Lazar
SILM
29
1
0
10 Apr 2024
CulturalTeaming: AI-Assisted Interactive Red-Teaming for Challenging LLMs' (Lack of) Multicultural Knowledge
Yu Ying Chiu
Amirhossein Ajalloeian
Maria Antoniak
Chan Young Park
Shuyue Stella Li
Mehar Bhatia
Sahithya Ravi
Yulia Tsvetkov
Vered Shwartz
Yejin Choi
36
20
0
10 Apr 2024
SafetyPrompts: a Systematic Review of Open Datasets for Evaluating and Improving Large Language Model Safety
Paul Röttger
Fabio Pernisi
Bertie Vidgen
Dirk Hovy
ELM
KELM
58
30
0
08 Apr 2024
Language Models as Critical Thinking Tools: A Case Study of Philosophers
Andre Ye
Jared Moore
Rose Novick
Amy X. Zhang
KELM
ELM
LRM
LLMAG
23
7
0
06 Apr 2024
Conifer: Improving Complex Constrained Instruction-Following Ability of Large Language Models
Haoran Sun
Lixin Liu
Junjie Li
Fengyu Wang
Baohua Dong
Ran Lin
Ruohui Huang
25
14
0
03 Apr 2024
NumeroLogic: Number Encoding for Enhanced LLMs' Numerical Reasoning
Eli Schwartz
Leshem Choshen
J. Shtok
Sivan Doveh
Leonid Karlinsky
Assaf Arbelle
28
13
0
30 Mar 2024
Contextual Moral Value Alignment Through Context-Based Aggregation
Pierre L. Dognin
Jesus Rios
Ronny Luss
Inkit Padhi
Matthew D Riemer
Miao Liu
P. Sattigeri
Manish Nagireddy
Kush R. Varshney
Djallel Bouneffouf
39
5
0
19 Mar 2024
Enhancing Data Quality in Federated Fine-Tuning of Foundation Models
Wanru Zhao
Yaxin Du
Nicholas D. Lane
Siheng Chen
Yanfeng Wang
30
3
0
07 Mar 2024
The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning
Nathaniel Li
Alexander Pan
Anjali Gopal
Summer Yue
Daniel Berrios
...
Yan Shoshitaishvili
Jimmy Ba
K. Esvelt
Alexandr Wang
Dan Hendrycks
ELM
43
140
0
05 Mar 2024
Birbal: An efficient 7B instruct-model fine-tuned with curated datasets
Ashvini Jindal
P. Rajpoot
Ankur P. Parikh
30
6
0
04 Mar 2024
Evaluating Quantized Large Language Models
Shiyao Li
Xuefei Ning
Luning Wang
Tengxuan Liu
Xiangsheng Shi
Shengen Yan
Guohao Dai
Huazhong Yang
Yu-Xiang Wang
MQ
43
42
0
28 Feb 2024
Exploring Multilingual Concepts of Human Value in Large Language Models: Is Value Alignment Consistent, Transferable and Controllable across Languages?
Shaoyang Xu
Weilong Dong
Zishan Guo
Xinwei Wu
Deyi Xiong
33
6
0
28 Feb 2024
FairBelief -- Assessing Harmful Beliefs in Language Models
Mattia Setzu
Marta Marchiori Manerba
Pasquale Minervini
Debora Nozza
21
0
0
27 Feb 2024
Reasoning in Conversation: Solving Subjective Tasks through Dialogue Simulation for Large Language Models
Xiaolong Wang
Yile Wang
Yuan Zhang
Fuwen Luo
Peng Li
Maosong Sun
Yang Janet Liu
LRM
27
0
0
27 Feb 2024
Language Agents as Optimizable Graphs
Mingchen Zhuge
Wenyi Wang
Louis Kirsch
Francesco Faccio
Dmitrii Khizbullin
Jürgen Schmidhuber
LLMAG
29
19
0
26 Feb 2024
Political Compass or Spinning Arrow? Towards More Meaningful Evaluations for Values and Opinions in Large Language Models
Paul Röttger
Valentin Hofmann
Valentina Pyatkin
Musashi Hinck
Hannah Rose Kirk
Hinrich Schütze
Dirk Hovy
ELM
21
53
0
26 Feb 2024
Eagle: Ethical Dataset Given from Real Interactions
Masahiro Kaneko
Danushka Bollegala
Timothy Baldwin
38
3
0
22 Feb 2024
KorNAT: LLM Alignment Benchmark for Korean Social Values and Common Knowledge
Jiyoung Lee
Minwoo Kim
Seungho Kim
Junghwan Kim
Seunghyun Won
Hwaran Lee
Edward Choi
ALM
29
11
0
21 Feb 2024
Incentive Compatibility for AI Alignment in Sociotechnical Systems: Positions and Prospects
Zhaowei Zhang
Fengshuo Bai
Mingzhi Wang
Haoyang Ye
Chengdong Ma
Yaodong Yang
27
4
0
20 Feb 2024
Enabling Weak LLMs to Judge Response Reliability via Meta Ranking
Zijun Liu
Boqun Kou
Peng Li
Ming Yan
Ji Zhang
Fei Huang
Yang Janet Liu
24
2
0
19 Feb 2024
Uncovering Latent Human Wellbeing in Language Model Embeddings
Pedro Freire
ChengCheng Tan
Adam Gleave
Dan Hendrycks
Scott Emmons
30
1
0
19 Feb 2024
RENOVI: A Benchmark Towards Remediating Norm Violations in Socio-Cultural Conversations
Haolan Zhan
Zhuang Li
Xiaoxi Kang
Tao Feng
Yuncheng Hua
...
Linhao Luo
Lay-Ki Soon
Zhaleh Semnani Azad
Ingrid Zukerman
Gholamreza Haffari
45
8
0
17 Feb 2024
Towards better Human-Agent Alignment: Assessing Task Utility in LLM-Powered Applications
Negar Arabzadeh
Julia Kiseleva
Qingyun Wu
Chi Wang
Ahmed Hassan Awadallah
Victor C. Dibia
Adam Fourney
Charles L. A. Clarke
LLMAG
29
7
0
14 Feb 2024
A Roadmap to Pluralistic Alignment
Taylor Sorensen
Jared Moore
Jillian R. Fisher
Mitchell L. Gordon
Niloofar Mireshghallah
...
Liwei Jiang
Ximing Lu
Nouha Dziri
Tim Althoff
Yejin Choi
65
80
0
07 Feb 2024
Do Moral Judgment and Reasoning Capability of LLMs Change with Language? A Study using the Multilingual Defining Issues Test
Aditi Khandelwal
Utkarsh Agarwal
Kumar Tanmay
Monojit Choudhury
ELM
LRM
22
6
0
03 Feb 2024
TrustAgent: Towards Safe and Trustworthy LLM-based Agents through Agent Constitution
Wenyue Hua
Xianjun Yang
Zelong Li
Cheng Wei
Yongfeng Zhang
LLMAG
29
4
0
02 Feb 2024
Enhancing Ethical Explanations of Large Language Models through Iterative Symbolic Refinement
Xin Quan
Marco Valentino
Louise A. Dennis
André Freitas
LRM
15
11
0
01 Feb 2024
Morality is Non-Binary: Building a Pluralist Moral Sentence Embedding Space using Contrastive Learning
Jeongwoo Park
Enrico Liscio
P. Murukannaiah
AILaw
20
4
0
30 Jan 2024
LongHealth: A Question Answering Benchmark with Long Clinical Documents
Lisa Christine Adams
Felix Busch
T. Han
Jean-Baptiste Excoffier
Matthieu Ortala
Alexander Loser
Hugo J. W. L. Aerts
Jakob Nikolas Kather
Daniel Truhn
Keno Bressem
ELM
LM&MA
AI4MH
39
9
0
25 Jan 2024
Towards Socially and Morally Aware RL agent: Reward Design With LLM
Zhaoyue Wang
12
2
0
23 Jan 2024
Improving Large Language Models via Fine-grained Reinforcement Learning with Minimum Editing Constraint
Zhipeng Chen
Kun Zhou
Wayne Xin Zhao
Junchen Wan
Fuzheng Zhang
Di Zhang
Ji-Rong Wen
KELM
31
32
0
11 Jan 2024
Risk Taxonomy, Mitigation, and Assessment Benchmarks of Large Language Model Systems
Tianyu Cui
Yanling Wang
Chuanpu Fu
Yong Xiao
Sijia Li
...
Junwu Xiong
Xinyu Kong
Zujie Wen
Ke Xu
Qi Li
55
56
0
11 Jan 2024
Concept Alignment
Sunayana Rane
Polyphony J. Bruna
Ilia Sucholutsky
Christopher Kello
Thomas L. Griffiths
CVBM
31
7
0
09 Jan 2024
MERA: A Comprehensive LLM Evaluation in Russian
Alena Fenogenova
Artem Chervyakov
Nikita Martynov
Anastasia Kozlova
Maria Tikhonova
...
Nikita Savushkin
Polina Mikhailova
Denis Dimitrov
Alexander Panchenko
Sergey Markov
ELM
28
10
0
09 Jan 2024
Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models
Yuqing Wang
Yun Zhao
VLM
ReLM
LRM
24
22
0
29 Dec 2023
Assessing the Impact of Prompting Methods on ChatGPT's Mathematical Capabilities
Yuhao Chen
Chloe Wong
Hanwen Yang
Juan Aguenza
Sai Bhujangari
...
Eric Phuong
Minghao Liu
Raja Kumar
Vanshika Vats
James Davis
32
1
0
22 Dec 2023
Learning Human-like Representations to Enable Learning Human Values
Andrea Wynn
Ilia Sucholutsky
Thomas L. Griffiths
16
4
0
21 Dec 2023
ALMANACS: A Simulatability Benchmark for Language Model Explainability
Edmund Mills
Shiye Su
Stuart J. Russell
Scott Emmons
46
7
0
20 Dec 2023
Catwalk: A Unified Language Model Evaluation Framework for Many Datasets
Dirk Groeneveld
Anas Awadalla
Iz Beltagy
Akshita Bhagia
Ian H. Magnusson
Hao Peng
Oyvind Tafjord
Pete Walsh
Kyle Richardson
Jesse Dodge
111
1
0
15 Dec 2023
Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision
Collin Burns
Pavel Izmailov
Jan Hendrik Kirchner
Bowen Baker
Leo Gao
...
Adrien Ecoffet
Manas Joglekar
Jan Leike
Ilya Sutskever
Jeff Wu
ELM
39
258
0
14 Dec 2023
CBQ: Cross-Block Quantization for Large Language Models
Xin Ding
Xiaoyu Liu
Zhijun Tu
Yun-feng Zhang
Wei Li
...
Hanting Chen
Yehui Tang
Zhiwei Xiong
Baoqun Yin
Yunhe Wang
MQ
27
12
0
13 Dec 2023
SM70: A Large Language Model for Medical Devices
Anubhav Bhatti
Surajsinh Parmar
San Lee
LM&MA
AI4MH
23
2
0
12 Dec 2023
Cross Fertilizing Empathy from Brain to Machine as a Value Alignment Strategy
Devin Gonier
Adrian Adduci
Cassidy LoCascio
16
0
0
10 Dec 2023
MUFFIN: Curating Multi-Faceted Instructions for Improving Instruction-Following
Renze Lou
Kai Zhang
Jian Xie
Yuxuan Sun
Janice Ahn
Hanzi Xu
Yu Su
Wenpeng Yin
29
26
0
05 Dec 2023
Tree of Attacks: Jailbreaking Black-Box LLMs Automatically
Anay Mehrotra
Manolis Zampetakis
Paul Kassianik
Blaine Nelson
Hyrum Anderson
Yaron Singer
Amin Karbasi
30
201
0
04 Dec 2023
Hashmarks: Privacy-Preserving Benchmarks for High-Stakes AI Evaluation
P. Bricman
19
0
0
01 Dec 2023
Foundational Moral Values for AI Alignment
Betty Hou
Brian Patrick Green
19
0
0
28 Nov 2023
A Survey of the Evolution of Language Model-Based Dialogue Systems
Hongru Wang
Lingzhi Wang
Yiming Du
Liang Chen
Jing Zhou
Yufei Wang
Kam-Fai Wong
LRM
53
20
0
28 Nov 2023