ResearchTrend.AI
Aligning AI With Shared Human Values
Versions: v1, v2, v3, v4, v5, v6 (latest)


5 August 2020
Dan Hendrycks
Collin Burns
Steven Basart
Andrew Critch
Haibin Zhang
Basel Alomair
Jacob Steinhardt
arXiv: 2008.02275

Papers citing "Aligning AI With Shared Human Values"

13 / 463 papers shown
Can Machines Learn Morality? The Delphi Experiment
Liwei Jiang
Jena D. Hwang
Chandra Bhagavatula
Ronan Le Bras
Jenny T Liang
...
Yulia Tsvetkov
Oren Etzioni
Maarten Sap
Regina A. Rini
Yejin Choi
FaML
336
153
0
14 Oct 2021
Unsolved Problems in ML Safety
Dan Hendrycks
Nicholas Carlini
John Schulman
Jacob Steinhardt
748
345
0
28 Sep 2021
Towards Understanding and Mitigating Social Biases in Language Models
Paul Pu Liang
Chiyu Wu
Louis-Philippe Morency
Ruslan Salakhutdinov
247
474
0
24 Jun 2021
Conditional Contrastive Learning for Improving Fairness in Self-Supervised Learning
Martin Q. Ma
Yifan Hao
Paul Pu Liang
Han Zhao
Kun Zhang
Ruslan Salakhutdinov
Louis-Philippe Morency
SSL
265
19
0
05 Jun 2021
The R-U-A-Robot Dataset: Helping Avoid Chatbot Deception by Detecting User Questions About Human or Non-Human Identity
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
David Gros
Yu Li
Zhou Yu
DeLMO
134
23
0
04 Jun 2021
Measuring Coding Challenge Competence With APPS
Dan Hendrycks
Steven Basart
Saurav Kadavath
Mantas Mazeika
Akul Arora
...
Collin Burns
Samir Puranik
Horace He
Basel Alomair
Jacob Steinhardt
ELM, AIMat, ALM
1.2K
910
0
20 May 2021
Adapting Language Models for Zero-shot Learning by Meta-tuning on Dataset and Prompt Collections
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Ruiqi Zhong
Kristy Lee
Zheng Zhang
Dan Klein
471
181
0
10 Apr 2021
Large Pre-trained Language Models Contain Human-like Biases of What is Right and Wrong to Do
Nature Machine Intelligence (Nat. Mach. Intell.), 2021
P. Schramowski
Cigdem Turan
Nico Andersen
Constantin Rothkopf
Kristian Kersting
305
359
0
08 Mar 2021
Fairness for Unobserved Characteristics: Insights from Technological Impacts on Queer Communities
AAAI/ACM Conference on AI, Ethics, and Society (AIES), 2021
Nenad Tomašev
Kevin R. McKee
Jackie Kay
Shakir Mohamed
FaML
204
105
0
03 Feb 2021
Moral Stories: Situated Reasoning about Norms, Intents, Actions, and their Consequences
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
Denis Emelin
Ronan Le Bras
Jena D. Hwang
Maxwell Forbes
Yejin Choi
LRM
288
150
0
31 Dec 2020
Social Chemistry 101: Learning to Reason about Social and Moral Norms
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
Maxwell Forbes
Jena D. Hwang
Vered Shwartz
Maarten Sap
Yejin Choi
293
310
0
01 Nov 2020
Measuring Massive Multitask Language Understanding
International Conference on Learning Representations (ICLR), 2020
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
Basel Alomair
Jacob Steinhardt
ELM, RALM
2.3K
6,566
0
07 Sep 2020
Natural Adversarial Examples
Computer Vision and Pattern Recognition (CVPR), 2019
Dan Hendrycks
Kevin Zhao
Steven Basart
Jacob Steinhardt
Basel Alomair
OODD
930
1,746
0
16 Jul 2019