Aligning AI With Shared Human Values

5 August 2020
Dan Hendrycks
Collin Burns
Steven Basart
Andrew Critch
J. Li
D. Song
Jacob Steinhardt

Papers citing "Aligning AI With Shared Human Values"

50 / 347 papers shown
MaScQA: A Question Answering Dataset for Investigating Materials Science Knowledge of Large Language Models
Mohd Zaki
J. Jayadeva
Mausam
N. M. A. Krishnan
ELM
6
4
0
17 Aug 2023
FLIRT: Feedback Loop In-context Red Teaming
Ninareh Mehrabi
Palash Goyal
Christophe Dupuy
Qian Hu
Shalini Ghosh
R. Zemel
Kai-Wei Chang
Aram Galstyan
Rahul Gupta
DiffM
21
55
0
08 Aug 2023
ChatMOF: An Autonomous AI System for Predicting and Generating Metal-Organic Frameworks
Y. Kang
Jihan Kim
AI4CE
LLMAG
30
12
0
01 Aug 2023
On the Trustworthiness Landscape of State-of-the-art Generative Models: A Survey and Outlook
Mingyuan Fan
Chengyu Wang
Cen Chen
Yang Liu
Jun Huang
HILM
31
3
0
31 Jul 2023
Universal and Transferable Adversarial Attacks on Aligned Language Models
Andy Zou
Zifan Wang
Nicholas Carlini
Milad Nasr
J. Zico Kolter
Matt Fredrikson
69
1,239
0
27 Jul 2023
Evaluating the Moral Beliefs Encoded in LLMs
Nino Scherrer
Claudia Shi
Amir Feder
David M. Blei
25
115
0
26 Jul 2023
Do Models Explain Themselves? Counterfactual Simulatability of Natural Language Explanations
Yanda Chen
Ruiqi Zhong
Narutatsu Ri
Chen Zhao
He He
Jacob Steinhardt
Zhou Yu
Kathleen McKeown
LRM
24
47
0
17 Jul 2023
A Survey on Evaluation of Large Language Models
Yu-Chu Chang
Xu Wang
Jindong Wang
Yuanyi Wu
Linyi Yang
...
Yue Zhang
Yi-Ju Chang
Philip S. Yu
Qian Yang
Xingxu Xie
ELM
LM&MA
ALM
58
1,506
0
06 Jul 2023
Minimum Levels of Interpretability for Artificial Moral Agents
Avish Vijayaraghavan
C. Badea
AI4CE
25
5
0
02 Jul 2023
Towards Measuring the Representation of Subjective Global Opinions in Language Models
Esin Durmus
Karina Nguyen
Thomas I. Liao
Nicholas Schiefer
Amanda Askell
...
Alex Tamkin
Janel Thamkul
Jared Kaplan
Jack Clark
Deep Ganguli
33
205
0
28 Jun 2023
Let's Do a Thought Experiment: Using Counterfactuals to Improve Moral Reasoning
Xiao Ma
Swaroop Mishra
Ahmad Beirami
Alex Beutel
Jilin Chen
ELM
ReLM
LRM
22
12
0
25 Jun 2023
Apolitical Intelligence? Auditing Delphi's responses on controversial political issues in the US
J. H. Rystrøm
11
0
0
22 Jun 2023
Towards Theory-based Moral AI: Moral AI with Aggregating Models Based on Normative Ethical Theory
Masashi Takeshita
Rafal Rzepka
K. Araki
13
8
0
20 Jun 2023
Toward Grounded Commonsense Reasoning
Minae Kwon
Hengyuan Hu
Vivek Myers
Siddharth Karamcheti
Anca Dragan
Dorsa Sadigh
LM&Ro
ReLM
LRM
36
9
0
14 Jun 2023
The Chai Platform's AI Safety Framework
Xiaoding Lu
Aleksey Korshuk
Z. Liu
W. Beauchamp
13
2
0
05 Jun 2023
Knowledge of cultural moral norms in large language models
Aida Ramezani
Yang Xu
ELM
AILaw
24
46
0
02 Jun 2023
Do Large Language Models Pay Similar Attention Like Human Programmers When Generating Code?
Bonan Kou
Shengmai Chen
Zhijie Wang
Lei Ma
Tianyi Zhang
ALM
11
13
0
02 Jun 2023
A Systematic Study and Comprehensive Evaluation of ChatGPT on Benchmark Datasets
Md Tahmid Rahman Laskar
M Saiful Bari
Mizanur Rahman
Md Amran Hossen Bhuiyan
Shafiq R. Joty
J. Huang
LM&MA
ELM
ALM
41
178
0
29 May 2023
KoSBi: A Dataset for Mitigating Social Bias Risks Towards Safer Large Language Model Application
Hwaran Lee
Seokhee Hong
Joonsuk Park
Takyoung Kim
Gunhee Kim
Jung-Woo Ha
30
28
0
28 May 2023
SQuARe: A Large-Scale Dataset of Sensitive Questions and Acceptable Responses Created Through Human-Machine Collaboration
Hwaran Lee
Seokhee Hong
Joonsuk Park
Takyoung Kim
M. Cha
...
Eun-Ju Lee
Yong Lim
Alice H. Oh
San-hee Park
Jung-Woo Ha
36
16
0
28 May 2023
What can Large Language Models do in chemistry? A comprehensive benchmark on eight tasks
Taicheng Guo
Kehan Guo
B. Nan
Zhengwen Liang
Zhichun Guo
Nitesh V. Chawla
Olaf Wiest
Xiangliang Zhang
ELM
44
126
0
27 May 2023
NormBank: A Knowledge Bank of Situational Social Norms
Caleb Ziems
Jane Dwivedi-Yu
Yi-Chia Wang
A. Halevy
Diyi Yang
18
41
0
26 May 2023
Training Socially Aligned Language Models on Simulated Social Interactions
Ruibo Liu
Ruixin Yang
Chenyan Jia
Ge Zhang
Denny Zhou
Andrew M. Dai
Diyi Yang
Soroush Vosoughi
ALM
18
45
0
26 May 2023
EXnet: Efficient In-context Learning for Data-less Text classification
Debaditya Shome
Kuldeep Yadav
12
1
0
24 May 2023
Has It All Been Solved? Open NLP Research Questions Not Solved by Large Language Models
Oana Ignat
Zhijing Jin
Artem Abzaliev
Laura Biester
Santiago Castro
...
Verónica Pérez-Rosas
Siqi Shen
Zekun Wang
Winston Wu
Rada Mihalcea
LRM
29
6
0
21 May 2023
"Oops, Did I Just Say That?" Testing and Repairing Unethical Suggestions
  of Large Language Models with Suggest-Critique-Reflect Process
Anna Glazkova
Zongjie Li
Michael Kadantsev
Maksim Glazkov
KELM
22
14
0
04 May 2023
Can Large Language Models Be an Alternative to Human Evaluations?
Cheng-Han Chiang
Hung-yi Lee
ALM
LM&MA
209
570
0
03 May 2023
Connecting the Dots in Trustworthy Artificial Intelligence: From AI Principles, Ethics, and Key Requirements to Responsible AI Systems and Regulation
Natalia Díaz Rodríguez
Javier Del Ser
Mark Coeckelbergh
Marcos López de Prado
E. Herrera-Viedma
Francisco Herrera
XAI
27
262
0
02 May 2023
Bridging the Gap: A Survey on Integrating (Human) Feedback for Natural Language Generation
Patrick Fernandes
Aman Madaan
Emmy Liu
António Farinhas
Pedro Henrique Martins
...
José G. C. de Souza
Shuyan Zhou
Tongshuang Wu
Graham Neubig
André F. T. Martins
ALM
113
56
0
01 May 2023
Towards ethical multimodal systems
Alexis Roger
Esma Aïmeur
Irina Rish
27
3
0
26 Apr 2023
SocialDial: A Benchmark for Socially-Aware Dialogue Systems
Haolan Zhan
Zhuang Li
Yufei Wang
Linhao Luo
Tao Feng
...
Lay-Ki Soon
Suraj Sharma
Ingrid Zukerman
Zhaleh Semnani Azad
Gholamreza Haffari
49
16
0
24 Apr 2023
Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark
Alexander Pan
Chan Jun Shern
Andy Zou
Nathaniel Li
Steven Basart
Thomas Woodside
Jonathan Ng
Hanlin Zhang
Scott Emmons
Dan Hendrycks
24
126
0
06 Apr 2023
Large AI Models in Health Informatics: Applications, Challenges, and the Future
Jianing Qiu
Lin Li
Jiankai Sun
Jiachuan Peng
Peilun Shi
...
Bo Xiao
Wu Yuan
Ningli Wang
Dong Xu
Benny P. L. Lo
AI4MH
LM&MA
40
127
0
21 Mar 2023
Towards the Scalable Evaluation of Cooperativeness in Language Models
Alan Chan
Maxime Riché
Jesse Clifton
LLMAG
12
6
0
16 Mar 2023
Personalisation within bounds: A risk taxonomy and policy framework for the alignment of large language models with personalised feedback
Hannah Rose Kirk
Bertie Vidgen
Paul Röttger
Scott A. Hale
33
99
0
09 Mar 2023
Towards Safer Generative Language Models: A Survey on Safety Risks, Evaluations, and Improvements
Jiawen Deng
Jiale Cheng
Hao-Lun Sun
Zhexin Zhang
Minlie Huang
LM&MA
ELM
26
15
0
18 Feb 2023
Commonsense Reasoning for Conversational AI: A Survey of the State of the Art
Christopher Richardson
Larry Heck
LRM
22
8
0
15 Feb 2023
Benchmarks for Automated Commonsense Reasoning: A Survey
E. Davis
ELM
LRM
19
57
0
09 Feb 2023
Everyone's Voice Matters: Quantifying Annotation Disagreement Using Demographic Information
Ruyuan Wan
Jaehyung Kim
Dongyeop Kang
9
36
0
12 Jan 2023
A Multi-Level Framework for the AI Alignment Problem
Betty Hou
Brian Patrick Green
14
6
0
10 Jan 2023
Second Thoughts are Best: Learning to Re-Align With Human Values from Text Edits
Ruibo Liu
Chenyan Jia
Ge Zhang
Ziyu Zhuang
Tony X. Liu
Soroush Vosoughi
90
34
0
01 Jan 2023
Inclusive Artificial Intelligence
Dilip Arumugam
Shi Dong
Benjamin Van Roy
33
1
0
24 Dec 2022
MoralDial: A Framework to Train and Evaluate Moral Dialogue Systems via Moral Discussions
Hao-Lun Sun
Zhexin Zhang
Fei Mi
Yasheng Wang
W. Liu
Jianwei Cui
Bin Wang
Qun Liu
Minlie Huang
29
19
0
21 Dec 2022
ClarifyDelphi: Reinforced Clarification Questions with Defeasibility Rewards for Social and Moral Situations
Valentina Pyatkin
Jena D. Hwang
Vivek Srikumar
Ximing Lu
Liwei Jiang
Yejin Choi
Chandra Bhagavatula
24
33
0
20 Dec 2022
Despite "super-human" performance, current LLMs are unsuited for decisions about ethics and safety
Joshua Albrecht
Ellie Kitanidis
Abraham J. Fetterman
ELM
ReLM
ALM
LRM
14
17
0
13 Dec 2022
Ensuring Visual Commonsense Morality for Text-to-Image Generation
Seong-Oak Park
Suhong Moon
Jinkyu Kim
6
2
0
07 Dec 2022
Speaking Multiple Languages Affects the Moral Bias of Language Models
Katharina Hämmerl
Bjorn Deiseroth
P. Schramowski
Jindrich Libovický
Constantin Rothkopf
Alexander M. Fraser
Kristian Kersting
21
31
0
14 Nov 2022
Zero-shot Visual Commonsense Immorality Prediction
Yujin Jeong
Seongbeom Park
Suhong Moon
Jinkyu Kim
VLM
11
1
0
10 Nov 2022
Just-DREAM-about-it: Figurative Language Understanding with DREAM-FLUTE
Yuling Gu
Yao Fu
Valentina Pyatkin
Ian H. Magnusson
Bhavana Dalvi
Peter Clark
70
7
0
28 Oct 2022
TAPE: Assessing Few-shot Russian Language Understanding
Ekaterina Taktasheva
Tatiana Shavrina
Alena Fenogenova
Denis Shevelev
Nadezhda Katricheva
...
Svetlana Iordanskaia
Alena Spiridonova
Valentina Kurenshchikova
Ekaterina Artemova
Vladislav Mikhailov
AAML
37
10
0
23 Oct 2022