Aligning AI With Shared Human Values

5 August 2020

Jacob Steinhardt

arXiv: 2008.02275

Papers citing "Aligning AI With Shared Human Values"

50 / 463 papers shown

On the Relationship between Skill Neurons and Robustness in Prompt Tuning

International Conference on Language Resources and Evaluation (LREC), 2023

164

0

0

21 Sep 2023

An Evaluation of GPT-4 on the ETHICS Dataset

Sergey Rodionov

126

6

0

19 Sep 2023

EchoPrompt: Instructing the Model to Rephrase Queries for Improved In-context Learning

North American Chapter of the Association for Computational Linguistics (NAACL), 2023

Rajasekhar Reddy Mekala

Yasaman Razeghi

337

16

0

16 Sep 2023

Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics

180

19

0

13 Sep 2023

SafetyBench: Evaluating the Safety of Large Language Models

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

Xiao Liu

304

169

0

13 Sep 2023

Beyond Traditional Teaching: The Potential of Large Language Models and Chatbots in Graduate Engineering Education

Ibrahem Alshybani

312

21

0

09 Sep 2023

Gesture-Informed Robot Assistance via Foundation Models

Conference on Robot Learning (CoRL), 2023

Yuchen Cui

Dorsa Sadigh

154

27

0

06 Sep 2023

Value Kaleidoscope: Engaging AI with Pluralistic Human Values, Rights, and Duties

AAAI Conference on Artificial Intelligence (AAAI), 2023

Taylor Sorensen

Valentina Pyatkin

...

Chandra Bhagavatula

Yejin Choi

492

90

0

02 Sep 2023

Curating Naturally Adversarial Datasets for Learning-Enabled Medical Cyber-Physical Systems

196

0

0

01 Sep 2023

FederatedScope-LLM: A Comprehensive Package for Fine-tuning Large Language Models in Federated Learning

Knowledge Discovery and Data Mining (KDD), 2023

Jingren Zhou

322

200

0

01 Sep 2023

Is the U.S. Legal System Ready for AI's Challenges to Human Values?

Tadayoshi Kohno

275

3

0

30 Aug 2023

Rethinking Machine Ethics -- Can LLMs Perform Moral Reasoning through the Lens of Moral Theories?

Irwin King

278

38

0

29 Aug 2023

AI Deception: A Survey of Examples, Risks, and Potential Solutions

Simon Goldstein

309

241

0

28 Aug 2023

The Poison of Alignment

127

10

0

25 Aug 2023

From Instructions to Intrinsic Human Values -- A Survey of Alignment Goals for Big Models

Xing Xie

393

56

0

23 Aug 2023

Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment

Rishabh Bhardwaj

386

214

0

18 Aug 2023

MaScQA: A Question Answering Dataset for Investigating Materials Science Knowledge of Large Language Models

N. M. A. Krishnan

173

8

0

17 Aug 2023

FLIRT: Feedback Loop In-context Red Teaming

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

Ninareh Mehrabi

Christophe Dupuy

250

87

0

08 Aug 2023

ChatMOF: An Autonomous AI System for Predicting and Generating Metal-Organic Frameworks

209

16

0

01 Aug 2023

On the Trustworthiness Landscape of State-of-the-art Generative Models: A Survey and Outlook

International Journal of Computer Vision (IJCV), 2023

Chengyu Wang

Cen Chen

309

13

0

31 Jul 2023

Universal and Transferable Adversarial Attacks on Aligned Language Models

Nicholas Carlini

J. Zico Kolter

Matt Fredrikson

623

2,304

0

27 Jul 2023

Evaluating the Moral Beliefs Encoded in LLMs

Neural Information Processing Systems (NeurIPS), 2023

247

204

0

26 Jul 2023

Do Models Explain Themselves? Counterfactual Simulatability of Natural Language Explanations

International Conference on Machine Learning (ICML), 2023

Ruiqi Zhong

Narutatsu Ri

Jacob Steinhardt

Kathleen McKeown

225

74

0

17 Jul 2023

A Survey on Evaluation of Large Language Models

ACM Transactions on Intelligent Systems and Technology (ACM TIST), 2023

...

Yue Zhang

Philip S. Yu

700

2,769

0

06 Jul 2023

Minimum Levels of Interpretability for Artificial Moral Agents

AI and Ethics (AE), 2023

Avish Vijayaraghavan

163

6

0

02 Jul 2023

Towards Measuring the Representation of Subjective Global Opinions in Language Models

Esin Durmus

Nicholas Schiefer

...

Deep Ganguli

360

337

0

28 Jun 2023

Let's Do a Thought Experiment: Using Counterfactuals to Improve Moral Reasoning

169

17

0

25 Jun 2023

Apolitical Intelligence? Auditing Delphi's responses on controversial political issues in the US

136

0

0

22 Jun 2023

Towards Theory-based Moral AI: Moral AI with Aggregating Models Based on Normative Ethical Theory

Masashi Takeshita

187

11

0

20 Jun 2023

Toward Grounded Commonsense Reasoning

IEEE International Conference on Robotics and Automation (ICRA), 2023

Siddharth Karamcheti

Dorsa Sadigh

271

15

0

14 Jun 2023

The Chai Platform's AI Safety Framework

Aleksey Korshuk

198

2

0

05 Jun 2023

Knowledge of cultural moral norms in large language models

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

187

65

0

02 Jun 2023

Do Large Language Models Pay Similar Attention Like Human Programmers When Generating Code?

229

20

0

02 Jun 2023

A Systematic Study and Comprehensive Evaluation of ChatGPT on Benchmark Datasets

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

Md Tahmid Rahman Laskar

Md Amran Hossen Bhuiyan

500

215

0

29 May 2023

KoSBi: A Dataset for Mitigating Social Bias Risks Towards Safer Large Language Model Application

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

360

34

0

28 May 2023

SQuARe: A Large-Scale Dataset of Sensitive Questions and Acceptable Responses Created Through Human-Machine Collaboration

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

...

216

18

0

28 May 2023

What can Large Language Models do in chemistry? A comprehensive benchmark on eight tasks

Neural Information Processing Systems (NeurIPS), 2023

Xiangliang Zhang

518

210

0

27 May 2023

NormBank: A Knowledge Bank of Situational Social Norms

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

Jane Dwivedi-Yu

Diyi Yang

337

56

0

26 May 2023

Training Socially Aligned Language Models on Simulated Social Interactions

International Conference on Learning Representations (ICLR), 2023

Ruibo Liu

Diyi Yang

Soroush Vosoughi

285

88

0

26 May 2023

EXnet: Efficient In-context Learning for Data-less Text classification

Debaditya Shome

141

3

0

24 May 2023

Has It All Been Solved? Open NLP Research Questions Not Solved by Large Language Models

International Conference on Language Resources and Evaluation (LREC), 2023

Santiago Castro

...

Verónica Pérez-Rosas

Amélie Reymond

320

8

0

21 May 2023

"Oops, Did I Just Say That?" Testing and Repairing Unethical Suggestions of Large Language Models with Suggest-Critique-Reflect Process

Michael Kadantsev

198

15

0

04 May 2023

Can Large Language Models Be an Alternative to Human Evaluations?

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

Cheng-Han Chiang

585

851

0

03 May 2023

Connecting the Dots in Trustworthy Artificial Intelligence: From AI Principles, Ethics, and Key Requirements to Responsible AI Systems and Regulation

Information Fusion (Inf. Fusion), 2023

Natalia Díaz Rodríguez

Mark Coeckelbergh

Marcos López de Prado

E. Herrera-Viedma

Francisco Herrera

339

455

0

02 May 2023

Bridging the Gap: A Survey on Integrating (Human) Feedback for Natural Language Generation

Patrick Fernandes

António Farinhas

Pedro Henrique Martins

...

José G. C. de Souza

Tongshuang Wu

Graham Neubig

Marcely Zanon Boito

304

69

0

01 May 2023

Towards ethical multimodal systems

222

3

0

26 Apr 2023

SocialDial: A Benchmark for Socially-Aware Dialogue Systems

Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2023

...

Ingrid Zukerman

Zhaleh Semnani Azad

Gholamreza Haffari

224

23

0

24 Apr 2023

Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark

International Conference on Machine Learning (ICML), 2023

Thomas Woodside

533

166

0

06 Apr 2023

Large AI Models in Health Informatics: Applications, Challenges, and the Future

IEEE Journal of Biomedical and Health Informatics (IEEE JBHI), 2023

...

Wu Yuan

280

184

0

21 Mar 2023

Towards the Scalable Evaluation of Cooperativeness in Language Models

242

8

0

16 Mar 2023
