
Aligning AI With Shared Human Values

5 August 2020

Papers citing "Aligning AI With Shared Human Values"

50 / 463 papers shown

MERA: A Comprehensive LLM Evaluation in Russian

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

Alena Fenogenova

...

274

09 Jan 2024

Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models

Yuqing Wang

Yun Zhao

VLM ReLM LRM

313

29 Dec 2023

Assessing the Impact of Prompting Methods on ChatGPT's Mathematical Capabilities

...

341

22 Dec 2023

Learning Human-like Representations to Enable Learning Human Values

Andrea Wynn

Ilia Sucholutsky

Thomas Griffiths

271

21 Dec 2023

ALMANACS: A Simulatability Benchmark for Language Model Explainability

505

20 Dec 2023

Catwalk: A Unified Language Model Evaluation Framework for Many Datasets

Dirk Groeneveld

Anas Awadalla

Iz Beltagy

Akshita Bhagia

Ian H. Magnusson

Hao Peng

Oyvind Tafjord

Pete Walsh

Kyle Richardson

Jesse Dodge

265

15 Dec 2023

Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision

International Conference on Machine Learning (ICML), 2023

...

361

387

14 Dec 2023

CBQ: Cross-Block Quantization for Large Language Models

International Conference on Learning Representations (ICLR), 2023

...

777

13 Dec 2023

SM70: A Large Language Model for Medical Devices

12 Dec 2023

Cross Fertilizing Empathy from Brain to Machine as a Value Alignment Strategy

Devin Gonier

Adrian Adduci

Cassidy LoCascio

151

10 Dec 2023

MUFFIN: Curating Multi-Faceted Instructions for Improving Instruction-Following

International Conference on Learning Representations (ICLR), 2023

Yuxuan Sun

265

05 Dec 2023

Tree of Attacks: Jailbreaking Black-Box LLMs Automatically

Neural Information Processing Systems (NeurIPS), 2023

354

449

04 Dec 2023

Hashmarks: Privacy-Preserving Benchmarks for High-Stakes AI Evaluation

P. Bricman

182

01 Dec 2023

Foundational Moral Values for AI Alignment

Betty Hou

Brian Patrick Green

177

28 Nov 2023

A Survey of the Evolution of Language Model-Based Dialogue Systems: Data, Task and Models

456

28 Nov 2023

Interpretation modeling: Social grounding of sentences by reasoning over their implicit moral judgments

Liesbeth Allein

Maria Mihaela Trușcă

Marie-Francine Moens

211

27 Nov 2023

Case Repositories: Towards Case-Based Reasoning for AI Alignment

Amy X. Zhang

167

18 Nov 2023

MOKA: Moral Knowledge Augmentation for Moral Event Extraction

Xinliang Frederick Zhang

Winston Wu

Nick Beauchamp

Lu Wang

250

16 Nov 2023

LifeTox: Unveiling Implicit Toxicity in Life Advice

307

16 Nov 2023

How Trustworthy are Open-Source LLMs? An Assessment under Malicious Demonstrations Shows their Vulnerabilities

North American Chapter of the Association for Computational Linguistics (NAACL), 2023

Lingbo Mo

Boshi Wang

Muhao Chen

Huan Sun

267

15 Nov 2023

When does In-context Learning Fall Short and Why? A Study on Specification-Heavy Tasks

Hao Peng

Xiaozhi Wang

...

Bin Xu

Lei Hou

Juanzi Li

258

15 Nov 2023

Value FULCRA: Mapping Large Language Models to the Multidimensional Spectrum of Basic Human Values

North American Chapter of the Association for Computational Linguistics (NAACL), 2023

Xing Xie

299

15 Nov 2023

Generalization Analogies: A Testbed for Generalizing AI Oversight to Hard-To-Measure Domains

365

13 Nov 2023

MART: Improving LLM Safety with Multi-round Automatic Red-Teaming

North American Chapter of the Association for Computational Linguistics (NAACL), 2023

Madian Khabsa

212

150

13 Nov 2023

Online Advertisements with LLMs: Opportunities and Challenges

Soheil Feizi

Mohammadtaghi Hajiaghayi

Keivan Rezaei

Suho Shin

OffRL

409

11 Nov 2023

A Survey of Large Language Models in Medicine: Progress, Application, and Challenge

...

736

187

09 Nov 2023

Mini Minds: Exploring Bebeshka and Zlata Baby Models

164

06 Nov 2023

Can LLMs Follow Simple Rules?

363

06 Nov 2023

MoCa: Measuring Human-Language Model Alignment on Causal and Moral Judgment Tasks

Neural Information Processing Systems (NeurIPS), 2023

Tatsunori Hashimoto

273

30 Oct 2023

Moral Sparks in Social Media Narratives

ACM Conference on Hypertext & Social Media (HT), 2023

Ruijie Xi

Munindar P. Singh

LRM

252

30 Oct 2023

EtiCor: Corpus for Analyzing LLMs for Etiquettes

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

Ashutosh Dwivedi

Pradhyumna Lavania

Ashutosh Modi

181

29 Oct 2023

MindLLM: Pre-training Lightweight Large Language Model from Scratch, Evaluations and Domain Applications

187

24 Oct 2023

DeSIQ: Towards an Unbiased, Challenging Benchmark for Social Intelligence Understanding

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

Xiao-Yu Guo

Yuan-Fang Li

Gholamreza Haffari

236

24 Oct 2023

An In-Context Schema Understanding Method for Knowledge Base Question Answering

Knowledge Science, Engineering and Management (KSEM), 2023

199

22 Oct 2023

Values, Ethics, Morals? On the Use of Moral Concepts in NLP Research

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

Karina Vida

Judith Simon

Anne Lauscher

243

21 Oct 2023

Denevil: Towards Deciphering and Navigating the Ethical Values of Large Language Models via Instruction Learning

International Conference on Learning Representations (ICLR), 2023

Xing Xie

237

17 Oct 2023

Reading Books is Great, But Not if You Are Driving! Visually Grounded Reasoning about Defeasible Commonsense Norms

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

Yejin Choi

178

16 Oct 2023

Is Certifying $\ell_p$ Robustness Still Worthwhile?

250

13 Oct 2023

Impact of Guidance and Interaction Strategies for LLM Use on Learner Performance and Perception

220

13 Oct 2023

The Past, Present and Better Future of Feedback Learning in Large Language Models for Subjective Human Preferences and Values

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

Paul Röttger

361

11 Oct 2023

Case Law Grounding: Aligning Judgments of Humans and AI on Socially-Constructed Concepts

International Conference on Climate Informatics (ICCI), 2023

Quan Ze Chen

Amy X. Zhang

ELM

282

10 Oct 2023

Aligning Language Models with Human Preferences via a Bayesian Approach

Neural Information Processing Systems (NeurIPS), 2023

343

09 Oct 2023

STREAM: Social data and knowledge collective intelligence platform for TRaining Ethical AI Models

AI & Society, 2023

Yi Zeng

210

09 Oct 2023

LoFT: Local Proxy Fine-tuning For Improving Transferability Of Adversarial Attacks Against Large Language Model

Ankit Shah

...

Bhiksha Raj

132

02 Oct 2023

EALM: Introducing Multidimensional Ethical Alignment in Conversational Information Retrieval

Yiyao Yu

Junjie Wang

Yuxiang Zhang

Lin Zhang

Yujiu Yang

Tetsuya Sakai

162

02 Oct 2023

ValueDCG: Measuring Comprehensive Human Value Understanding Ability of Language Models

326

30 Sep 2023

Corex: Pushing the Boundaries of Complex Reasoning through Multi-Model Collaboration

Qiushi Sun

Zhangyue Yin

Xiang Li

Zhiyong Wu

Xipeng Qiu

Lingpeng Kong

LRM LLMAG

395

30 Sep 2023

The Confidence-Competence Gap in Large Language Models: A Cognitive Study

230

28 Sep 2023

Large Language Model Alignment: A Survey

359

282

26 Sep 2023

Probing the Moral Development of Large Language Models through Defining Issues Test

246

23 Sep 2023