Aligning AI With Shared Human Values

5 August 2020

Papers citing "Aligning AI With Shared Human Values"

50 / 463 papers shown

New Textual Corpora for Serbian Language Modeling

Mihailo Škorić

Nikola Janković

158

15 May 2024

SpeechGuard: Exploring the Adversarial Robustness of Multimodal Large Language Models

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

Raghuveer Peri

Sai Muralidhar Jayanthi

S. Ronanki

Anshu Bhatia

Karel Mundnich

...

Srikanth Vishnubhotla

283

14 May 2024

LMD3: Language Model Data Density Dependence

Garrett Honke

232

10 May 2024

Assessing and Verifying Task Utility in LLM-Powered Applications

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

Ahmed Hassan Awadallah

Charles L. A. Clarke

Julia Kiseleva

315

03 May 2024

Aloe: A Family of Fine-tuned Open Healthcare LLMs

Ashwin Kumar Gururajan

...

Lucia Urcelay-Ganzabal

Marta Gonzalez-Mallo

Sergio Alvarez-Napagao

Eduard Ayguadé-Parra

Ulises Cortés

Dario Garcia-Gasulla

ELM LM&MA

311

03 May 2024

More RLHF, More Trust? On The Impact of Human Preference Alignment On Language Model Trustworthiness

Aaron Jiaxun Li

Satyapriya Krishna

Himabindu Lakkaraju

197

29 Apr 2024

Ethical Reasoning and Moral Value Alignment of LLMs Depend on the Language we Prompt them in

227

29 Apr 2024

Continual Learning of Large Language Models: A Comprehensive Survey

393

151

25 Apr 2024

Cooperate or Collapse: Emergence of Sustainable Cooperation in a Society of LLM Agents

399

25 Apr 2024

Beyond Human Norms: Unveiling Unique Values of Large Language Models through Interdisciplinary Approaches

Pablo Biedma

Xiaoyuan Yi

Linus Huang

Maosong Sun

Xing Xie

PILM

350

19 Apr 2024

AdvisorQA: Towards Helpful and Harmless Advice-seeking Question Answering with Collective Intelligence

434

18 Apr 2024

Procedural Dilemma Generation for Evaluating Moral Reasoning in Humans and Language Models

298

17 Apr 2024

HLAT: High-quality Large Language Model Pre-trained on AWS Trainium

209

16 Apr 2024

Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models

186

15 Apr 2024

Online Safety Analysis for LLMs: a Benchmark, an Assessment, and a Path Forward

385

12 Apr 2024

Scalable Language Model with Generalized Continual Learning

179

11 Apr 2024

High-Dimension Human Value Representation in Large Language Models

North American Chapter of the Association for Computational Linguistics (NAACL), 2024

619

11 Apr 2024

Frontier AI Ethics: Anticipating and Evaluating the Societal Impacts of Generative Agents

Seth Lazar

SILM

185

10 Apr 2024

CulturalTeaming: AI-Assisted Interactive Red-Teaming for Challenging LLMs' (Lack of) Multicultural Knowledge

Yu Ying Chiu

Amirhossein Ajalloeian

Yejin Choi

210

10 Apr 2024

SafetyPrompts: a Systematic Review of Open Datasets for Evaluating and Improving Large Language Model Safety

Paul Röttger

364

08 Apr 2024

Language Models as Critical Thinking Tools: A Case Study of Philosophers

212

06 Apr 2024

Conifer: Improving Complex Constrained Instruction-Following Ability of Large Language Models

198

03 Apr 2024

NumeroLogic: Number Encoding for Enhanced LLMs' Numerical Reasoning

328

30 Mar 2024

Contextual Moral Value Alignment Through Context-Based Aggregation

146

19 Mar 2024

Enhancing Data Quality in Federated Fine-Tuning of Foundation Models

Siheng Chen

216

07 Mar 2024

The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning

...

751

300

05 Mar 2024

Birbal: An efficient 7B instruct-model fine-tuned with curated datasets

Ashvini Jindal

P. Rajpoot

Ankur P. Parikh

145

04 Mar 2024

Evaluating Quantized Large Language Models

Luning Wang

Shengen Yan

277

28 Feb 2024

Exploring Multilingual Concepts of Human Value in Large Language Models: Is Value Alignment Consistent, Transferable and Controllable across Languages?

259

28 Feb 2024

FairBelief -- Assessing Harmful Beliefs in Language Models

Mattia Setzu

Marta Marchiori Manerba

Pasquale Minervini

Debora Nozza

226

27 Feb 2024

Reasoning in Conversation: Solving Subjective Tasks through Dialogue Simulation for Large Language Models

Xiaolong Wang

Yile Wang

Yuan Zhang

Ziyue Wang

Peng Li

Maosong Sun

Yang Liu

LRM

150

27 Feb 2024

Language Agents as Optimizable Graphs

374

26 Feb 2024

Political Compass or Spinning Arrow? Towards More Meaningful Evaluations for Values and Opinions in Large Language Models

Paul Röttger

Hinrich Schütze

275

126

26 Feb 2024

Eagle: Ethical Dataset Given from Real Interactions

Masahiro Kaneko

Danushka Bollegala

Timothy Baldwin

191

22 Feb 2024

KorNAT: LLM Alignment Benchmark for Korean Social Values and Common Knowledge

Edward Choi

547

21 Feb 2024

Roadmap on Incentive Compatibility for AI Alignment and Governance in Sociotechnical Systems

422

20 Feb 2024

Enabling Weak LLMs to Judge Response Reliability via Meta Ranking

Zijun Liu

Boqun Kou

Peng Li

Ming Yan

Ji Zhang

Fei Huang

Yang Liu

264

19 Feb 2024

Uncovering Latent Human Wellbeing in Language Model Embeddings

Adam Gleave

209

19 Feb 2024

RENOVI: A Benchmark Towards Remediating Norm Violations in Socio-Cultural Conversations

...

250

17 Feb 2024

Towards better Human-Agent Alignment: Assessing Task Utility in LLM-Powered Applications

Ahmed Hassan Awadallah

262

14 Feb 2024

A Roadmap to Pluralistic Alignment

Niloofar Mireshghallah

...

Yejin Choi

391

150

07 Feb 2024

Do Moral Judgment and Reasoning Capability of LLMs Change with Language? A Study using the Multilingual Defining Issues Test

199

03 Feb 2024

TrustAgent: Towards Safe and Trustworthy LLM-based Agents through Agent Constitution

441

02 Feb 2024

Enhancing Ethical Explanations of Large Language Models through Iterative Symbolic Refinement

185

01 Feb 2024

Morality is Non-Binary: Building a Pluralist Moral Sentence Embedding Space using Contrastive Learning

275

30 Jan 2024

LongHealth: A Question Answering Benchmark with Long Clinical Documents

Lisa Christine Adams

Felix Busch

T. Han

Jean-Baptiste Excoffier

Matthieu Ortala

Alexander Loser

Hugo J. W. L. Aerts

Jakob Nikolas Kather

Daniel Truhn

Keno Bressem

ELM LM&MA AI4MH

231

25 Jan 2024

Towards Socially and Morally Aware RL agent: Reward Design With LLM

Zhaoyue Wang

240

23 Jan 2024

Improving Large Language Models via Fine-grained Reinforcement Learning with Minimum Editing Constraint

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

Fuzheng Zhang

342

11 Jan 2024

Risk Taxonomy, Mitigation, and Assessment Benchmarks of Large Language Model Systems

...

Qi Li

319

11 Jan 2024

171

09 Jan 2024