Aligning AI With Shared Human Values

5 August 2020

Papers citing "Aligning AI With Shared Human Values"

50 / 463 papers shown

CFBench: A Comprehensive Constraints-Following Benchmark for LLMs

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

...

445

02 Aug 2024

Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress?

...

272

31 Jul 2024

Legal Minds, Algorithmic Decisions: How LLMs Apply Constitutional Principles in Complex Scenarios

AAAI/ACM Conference on AI, Ethics, and Society (AIES), 2024

Camilla Bignotti

C. Camassa

AILaw ELM

257

29 Jul 2024

Blockchain for Large Language Model Security and Safety: A Holistic Survey

270

26 Jul 2024

The Dark Side of Function Calling: Pathways to Jailbreaking Large Language Models

375

25 Jul 2024

Course-Correction: Safety Alignment Using Synthetic Preferences

Haiqin Weng

Yan Liu

Tianwei Zhang

Wei Xu

Han Qiu

206

23 Jul 2024

Virtue Ethics For Ethically Tunable Robotic Assistants

Rajitha Ramanayake

Vivek Nallur

23 Jul 2024

ALLaM: Large Language Models for Arabic and English

...

233

22 Jul 2024

Internal Consistency and Self-Feedback in Large Language Models: A Survey

...

506

19 Jul 2024

BadRobot: Jailbreaking Embodied LLMs in the Physical World

...

Aishan Liu

Peijin Guo

Leo Yu Zhang

LM&Ro

464

16 Jul 2024

CLAVE: An Adaptive Framework for Evaluating Values of LLM Generated Responses

Jing Yao

Xiaoyuan Yi

Xing Xie

ELM ALM

294

15 Jul 2024

Hey, That's My Model! Introducing Chain & Hash, An LLM Fingerprinting Technique

M. Russinovich

Ahmed Salem

437

15 Jul 2024

The Sociolinguistic Foundations of Language Modeling

310

12 Jul 2024

LLaMAX: Scaling Linguistic Horizons of LLM by Enhancing Translation Capabilities Beyond 100 Languages

291

08 Jul 2024

Some Issues in Predictive Ethics Modeling: An Annotated Contrast Set of "Moral Stories"

Ben Fitzgerald

168

07 Jul 2024

AI Safety in Generative AI Large Language Models: A Survey

Lina Yao

387

06 Jul 2024

A Systematic Survey and Critical Review on Evaluating Large Language Models: Challenges, Limitations, and Recommendations

Md Tahmid Rahman Laskar

Sawsan Alqahtani

M Saiful Bari

Mizanur Rahman

Mohammad Abdullah Matin Khan

...

Enamul Hoque

Jimmy Huang

283

04 Jul 2024

Multilingual Trolley Problems for Language Models

Zhijing Jin

...

356

02 Jul 2024

Is Your Large Language Model Knowledgeable or a Choices-Only Cheater?

Nishant Balepur

Rachel Rudinger

196

02 Jul 2024

ProgressGym: Alignment with a Millennium of Moral Progress

Yaodong Yang

286

28 Jun 2024

ScaleBiO: Scalable Bilevel Optimization for LLM Data Reweighting

432

28 Jun 2024

Improving Weak-to-Strong Generalization with Reliability-Aware Alignment

Yue Guo

Yi Yang

226

27 Jun 2024

The Multilingual Alignment Prism: Aligning Global and Local Preferences to Reduce Harm

Aakanksha

Arash Ahmadian

Beyza Ermis

Seraphina Goldfarb-Tarrant

Julia Kreutzer

Marzieh Fadaee

Sara Hooker

372

26 Jun 2024

DARG: Dynamic Evaluation of Large Language Models via Adaptive Reasoning Graph

229

25 Jun 2024

Does Cross-Cultural Alignment Change the Commonsense Morality of Language Models?

Yuu Jinnai

342

24 Jun 2024

Model Merging and Safety Alignment: One Bad Model Spoils the Bunch

Mete Ozay

280

20 Jun 2024

LiveMind: Low-latency Large Language Models with Simultaneous Inference

328

20 Jun 2024

Raising the Bar: Investigating the Values of Large Language Models via Generative Evolving Testing

652

20 Jun 2024

Cultural Conditioning or Placebo? On the Effectiveness of Socio-Demographic Prompting

Sagnik Mukherjee

Muhammad Farid Adilazuarda

280

17 Jun 2024

The Potential and Challenges of Evaluating Attitudes, Opinions, and Values in Large Language Models

Barbara Plank

301

16 Jun 2024

RUPBench: Benchmarking Reasoning Under Perturbations for Robustness Evaluation in Large Language Models

Yuqing Wang

Yun Zhao

LRM AAML ELM

258

16 Jun 2024

Toward Optimal LLM Alignments Using Two-Player Games

Xiaoying Zhang

...

Qi Zhang

Xuanjing Huang

Hang Li

Yang Liu

278

16 Jun 2024

Ollabench: Evaluating LLMs' Reasoning for Human-centric Interdependent Cybersecurity

Tam N. Nguyen

ELM

209

11 Jun 2024

Language Models are Alignable Decision-Makers: Dataset and Application to the Medical Triage Domain

299

10 Jun 2024

Scaling and evaluating sparse autoencoders

279

307

06 Jun 2024

MoralBench: Moral Evaluation of LLMs

351

06 Jun 2024

Exploring Human-AI Perception Alignment in Sensory Experiences: Do LLMs Understand Textile Hand?

188

05 Jun 2024

Are Large Language Models Chameleons?

Mingmeng Geng

Sihong He

Roberto Trotta

202

29 May 2024

FinerCut: Finer-grained Interpretable Layer Pruning for Large Language Models

Qianli Shen

252

28 May 2024

BWArea Model: Learning World Model, Inverse Dynamics, and Policy for Controllable Language Generation

Ziniu Li

254

27 May 2024

On Bits and Bandits: Quantifying the Regret-Information Trade-off

530

26 May 2024

SPP: Sparsity-Preserved Parameter-Efficient Fine-Tuning for Large Language Models

231

25 May 2024

Instruction Tuning With Loss Over Instructions

Neural Information Processing Systems (NeurIPS), 2024

288

23 May 2024

ALI-Agent: Assessing LLMs' Alignment with Human Values via Agent-based Evaluation

Neural Information Processing Systems (NeurIPS), 2024

359

23 May 2024

CIVICS: Building a Dataset for Examining Culturally-Informed Values in Large Language Models

316

22 May 2024

Metabook: An Automatically Generated Augmented Reality Storybook Interaction System to Improve Children's Engagement in Storytelling

Yibo Wang

Yuanyuan Mao

Shi-ting Ni

180

22 May 2024

FlashRAG: A Modular Toolkit for Efficient Retrieval-Augmented Generation Research

Jiajie Jin

Chenghao Zhang

Tong Zhao

Zhao Yang

Zhicheng Dou

Ji-Rong Wen

VLM

449

147

22 May 2024

Skin-in-the-Game: Decision Making via Multi-Stakeholder Alignment in LLMs

Bilgehan Sel

Priya Shanmugasundaram

Ming Jin

283

21 May 2024

LLM-based Multi-Agent Reinforcement Learning: Current and Future Directions

345

17 May 2024

Facilitating Opinion Diversity through Hybrid NLP Approaches

North American Chapter of the Association for Computational Linguistics (NAACL), 2024

Michiel van der Meer

318

15 May 2024