Aligning AI With Shared Human Values

5 August 2020

Papers citing "Aligning AI With Shared Human Values"

50 / 463 papers shown

Conversations: Love Them, Hate Them, Steer Them

Niranjan Chebrolu

Gerard Christopher Yeo

Kokil Jaidka

144

23 May 2025

Unveiling the Basin-Like Loss Landscape in Large Language Models

433

23 May 2025

The Staircase of Ethics: Probing LLM Value Priorities through Multi-Step Induction to Complex Moral Dilemmas

202

23 May 2025

MixAT: Combining Continuous and Discrete Adversarial Training for LLMs

303

22 May 2025

Guiding Giants: Lightweight Controllers for Weighted Activation Steering in LLMs

301

22 May 2025

ECHO-LLaMA: Efficient Caching for High-Performance LLaMA Training

292

22 May 2025

Cost-aware LLM-based Online Dataset Annotation

Eray Can Elumar

Cem Tekin

Osman Yagan

257

21 May 2025

Kaleidoscope Gallery: Exploring Ethics and Generative AI Through Art

Creativity & Cognition (C&C), 2025

219

20 May 2025

Language Models Are Capable of Metacognitive Monitoring and Control of Their Internal Activations

352

19 May 2025

Exploring Criteria of Loss Reweighting to Enhance LLM Unlearning

392

17 May 2025

NurValues: Real-World Nursing Values Evaluation for Large Language Models in Clinical Context

349

13 May 2025

Full-Parameter Continual Pretraining of Gemma2: Insights into Fluency and Domain Knowledge

218

09 May 2025

Advancing and Benchmarking Personalized Tool Invocation for LLMs

232

07 May 2025

FineScope: Precision Pruning for Domain-Specialized Large Language Models Using SAE-Guided Self-Data Cultivation

Chaitali Bhattacharyya

305

01 May 2025

Toward Generalizable Evaluation in the LLM Era: A Survey Beyond Benchmarks

...

548

26 Apr 2025

Auditing the Ethical Logic of Generative AI Models

277

24 Apr 2025

The Digital Cybersecurity Expert: How Far Have We Come?

IEEE Symposium on Security and Privacy (S&P), 2025

293

16 Apr 2025

CLASH: Evaluating Language Models on Judging High-Stakes Dilemmas from Multiple Perspectives

497

15 Apr 2025

HELIOS: Adaptive Model And Early-Exit Selection for Efficient LLM Inference Serving

460

14 Apr 2025

Visual moral inference and communication

Warren Zhu

Aida Ramezani

Yang Xu

150

12 Apr 2025

RAISE: Reinforced Adaptive Instruction Selection For Large Language Models

...

562

09 Apr 2025

Separator Injection Attack: Uncovering Dialogue Biases in Large Language Models Caused by Role Separators

231

08 Apr 2025

SpecPipe: Accelerating Pipeline Parallelism-based LLM Inference with Speculative Decoding

342

05 Apr 2025

Entropy-Based Block Pruning for Efficient Large Language Models

212

04 Apr 2025

Register Always Matters: Analysis of LLM Pretraining Data Through the Lens of Language Variation

292

02 Apr 2025

From TOWER to SPIRE: Adding the Speech Modality to a Translation-Specialist LLM

423

13 Mar 2025

Backtracking for Safety

Siddhartha Reddy Jonnalagadda

KELM

241

11 Mar 2025

Stay Focused: Problem Drift in Multi-Agent Debate

Jonas Becker

Lars Benedikt Kaesberg

469

26 Feb 2025

Speaking the Right Language: The Impact of Expertise Alignment in User-AI Interactions

Shramay Palta

Nirupama Chandrasekaran

Rachel Rudinger

Scott Counts

252

25 Feb 2025

Are Sparse Autoencoders Useful? A Case Study in Sparse Probing

Subhash Kantamneni

Joshua Engels

Senthooran Rajamanoharan

Max Tegmark

Neel Nanda

348

23 Feb 2025

Are Rules Meant to be Broken? Understanding Multilingual Moral Reasoning as a Computational Pipeline with UniMoral

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

Shivani Kumar

David Jurgens

LRM

299

21 Feb 2025

Rankify: A Comprehensive Python Toolkit for Retrieval, Re-Ranking, and Retrieval-Augmented Generation

938

21 Feb 2025

Self-Taught Agentic Long Context Understanding

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

338

21 Feb 2025

Obliviate: Efficient Unmemorization for Protecting Intellectual Property in Large Language Models

M. Russinovich

Ahmed Salem

CLL MU

372

20 Feb 2025

Mind the Gap! Choice Independence in Using Multilingual LLMs for Persuasive Co-Writing Tasks in Different Languages

International Conference on Human Factors in Computing Systems (CHI), 2025

Shreyan Biswas

Alexander Erlei

U. Gadiraju

402

13 Feb 2025

The Odyssey of the Fittest: Can Agents Survive and Still Be Good?

Dylan Waldner

Risto Miikkulainen

400

08 Feb 2025

Adaptive Distraction: Probing LLM Contextual Robustness with Automated Tree Search

263

03 Feb 2025

Normative Evaluation of Large Language Models with Everyday Moral Dilemmas

Conference on Fairness, Accountability and Transparency (FAccT), 2025

Pratik S. Sachdeva

Tom van Nuenen

ELM

191

30 Jan 2025

Inferring from Logits: Exploring Best Practices for Decoding-Free Generative Candidate Selection

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

210

28 Jan 2025

BLoB: Bayesian Low-Rank Adaptation by Backpropagation for Large Language Models

Neural Information Processing Systems (NeurIPS), 2024

714

28 Jan 2025

The Goofus & Gallant Story Corpus for Practical Value Alignment

International Conference on Machine Learning and Applications (ICMLA), 2024

230

17 Jan 2025

HyGen: Efficient LLM Serving via Elastic Online-Offline Request Co-location

Ting Sun

Penghan Wang

Fan Lai

1.3K

15 Jan 2025

AIOpsLab: A Holistic Framework to Evaluate AI Agents for Enabling Autonomous Clouds

312

12 Jan 2025

M³oralBench: A MultiModal Moral Benchmark for LVLMs

279

31 Dec 2024

LLM-jp: A Cross-organizational Project for the Research and Development of Fully Open Japanese LLMs

...

266

31 Dec 2024

SilVar: Speech Driven Multimodal Model for Reasoning Visual Question Answering and Object Localization

265

21 Dec 2024

ClarityEthic: Explainable Moral Judgment Utilizing Contrastive Ethical Insights from Large Language Models

403

17 Dec 2024

Text Is Not All You Need: Multimodal Prompting Helps LLMs Understand Humor

Ashwin Baluja

168

01 Dec 2024

TAROT: Targeted Data Selection via Optimal Transport

554

30 Nov 2024

Towards Robust Evaluation of Unlearning in LLMs via Data Transformations

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

266

23 Nov 2024