AI Alignment and Social Choice: Fundamental Limitations and Policy Implications
Abhilash Mishra
24 October 2023

Papers citing "AI Alignment and Social Choice: Fundamental Limitations and Policy Implications"

17 papers shown.

1. Synthetic media and computational capitalism: towards a critical theory of artificial intelligence
   David M. Berry · 22 Mar 2025 · Citations: 0

2. Balancing Innovation and Integrity: AI Integration in Liberal Arts College Administration
   Ian Olivo Read · 20 Feb 2025 · Citations: 0

3. Game Theory Meets Large Language Models: A Systematic Survey
   Haoran Sun, Yusen Wu, Yukun Cheng, Xu Chu (LM&MA, OffRL, AI4CE) · 13 Feb 2025 · Citations: 1

4. Style Outweighs Substance: Failure Modes of LLM Judges in Alignment Benchmarking
   Benjamin Feuer, Micah Goldblum, Teresa Datta, Sanjana Nambiar, Raz Besaleli, Samuel Dooley, Max Cembalest, John P. Dickerson (ALM) · 28 Jan 2025 · Citations: 0

5. Can an AI Agent Safely Run a Government? Existence of Probably Approximately Aligned Policies
   Frédéric Berdoz, Roger Wattenhofer · 21 Nov 2024 · Citations: 0

6. Representative Social Choice: From Learning Theory to AI Alignment
   Tianyi Qiu (FedML) · 31 Oct 2024 · Citations: 2

7. Adaptive Alignment: Dynamic Preference Adjustments via Multi-Objective Reinforcement Learning for Pluralistic AI
   Hadassah Harland, Richard Dazeley, Peter Vamplew, Hashini Senaratne, Bahareh Nakisa, Francisco Cruz · 31 Oct 2024 · Citations: 2

8. Beyond Preferences in AI Alignment
   Tan Zhi-Xuan, Micah Carroll, Matija Franklin, Hal Ashton · 30 Aug 2024 · Citations: 16

9. Improving Context-Aware Preference Modeling for Language Models
   Silviu Pitis, Ziang Xiao, Nicolas Le Roux, Alessandro Sordoni · 20 Jul 2024 · Citations: 8

10. Axioms for AI Alignment from Human Feedback
    Luise Ge, Daniel Halpern, Evi Micha, Ariel D. Procaccia, Itai Shapira, Yevgeniy Vorobeychik, Junlin Wu · 23 May 2024 · Citations: 15

11. Mapping Social Choice Theory to RLHF
    Jessica Dai, Eve Fleisig · 19 Apr 2024 · Citations: 11

12. Social Choice Should Guide AI Alignment in Dealing with Diverse Human Feedback
    Vincent Conitzer, Rachel Freedman, J. Heitzig, Wesley H. Holliday, Bob M. Jacobs, ..., Eric Pacuit, Stuart Russell, Hailey Schoelkopf, Emanuel Tewolde, W. Zwicker · 16 Apr 2024 · Citations: 28

13. A Roadmap to Pluralistic Alignment
    Taylor Sorensen, Jared Moore, Jillian R. Fisher, Mitchell L. Gordon, Niloofar Mireshghallah, ..., Liwei Jiang, Ximing Lu, Nouha Dziri, Tim Althoff, Yejin Choi · 07 Feb 2024 · Citations: 80

14. Distributional Preference Learning: Understanding and Accounting for Hidden Context in RLHF
    Anand Siththaranjan, Cassidy Laidlaw, Dylan Hadfield-Menell · 13 Dec 2023 · Citations: 55

15. Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
    Stephen Casper, Xander Davies, Claudia Shi, T. Gilbert, Jérémy Scheurer, ..., Erdem Biyik, Anca Dragan, David M. Krueger, Dorsa Sadigh, Dylan Hadfield-Menell (ALM, OffRL) · 27 Jul 2023 · Citations: 472

16. Training language models to follow instructions with human feedback
    Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe (OSLM, ALM) · 04 Mar 2022 · Citations: 11,915

17. Fine-Tuning Language Models from Human Preferences
    Daniel M. Ziegler, Nisan Stiennon, Jeff Wu, Tom B. Brown, Alec Radford, Dario Amodei, Paul Christiano, G. Irving (ALM) · 18 Sep 2019 · Citations: 1,587