Fine-tuning language models to find agreement among humans with diverse preferences

28 November 2022 (arXiv:2211.15006)

Michiel A. Bakker, Martin Chadwick, Hannah R. Sheahan, Michael Henry Tessler, Lucy Campbell-Gillingham, Jan Balaguer, Nat McAleese, Amelia Glaese, John Aslanides, M. Botvinick, Christopher Summerfield

ALM

Papers citing "Fine-tuning language models to find agreement among humans with diverse preferences"

50 / 123 papers shown
LoRe: Personalizing LLMs via Low-Rank Reward Modeling
Avinandan Bose, Zhihan Xiong, Yuejie Chi, Simon S. Du, Lin Xiao, Maryam Fazel
20 Apr 2025

Robust Reinforcement Learning from Human Feedback for Large Language Models Fine-Tuning
Kai Ye, Hongyi Zhou, Jin Zhu, Francesco Quinzan, C. Shi
03 Apr 2025

Pay More Attention to the Robustness of Prompt for Instruction Data Mining
Qiang Wang, Dawei Feng, Xu Zhang, Ao Shen, Yang Xu, Bo Ding, H. Wang
AAML
31 Mar 2025

Strategyproof Reinforcement Learning from Human Feedback
Thomas Kleine Buening, Jiarui Gan, Debmalya Mandal, Marta Z. Kwiatkowska
13 Mar 2025

Artificial Intelligence in Deliberation: The AI Penalty and the Emergence of a New Deliberative Divide
Andreas Jungherr, Adrian Rauchfleisch
10 Mar 2025

MPO: An Efficient Post-Processing Framework for Mixing Diverse Preference Alignment
Tianze Wang, Dongnan Gui, Yifan Hu, Shuhang Lin, Linjun Zhang
25 Feb 2025

The Battling Influencers Game: Nash Equilibria Structure of a Potential Game and Implications to Value Alignment
Young Wu, Yancheng Zhu, Jin-Yi Cai, Xiaojin Zhu
03 Feb 2025

Balancing Act: Prioritization Strategies for LLM-Designed Restless Bandit Rewards
Shresth Verma, Niclas Boehmer, Lingkai Kong, Milind Tambe
17 Jan 2025

ChatBug: A Common Vulnerability of Aligned LLMs Induced by Chat Templates
Fengqing Jiang, Zhangchen Xu, Luyao Niu, Bill Yuchen Lin, Radha Poovendran
SILM
08 Jan 2025

Fool Me, Fool Me: User Attitudes Toward LLM Falsehoods
Diana Bar-Or Nirman, Ariel Weizman, Amos Azaria
HILM
16 Dec 2024

A dataset of questions on decision-theoretic reasoning in Newcomb-like problems
Caspar Oesterheld, Emery Cooper, Miles Kodama, Linh Chi Nguyen, Ethan Perez
15 Nov 2024

Beyond the Safety Bundle: Auditing the Helpful and Harmless Dataset
Khaoula Chehbouni, Jonathan Colaço-Carr, Yash More, Jackie CK Cheung, G. Farnadi
12 Nov 2024

PMoL: Parameter Efficient MoE for Preference Mixing of LLM Alignment
Dongxu Liu, Bing Xu, Yinzhuo Chen, Bufan Xu, Wenpeng Lu, Muyun Yang, T. Zhao
MoE
02 Nov 2024

L3Ms -- Lagrange Large Language Models
Guneet S. Dhillon, Xingjian Shi, Yee Whye Teh, Alex Smola
28 Oct 2024

Fast Best-of-N Decoding via Speculative Rejection
Hanshi Sun, Momin Haider, Ruiqi Zhang, Huitao Yang, Jiahao Qiu, Ming Yin, Mengdi Wang, Peter L. Bartlett, Andrea Zanette
BDL
26 Oct 2024

From Efficiency to Equity: Measuring Fairness in Preference Learning
Shreeyash Gowaikar, Hugo Berard, Rashid Mushkani, Shin Koseki
24 Oct 2024

ComPO: Community Preferences for Language Model Personalization
Sachin Kumar, Chan Young Park, Yulia Tsvetkov, Noah A. Smith, Hannaneh Hajishirzi
21 Oct 2024

MetaAlign: Align Large Language Models with Diverse Preferences during Inference Time
Mozhi Zhang, Pengyu Wang, Chenkun Tan, Mianqiu Huang, Dong Zhang, Yaqian Zhou, Xipeng Qiu
18 Oct 2024

Intuitions of Compromise: Utilitarianism vs. Contractualism
Jared Moore, Yejin Choi, Sydney Levine
07 Oct 2024

The Perfect Blend: Redefining RLHF with Mixture of Judges
Tengyu Xu, Eryk Helenowski, Karthik Abinav Sankararaman, Di Jin, Kaiyan Peng, ..., Gabriel Cohen, Yuandong Tian, Hao Ma, Sinong Wang, Han Fang
30 Sep 2024

ValueCompass: A Framework for Measuring Contextual Value Alignment Between Human and LLMs
Hua Shen, Tiffany Knearem, Reshmi Ghosh, Yu-Ju Yang, Tanushree Mitra, Yun Huang
15 Sep 2024

Larger Language Models Don't Care How You Think: Why Chain-of-Thought Prompting Fails in Subjective Tasks
Georgios Chochlakis, Niyantha Maruthu Pandiyan, Kristina Lerman, Shrikanth Narayanan
ReLM, KELM, LRM
10 Sep 2024

Beyond Preferences in AI Alignment
Tan Zhi-Xuan, Micah Carroll, Matija Franklin, Hal Ashton
30 Aug 2024

How will advanced AI systems impact democracy?
Christopher Summerfield, Lisa Argyle, Michiel Bakker, Teddy Collins, Esin Durmus, ..., Elizabeth Seger, Divya Siddarth, Henrik Skaug Sætra, MH Tessler, M. Botvinick
27 Aug 2024

Estimating Contribution Quality in Online Deliberations Using a Large Language Model
Lodewijk Gelauff, Mohak Goyal, Bhargav Dindukurthi, Ashish Goel, Alice Siu
21 Aug 2024

Large Model Strategic Thinking, Small Model Efficiency: Transferring Theory of Mind in Large Language Models
Nunzio Lorè, Alireza Ilami, Babak Heydari
LRM
05 Aug 2024

Building Machines that Learn and Think with People
Katherine M. Collins, Ilia Sucholutsky, Umang Bhatt, Kartik Chandra, Lionel Wong, ..., Mark K. Ho, Vikash K. Mansinghka, Adrian Weller, Joshua B. Tenenbaum, Thomas L. Griffiths
22 Jul 2024

Improving Context-Aware Preference Modeling for Language Models
Silviu Pitis, Ziang Xiao, Nicolas Le Roux, Alessandro Sordoni
20 Jul 2024

Data-Centric Human Preference Optimization with Rationales
H. Just, Ming Jin, Anit Kumar Sahu, Huy Phan, Ruoxi Jia
19 Jul 2024

ProgressGym: Alignment with a Millennium of Moral Progress
Tianyi Qiu, Yang Zhang, Xuchuan Huang, Jasmine Xinze Li, Jiaming Ji, Yaodong Yang
AI4TS
28 Jun 2024

A Survey on Human Preference Learning for Large Language Models
Ruili Jiang, Kehai Chen, Xuefeng Bai, Zhixuan He, Juntao Li, Muyun Yang, Tiejun Zhao, Liqiang Nie, Min Zhang
17 Jun 2024

Effective Generative AI: The Human-Algorithm Centaur
S. Saghafian, Lihi Idan
16 Jun 2024

Humor in AI: Massive Scale Crowd-Sourced Preferences and Benchmarks for Cartoon Captioning
Jifan Zhang, Lalit P. Jain, Yang Guo, Jiayi Chen, Kuan Lok Zhou, ..., Scott Sievert, Timothy Rogers, Kevin Jamieson, Robert Mankoff, Robert Nowak
15 Jun 2024

PAL: Pluralistic Alignment Framework for Learning from Heterogeneous Preferences
Daiwei Chen, Yi Chen, Aniket Rege, Ramya Korlakai Vinayak
12 Jun 2024

Multi-objective Reinforcement learning from AI Feedback
Marcus Williams
11 Jun 2024

A Robot Walks into a Bar: Can Language Models Serve as Creativity Support Tools for Comedy? An Evaluation of LLMs' Humour Alignment with Comedians
Piotr Wojciech Mirowski, Juliette Love, K. Mathewson, Shakir Mohamed
31 May 2024

Participation in the age of foundation models
Harini Suresh, Emily Tseng, Meg Young, Mary L. Gray, Emma Pierson, Karen Levy
29 May 2024

Embedding-Aligned Language Models
Guy Tennenholtz, Yinlam Chow, Chih-Wei Hsu, Lior Shani, Ethan Liang, Craig Boutilier
AIFin
24 May 2024

Direct Preference Optimization With Unobserved Preference Heterogeneity
Keertana Chidambaram, Karthik Vinay Seetharaman, Vasilis Syrgkanis
23 May 2024

Skin-in-the-Game: Decision Making via Multi-Stakeholder Alignment in LLMs
Bilgehan Sel, Priya Shanmugasundaram, Mohammad Kachuee, Kun Zhou, Ruoxi Jia, Ming Jin
LRM
21 May 2024

False consensus biases AI against vulnerable stakeholders
Mengchen Dong, Jean‐François Bonnefon, Iyad Rahwan
17 May 2024

Facilitating Opinion Diversity through Hybrid NLP Approaches
Michiel van der Meer
15 May 2024

Large Language Models for Education: A Survey
Hanyi Xu, Wensheng Gan, Zhenlian Qi, Jiayang Wu, Philip S. Yu
AI4Ed, ELM
12 May 2024

PatentGPT: A Large Language Model for Intellectual Property
Zilong Bai, Ruiji Zhang, Linqing Chen, Qijun Cai, Yuan Zhong, ..., Fu Bian, Xiaolong Gu, Lisha Zhang, Weilei Wang, Changyang Tu
28 Apr 2024

Annotator-Centric Active Learning for Subjective NLP Tasks
Michiel van der Meer, Neele Falk, P. Murukannaiah, Enrico Liscio
24 Apr 2024

Retrieval Augmented Generation for Domain-specific Question Answering
Sanat Sharma, David Seunghyun Yoon, Franck Dernoncourt, Dewang Sultania, Karishma Bagga, Mengjiao Zhang, Trung Bui, Varun Kotte
RALM
23 Apr 2024

Generating Attractive and Authentic Copywriting from Customer Reviews
Yu-Xiang Lin, Wei-Yun Ma
22 Apr 2024

Just Like Me: The Role of Opinions and Personal Experiences in The Perception of Explanations in Subjective Decision-Making
Sharon Ferguson, Paula Akemi Aoyagui, Young-Ho Kim, Anastasia Kuzminykh
19 Apr 2024

Exploring the landscape of large language models: Foundations, techniques, and challenges
M. Moradi, Ke Yan, David Colwell, Matthias Samwald, Rhona Asgari
OffRL
18 Apr 2024

Social Choice Should Guide AI Alignment in Dealing with Diverse Human Feedback
Vincent Conitzer, Rachel Freedman, J. Heitzig, Wesley H. Holliday, Bob M. Jacobs, ..., Eric Pacuit, Stuart Russell, Hailey Schoelkopf, Emanuel Tewolde, W. Zwicker
16 Apr 2024