
ActiveDPO: Active Direct Preference Optimization for Sample-Efficient Alignment
arXiv:2505.19241

25 May 2025
Xiaoqiang Lin, Arun Verma, Zhongxiang Dai, Daniela Rus, See-Kiong Ng, Bryan Kian Hsiang Low

Papers citing "ActiveDPO: Active Direct Preference Optimization for Sample-Efficient Alignment"

32 / 32 papers shown
Active Human Feedback Collection via Neural Contextual Dueling Bandits
Arun Verma, Xiaoqiang Lin, Zhongxiang Dai, Daniela Rus, Bryan Kian Hsiang Low
16 Apr 2025

Sample-Efficient Alignment for LLMs
Zichen Liu, Changyu Chen, Chao Du, Wee Sun Lee, Min Lin
03 Nov 2024

Deep Bayesian Active Learning for Preference Modeling in Large Language Models
Luckeciano C. Melo, P. Tigas, Alessandro Abate, Yarin Gal
14 Jun 2024

Feel-Good Thompson Sampling for Contextual Dueling Bandits
Xuheng Li, Heyang Zhao, Quanquan Gu
09 Apr 2024

Gemma: Open Models Based on Gemini Research and Technology
Gemma Team: Thomas Mesnard, Cassidy Hardin, Robert Dadashi, Surya Bhupatiraju, ..., Armand Joulin, Noah Fiedel, Evan Senter, Alek Andreev, Kathleen Kenealy
13 Mar 2024

Active Preference Optimization for Sample Efficient RLHF
Nirjhar Das, Souradip Chakraborty, Aldo Pacchiano, Sayak Ray Chowdhury
16 Feb 2024

Active Preference Learning for Large Language Models
William Muldrew, Peter Hayes, Mingtian Zhang, David Barber
12 Feb 2024

LESS: Selecting Influential Data for Targeted Instruction Tuning
Mengzhou Xia, Sadhika Malladi, Suchin Gururangan, Sanjeev Arora, Danqi Chen
06 Feb 2024

Sample Efficient Preference Alignment in LLMs via Active Exploration
Viraj Mehta, Vikramjeet Das, Ojash Neopane, Yijia Dai, Ilija Bogunovic, Willie Neiswanger, Stefano Ermon, Jeff Schneider
01 Dec 2023

Variance-Aware Regret Bounds for Stochastic Contextual Dueling Bandits
Qiwei Di, Tao Jin, Yue Wu, Heyang Zhao, Farzad Farnoud, Quanquan Gu
02 Oct 2023

Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Rafael Rafailov, Archit Sharma, E. Mitchell, Stefano Ermon, Christopher D. Manning, Chelsea Finn
29 May 2023

GPT-4 Technical Report
OpenAI: Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, ..., Shengjia Zhao, Tianhao Zheng, Juntang Zhuang, William Zhuk, Barret Zoph
15 Mar 2023

Principled Reinforcement Learning with Human Feedback from Pairwise or $K$-wise Comparisons
Banghua Zhu, Jiantao Jiao, Michael I. Jordan
26 Jan 2023

Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback
Yuntao Bai, Andy Jones, Kamal Ndousse, Amanda Askell, Anna Chen, ..., Jack Clark, Sam McCandlish, C. Olah, Benjamin Mann, Jared Kaplan
12 Apr 2022

Training language models to follow instructions with human feedback
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe
04 Mar 2022

Stochastic Contextual Dueling Bandits under Linear Stochastic Transitivity Models
Viktor Bengs, Aadirupa Saha, Eyke Hüllermeier
09 Feb 2022

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, F. Xia, Ed H. Chi, Quoc Le, Denny Zhou
28 Jan 2022

WebGPT: Browser-assisted question-answering with human feedback
Reiichiro Nakano, Jacob Hilton, S. Balaji, Jeff Wu, Ouyang Long, ..., Gretchen Krueger, Kevin Button, Matthew Knight, B. Chess, John Schulman
17 Dec 2021

Evaluating Large Language Models Trained on Code
Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Pondé, ..., Bob McGrew, Dario Amodei, Sam McCandlish, Ilya Sutskever, Wojciech Zaremba
07 Jul 2021

LoRA: Low-Rank Adaptation of Large Language Models
J. E. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen
17 Jun 2021

Online Algorithm for Unsupervised Sequential Selection with Contextual Information
Arun Verma, M. Hanawal, Csaba Szepesvári, Venkatesh Saligrama
23 Oct 2020

Thompson Sampling for Unsupervised Sequential Selection
Arun Verma, M. Hanawal, N. Hemachandra
16 Sep 2020

Learning to summarize from human feedback
Nisan Stiennon, Long Ouyang, Jeff Wu, Daniel M. Ziegler, Ryan J. Lowe, Chelsea Voss, Alec Radford, Dario Amodei, Paul Christiano
02 Sep 2020

Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
Nils Reimers, Iryna Gurevych
27 Aug 2019

Online Algorithm for Unsupervised Sensor Selection
Arun Verma, M. Hanawal, Csaba Szepesvári, Venkatesh Saligrama
15 Jan 2019

Batch Active Preference-Based Learning of Reward Functions
Erdem Biyik, Dorsa Sadigh
10 Oct 2018

Preference-based Online Learning with Dueling Bandits: A Survey
Viktor Bengs, R. Busa-Fekete, Adil El Mesaoudi-Paul, Eyke Hüllermeier
30 Jul 2018

Deep reinforcement learning from human preferences
Paul Christiano, Jan Leike, Tom B. Brown, Miljan Martic, Shane Legg, Dario Amodei
12 Jun 2017

A Relative Exponential Weighing Algorithm for Adversarial Utility-based Dueling Bandits
Pratik Gajane, Tanguy Urvoy, Fabrice Clérot
15 Jan 2016

Regret Lower Bound and Optimal Algorithm in Dueling Bandit Problem
Junpei Komiyama, Junya Honda, H. Kashima, Hiroshi Nakagawa
08 Jun 2015

Reducing Dueling Bandits to Cardinal Bandits
Nir Ailon, Thorsten Joachims, Zohar Karnin
14 May 2014

Relative Upper Confidence Bound for the K-Armed Dueling Bandit Problem
M. Zoghi, Shimon Whiteson, Rémi Munos, Maarten de Rijke
12 Dec 2013