2002.07530
Improved Optimistic Algorithms for Logistic Bandits
International Conference on Machine Learning (ICML), 2020
18 February 2020
Louis Faury
Marc Abeille
Clément Calauzènes
Olivier Fercoq
Papers citing "Improved Optimistic Algorithms for Logistic Bandits"
50 / 82 papers shown
Tractable Instances of Bilinear Maximization: Implementing LinUCB on Ellipsoids
International Journal of Intelligent Systems and Applications in Engineering (IJISAE), 2025
Raymond Zhang
Hedi Hadiji
Richard Combes
112
1
0
10 Nov 2025
Inference-Time Personalized Alignment with a Few User Preference Queries
Victor-Alexandru Pădurean
Parameswaran Kamalaruban
Nachiket Kotalwar
Alkis Gotovos
Adish Singla
181
0
0
04 Nov 2025
Preference-based Reinforcement Learning beyond Pairwise Comparisons: Benefits of Multiple Options
Joongkyu Lee
Seouh-won Yi
Min-hwan Oh
OffRL
184
0
0
21 Oct 2025
Exploration via Feature Perturbation in Contextual Bandits
Seouh-won Yi
Min-hwan Oh
AAML
219
0
0
20 Oct 2025
The Good, the Bad, and the Sampled: a No-Regret Approach to Safe Online Classification
Tavor Z. Baharav
Spyros Dragazis
Aldo Pacchiano
138
0
0
01 Oct 2025
Stochastic Matching Bandits with Rare Optimization Updates
Jung-hun Kim
Min-hwan Oh
188
0
0
04 Sep 2025
Recycling History: Efficient Recommendations from Contextual Dueling Bandits
Suryanarayana Sankagiri
Jalal Etesami
Pouria Fatemi
Matthias Grossglauser
149
0
0
26 Aug 2025
Multi-User Contextual Cascading Bandits for Personalized Recommendation
Jiho Park
Huiwen Jia
124
1
0
19 Aug 2025
Achieving Limited Adaptivity for Multinomial Logistic Bandits
Sukruta Prakash Midigeshi
Tanmay Goyal
Gaurav Sinha
133
1
0
05 Aug 2025
Generalized Kernelized Bandits: A Novel Self-Normalized Bernstein-Like Dimension-Free Inequality and Regret Bounds
Alberto Maria Metelli
Simone Drago
Marco Mussi
175
2
0
03 Aug 2025
PITA: Preference-Guided Inference-Time Alignment for LLM Post-Training
Sarat Chandra Bobbili
Ujwal Dinesha
Dheeraj Narasimha
S. Shakkottai
228
3
0
26 Jul 2025
Generalized Linear Bandits: Almost Optimal Regret with One-Pass Update
Yu Zhang
Sheng-An Xu
Peng Zhao
Masashi Sugiyama
213
6
0
16 Jul 2025
Enjoying Non-linearity in Multinomial Logistic Bandits: A Minimax-Optimal Algorithm
Pierre Boudart
Pierre Gaillard
Alessandro Rudi
161
0
0
07 Jul 2025
Efficient Algorithms for Logistic Contextual Slate Bandits with Bandit Feedback
Conference on Uncertainty in Artificial Intelligence (UAI), 2025
Tanmay Goyal
Gaurav Sinha
257
0
0
16 Jun 2025
Learning Parametric Distributions from Samples and Preferences
Marc Jourdan
Gizem Yüce
Nicolas Flammarion
211
0
0
29 May 2025
Bayesian Optimization from Human Feedback: Near-Optimal Regret Bounds
Aya Kayal
Sattar Vakili
Laura Toni
Da-shan Shiu
A. Bernacchia
190
0
0
29 May 2025
A Unified Online-Offline Framework for Co-Branding Campaign Recommendations
Knowledge Discovery and Data Mining (KDD), 2025
Xiangxiang Dai
Xiaowei Sun
Jinhang Zuo
Xutong Liu
John C. S. Lui
219
1
0
28 May 2025
PromptWise: Online Learning for Cost-Aware Prompt Assignment in Generative Models
Xiaoyan Hu
Lauren Pick
Ho-fung Leung
Farzan Farnia
266
4
0
24 May 2025
Neural Logistic Bandits
Seoungbin Bae
Dabeen Lee
1.1K
2
0
04 May 2025
Reinforcement Learning from Multi-level and Episodic Human Feedback
Conference on Learning for Dynamics & Control (L4DC), 2025
Muhammad Qasim Elahi
Somtochukwu Oguchienti
Maheed H. Ahmed
Mahsa Ghasemi
OffRL
570
0
0
20 Apr 2025
Dynamic Assortment Selection and Pricing with Censored Preference Feedback
International Conference on Learning Representations (ICLR), 2025
Jung-hun Kim
Min-hwan Oh
225
1
0
03 Apr 2025
Language Model Personalization via Reward Factorization
Idan Shenfeld
Felix Faltings
Pulkit Agrawal
Aldo Pacchiano
373
14
0
08 Mar 2025
Provably Efficient Reinforcement Learning with Multinomial Logit Function Approximation
Neural Information Processing Systems (NeurIPS), 2024
Long-Fei Li
Yu Zhang
Peng Zhao
Zhi Zhou
596
10
0
17 Jan 2025
Near Optimal Pure Exploration in Logistic Bandits
Eduardo Ochoa Rivera
Ambuj Tewari
420
1
0
28 Oct 2024
Optimal Design for Reward Modeling in RLHF
Antoine Scheid
Etienne Boursier
Alain Durmus
Michael I. Jordan
Pierre Ménard
Eric Moulines
Michal Valko
OffRL
514
20
0
22 Oct 2024
Almost Free: Self-concordance in Natural Exponential Families and an Application to Bandits
Neural Information Processing Systems (NeurIPS), 2024
Shuai Liu
Alex Ayoub
Flore Sentenac
Xiaoqi Tan
Csaba Szepesvári
304
6
0
01 Oct 2024
Advances in Preference-based Reinforcement Learning: A Review
IEEE International Conference on Systems, Man and Cybernetics (SMC), 2022
Youssef Abdelkareem
Shady Shehata
Fakhri Karray
OffRL
261
18
0
21 Aug 2024
Right Now, Wrong Then: Non-Stationary Direct Preference Optimization under Preference Drift
Seongho Son
William Bankes
Sayak Ray Chowdhury
Brooks Paige
Ilija Bogunovic
460
9
0
26 Jul 2024
Neural Dueling Bandits: Preference-Based Optimization with Human Feedback
Arun Verma
Zhongxiang Dai
Xiaoqiang Lin
Patrick Jaillet
K. H. Low
457
6
0
24 Jul 2024
A Unified Confidence Sequence for Generalized Linear Models, with Applications to Bandits
Junghyun Lee
Se-Young Yun
Kwang-Sung Jun
661
21
0
19 Jul 2024
Bandits with Preference Feedback: A Stackelberg Game Perspective
Barna Pásztor
Parnian Kassraie
Andreas Krause
388
5
0
24 Jun 2024
The Power of Active Multi-Task Learning in Reinforcement Learning from Human Feedback
Ruitao Chen
Liwei Wang
338
1
0
18 May 2024
Nearly Minimax Optimal Regret for Multinomial Logistic Bandit
Neural Information Processing Systems (NeurIPS), 2024
Joongkyu Lee
Min-hwan Oh
408
13
0
16 May 2024
Active Preference Learning for Ordering Items In- and Out-of-sample
Neural Information Processing Systems (NeurIPS), 2024
Herman Bergström
Emil Carlsson
Devdatt Dubhashi
Fredrik D. Johansson
273
4
0
05 May 2024
DPO Meets PPO: Reinforced Token Optimization for RLHF
Han Zhong
Zikang Shan
Guhao Feng
Wei Xiong
Xinle Cheng
Li Zhao
Di He
Jiang Bian
Liwei Wang
682
108
0
29 Apr 2024
Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback
Qiwei Di
Jiafan He
Quanquan Gu
463
4
0
16 Apr 2024
Generalized Linear Bandits with Limited Adaptivity
Neural Information Processing Systems (NeurIPS), 2024
Ayush Sawarni
Nirjhar Das
Siddharth Barman
Gaurav Sinha
746
15
0
10 Apr 2024
Horizon-Free Regret for Linear Markov Decision Processes
Zihan Zhang
Jason D. Lee
Yuxin Chen
Simon S. Du
226
4
0
15 Mar 2024
Improved Algorithm for Adversarial Linear Mixture MDPs with Bandit Feedback and Unknown Transition
International Conference on Artificial Intelligence and Statistics (AISTATS), 2024
Long-Fei Li
Peng Zhao
Zhi Zhou
284
7
0
07 Mar 2024
Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF
Banghua Zhu
Michael I. Jordan
Jiantao Jiao
320
49
0
29 Jan 2024
Long-term Safe Reinforcement Learning with Binary Feedback
AAAI Conference on Artificial Intelligence (AAAI), 2024
Akifumi Wachi
Wataru Hashimoto
Kazumune Hashimoto
OffRL
387
6
0
08 Jan 2024
Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF under KL-Constraint
Wei Xiong
Hanze Dong
Chen Ye
Ziqi Wang
Han Zhong
Heng Ji
Nan Jiang
Tong Zhang
OffRL
433
324
0
18 Dec 2023
Time-Uniform Confidence Spheres for Means of Random Vectors
Ben Chugg
Hongjian Wang
Aaditya Ramdas
911
8
0
14 Nov 2023
Exploration via linearly perturbed loss minimisation
International Conference on Artificial Intelligence and Statistics (AISTATS), 2023
David Janz
Shuai Liu
Alex Ayoub
Csaba Szepesvári
315
12
0
13 Nov 2023
Likelihood Ratio Confidence Sets for Sequential Decision Making
N. Emmenegger
Mojmír Mutný
Andreas Krause
180
12
0
08 Nov 2023
Improved Regret Bounds of (Multinomial) Logistic Bandits via Regret-to-Confidence-Set Conversion
International Conference on Artificial Intelligence and Statistics (AISTATS), 2023
Junghyun Lee
Se-Young Yun
Kwang-Sung Jun
425
22
0
28 Oct 2023
Experimental Designs for Heteroskedastic Variance
Neural Information Processing Systems (NeurIPS), 2023
Justin Weltz
Tanner Fiez
Alex Volfovsky
Eric B. Laber
Blake Mason
Houssam Nassif
Lalit P. Jain
296
9
0
06 Oct 2023
Small Total-Cost Constraints in Contextual Bandits with Knapsacks, with Application to Fairness
Neural Information Processing Systems (NeurIPS), 2023
Evgenii Chzhen
Christophe Giraud
Zerui Li
Jean-Michel Poggi
204
3
0
25 May 2023
Ranking with Popularity Bias: User Welfare under Self-Amplification Dynamics
Guy Tennenholtz
Martin Mladenov
Nadav Merlis
Robert L. Axtell
Craig Boutilier
275
0
0
24 May 2023
Borda Regret Minimization for Generalized Linear Dueling Bandits
International Conference on Machine Learning (ICML), 2023
Yue Wu
Tao Jin
Hao Lou
Farzad Farnoud
Quanquan Gu
391
16
0
15 Mar 2023