Improved Optimistic Algorithms for Logistic Bandits

18 February 2020
Louis Faury
Marc Abeille
Clément Calauzènes
Olivier Fercoq

Papers citing "Improved Optimistic Algorithms for Logistic Bandits"

50 / 69 papers shown
Title
Efficient Algorithms for Logistic Contextual Slate Bandits with Bandit Feedback
Tanmay Goyal
Gaurav Sinha
15
0
0
16 Jun 2025
Bayesian Optimization from Human Feedback: Near-Optimal Regret Bounds
Aya Kayal
Sattar Vakili
Laura Toni
Da-shan Shiu
A. Bernacchia
34
0
0
29 May 2025
Learning Parametric Distributions from Samples and Preferences
Marc Jourdan
Gizem Yüce
Nicolas Flammarion
15
0
0
29 May 2025
A Unified Online-Offline Framework for Co-Branding Campaign Recommendations
Xiangxiang Dai
Xiaowei Sun
Jinhang Zuo
Xutong Liu
John C. S. Lui
31
0
0
28 May 2025
PromptWise: Online Learning for Cost-Aware Prompt Assignment in Generative Models
Xiaoyan Hu
Lauren Pick
Ho-fung Leung
Farzan Farnia
30
1
0
24 May 2025
Neural Logistic Bandits
Seoungbin Bae
Dabeen Lee
527
0
0
04 May 2025
Reinforcement Learning from Multi-level and Episodic Human Feedback
Muhammad Qasim Elahi
Somtochukwu Oguchienti
Maheed H. Ahmed
Mahsa Ghasemi
OffRL
92
0
0
20 Apr 2025
Dynamic Assortment Selection and Pricing with Censored Preference Feedback
Jung-hun Kim
Min-hwan Oh
60
0
0
03 Apr 2025
Language Model Personalization via Reward Factorization
Idan Shenfeld
Felix Faltings
Pulkit Agrawal
Aldo Pacchiano
106
1
0
08 Mar 2025
Provably Efficient Reinforcement Learning with Multinomial Logit Function Approximation
Long-Fei Li
Yu Zhang
Peng Zhao
Zhi Zhou
255
5
0
17 Jan 2025
Near Optimal Pure Exploration in Logistic Bandits
Eduardo Ochoa Rivera
Ambuj Tewari
94
0
0
28 Oct 2024
Optimal Design for Reward Modeling in RLHF
Antoine Scheid
Etienne Boursier
Alain Durmus
Michael I. Jordan
Pierre Ménard
Eric Moulines
Michal Valko
OffRL
148
9
0
22 Oct 2024
Almost Free: Self-concordance in Natural Exponential Families and an Application to Bandits
Shuai Liu
Alex Ayoub
Flore Sentenac
Xiaoqi Tan
Csaba Szepesvári
81
1
0
01 Oct 2024
Advances in Preference-based Reinforcement Learning: A Review
Youssef Abdelkareem
Shady Shehata
Fakhri Karray
OffRL
96
10
0
21 Aug 2024
Right Now, Wrong Then: Non-Stationary Direct Preference Optimization under Preference Drift
Seongho Son
William Bankes
Sayak Ray Chowdhury
Brooks Paige
Ilija Bogunovic
122
4
0
26 Jul 2024
Neural Dueling Bandits: Preference-Based Optimization with Human Feedback
Arun Verma
Zhongxiang Dai
Xiaoqiang Lin
Patrick Jaillet
K. H. Low
187
6
0
24 Jul 2024
A Unified Confidence Sequence for Generalized Linear Models, with Applications to Bandits
Junghyun Lee
Se-Young Yun
Kwang-Sung Jun
192
6
0
19 Jul 2024
Bandits with Preference Feedback: A Stackelberg Game Perspective
Barna Pásztor
Parnian Kassraie
Andreas Krause
75
4
0
24 Jun 2024
The Power of Active Multi-Task Learning in Reinforcement Learning from Human Feedback
Ruitao Chen
Liwei Wang
138
1
0
18 May 2024
Nearly Minimax Optimal Regret for Multinomial Logistic Bandit
Joongkyu Lee
Min-hwan Oh
93
7
0
16 May 2024
Active Preference Learning for Ordering Items In- and Out-of-sample
Herman Bergström
Emil Carlsson
Devdatt Dubhashi
Fredrik D. Johansson
81
0
0
05 May 2024
DPO Meets PPO: Reinforced Token Optimization for RLHF
Han Zhong
Zikang Shan
Guhao Feng
Wei Xiong
Xinle Cheng
Li Zhao
Di He
Jiang Bian
Liwei Wang
155
72
0
29 Apr 2024
Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback
Qiwei Di
Jiafan He
Quanquan Gu
110
1
0
16 Apr 2024
Generalized Linear Bandits with Limited Adaptivity
Ayush Sawarni
Nirjhar Das
Siddharth Barman
Gaurav Sinha
189
5
0
10 Apr 2024
Horizon-Free Regret for Linear Markov Decision Processes
Zihan Zhang
Jason D. Lee
Yuxin Chen
Simon S. Du
55
3
0
15 Mar 2024
Improved Algorithm for Adversarial Linear Mixture MDPs with Bandit Feedback and Unknown Transition
Long-Fei Li
Peng Zhao
Zhi Zhou
85
6
0
07 Mar 2024
Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF
Banghua Zhu
Michael I. Jordan
Jiantao Jiao
84
33
0
29 Jan 2024
Long-term Safe Reinforcement Learning with Binary Feedback
Akifumi Wachi
Wataru Hashimoto
Kazumune Hashimoto
OffRL
88
3
0
08 Jan 2024
Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF under KL-Constraint
Wei Xiong
Hanze Dong
Chen Ye
Ziqi Wang
Han Zhong
Heng Ji
Nan Jiang
Tong Zhang
OffRL
135
204
0
18 Dec 2023
Time-Uniform Confidence Spheres for Means of Random Vectors
Ben Chugg
Hongjian Wang
Aaditya Ramdas
203
5
0
14 Nov 2023
Exploration via linearly perturbed loss minimisation
David Janz
Shuai Liu
Alex Ayoub
Csaba Szepesvári
78
6
0
13 Nov 2023
Likelihood Ratio Confidence Sets for Sequential Decision Making
N. Emmenegger
Mojmír Mutný
Andreas Krause
37
10
0
08 Nov 2023
Improved Regret Bounds of (Multinomial) Logistic Bandits via Regret-to-Confidence-Set Conversion
Junghyun Lee
Se-Young Yun
Kwang-Sung Jun
110
13
0
28 Oct 2023
Experimental Designs for Heteroskedastic Variance
Justin Weltz
Tanner Fiez
Alex Volfovsky
Eric B. Laber
Blake Mason
Houssam Nassif
Lalit P. Jain
79
5
0
06 Oct 2023
Small Total-Cost Constraints in Contextual Bandits with Knapsacks, with Application to Fairness
Evgenii Chzhen
Christophe Giraud
Zerui Li
Gilles Stoltz
42
1
0
25 May 2023
Ranking with Popularity Bias: User Welfare under Self-Amplification Dynamics
Guy Tennenholtz
Martin Mladenov
Nadav Merlis
Robert L. Axtell
Craig Boutilier
53
0
0
24 May 2023
Borda Regret Minimization for Generalized Linear Dueling Bandits
Yue Wu
Tao Jin
Hao Lou
Farzad Farnoud
Quanquan Gu
94
13
0
15 Mar 2023
Revisiting Weighted Strategy for Non-stationary Parametric Bandits
Jing Wang
Peng Zhao
Zhihong Zhou
47
6
0
05 Mar 2023
Principled Reinforcement Learning with Human Feedback from Pairwise or $K$-wise Comparisons
Banghua Zhu
Jiantao Jiao
Michael I. Jordan
OffRL
132
209
0
26 Jan 2023
Overcoming Prior Misspecification in Online Learning to Rank
Javad Azizi
Ofer Meshi
M. Zoghi
Maryam Karimzadehgan
84
1
0
25 Jan 2023
Confidence Sets under Generalized Self-Concordance
Lang Liu
Zaïd Harchaoui
51
1
0
31 Dec 2022
Reinforcement Learning in Credit Scoring and Underwriting
S. Kiatsupaibul
Pakawan Chansiripas
Pojtanut Manopanjasiri
Kantapong Visantavarakul
Zheng Wen
OffRL
14
0
0
15 Dec 2022
Risk-aware linear bandits with convex loss
Patrick Saux
Odalric-Ambrym Maillard
54
2
0
15 Sep 2022
Double Doubly Robust Thompson Sampling for Generalized Linear Contextual Bandits
Wonyoung Hedge Kim
Kyungbok Lee
M. Paik
95
14
0
15 Sep 2022
Delayed Feedback in Generalised Linear Bandits Revisited
Benjamin Howson
Ciara Pike-Burke
Sarah Filippi
61
16
0
21 Jul 2022
Nearly Minimax Optimal Reinforcement Learning with Linear Function Approximation
Pihe Hu
Yu Chen
Longbo Huang
86
35
0
23 Jun 2022
Contextual Bandits with Knapsacks for a Conversion Model
Zerui Li
Gilles Stoltz
95
3
0
01 Jun 2022
Reinforcement Learning with a Terminator
Guy Tennenholtz
Nadav Merlis
Lior Shani
Shie Mannor
Uri Shalit
Gal Chechik
Assaf Hallak
Gal Dalal
65
5
0
30 May 2022
Lifting the Information Ratio: An Information-Theoretic Analysis of Thompson Sampling for Contextual Bandits
Gergely Neu
Julia Olkhovskaya
Matteo Papini
Ludovic Schwartz
94
16
0
27 May 2022
Fast Rates in Pool-Based Batch Active Learning
Claudio Gentile
Zhilei Wang
Tong Zhang
80
16
0
11 Feb 2022