
Zeroth-Order Optimization Meets Human Feedback: Provable Learning via Ranking Oracles

International Conference on Learning Representations (ICLR), 2023
7 March 2023
Zhiwei Tang
Dmitry Rybin
Tsung-Hui Chang
ALM
DiffM

Papers citing "Zeroth-Order Optimization Meets Human Feedback: Provable Learning via Ranking Oracles"

20 / 20 papers shown
Instant Preference Alignment for Text-to-Image Diffusion Models
Yan Zhao
Songlin Yang
Xiaoxuan Han
Wei Wang
Jing Dong
Yueming Lyu
Ziyu Xue
182
3
0
25 Aug 2025
Provable Reinforcement Learning from Human Feedback with an Unknown Link Function
Qining Zhang
Lei Ying
369
2
0
03 Jun 2025
ElasticZO: A Memory-Efficient On-Device Learning with Combined Zeroth- and First-Order Optimization
Keisuke Sugiura
Hiroki Matsutani
MQ
313
3
0
08 Jan 2025
Hierarchical Prompt Decision Transformer: Improving Few-Shot Policy Generalization with Global and Adaptive Guidance
The Web Conference (WWW), 2024
Zhe Wang
Haozhu Wang
Yanjun Qi
OffRL
458
3
0
01 Dec 2024
Ruppert-Polyak averaging for Stochastic Order Oracle
V. N. Smirnov
K. M. Kazistova
I. A. Sudakov
V. Leplat
A. V. Gasnikov
A. V. Lobanov
263
0
0
24 Nov 2024
Zeroth-Order Policy Gradient for Reinforcement Learning from Human Feedback without Reward Inference
International Conference on Learning Representations (ICLR), 2024
Qining Zhang
Lei Ying
OffRL
564
11
0
25 Sep 2024
CoCoG-2: Controllable generation of visual stimuli for understanding human concept representation
Chen Wei
Jiachen Zou
Dietmar Heinke
Quanying Liu
315
3
0
20 Jul 2024
Gradient Testing and Estimation by Comparisons
Chenyi Zhang
Tongyang Li
Helin Wang
Yexin Zhang
Tongyang Li
AAML
317
4
0
19 May 2024
CoCoG: Controllable Visual Stimuli Generation based on Human Concept Representations
Chen Wei
Jiachen Zou
Dietmar Heinke
Quanying Liu
274
9
0
25 Apr 2024
Multimodal Large Language Model is a Human-Aligned Annotator for Text-to-Image Generation
Xun Wu
Shaohan Huang
Furu Wei
269
19
0
23 Apr 2024
Deep Representation Learning for Multi-functional Degradation Modeling of Community-dwelling Aging Population
Suiyao Chen
Xinyi Liu
Yulei Li
Jing Wu
Handong Yao
346
6
0
08 Apr 2024
Accelerating Parallel Sampling of Diffusion Models
Zhiwei Tang
Jiasheng Tang
Hao Luo
Fan Wang
Tsung-Hui Chang
501
28
0
15 Feb 2024
Human Aesthetic Preference-Based Large Text-to-Image Model Personalization: Kandinsky Generation as an Example
Aven Le Zhou
Yu-Ao Wang
Wei Wu
Kang Zhang
180
2
0
09 Feb 2024
A New Creative Generation Pipeline for Click-Through Rate with Stable Diffusion Model
The Web Conference (WWW), 2024
Hao Yang
Jianxin Yuan
Shuai Yang
Linhe Xu
Shuo Yuan
Yifan Zeng
269
26
0
17 Jan 2024
HuTuMotion: Human-Tuned Navigation of Latent Motion Diffusion Models with Minimal Feedback
Gaoge Han
Shaoli Huang
Biwei Huang
Jinglei Tang
VGen
187
4
0
19 Dec 2023
Optimizing Algorithms From Pairwise User Preferences
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2023
L. Keselman
Katherine Shih
M. Hebert
Aaron Steinfeld
303
8
0
08 Aug 2023
FABRIC: Personalizing Diffusion Models with Iterative Feedback
Dimitri von Rütte
Elisabetta Fedele
Jonathan Thomm
Lukas Wolf
242
24
0
19 Jul 2023
Fine-Tuning Language Models with Just Forward Passes
Neural Information Processing Systems (NeurIPS), 2023
Sadhika Malladi
Tianyu Gao
Eshaan Nichani
Alexandru Damian
Jason D. Lee
Danqi Chen
Sanjeev Arora
695
361
0
27 May 2023
Prompt-Tuning Decision Transformer with Preference Ranking
Shengchao Hu
Li Shen
Ya Zhang
Dacheng Tao
OffRL
242
19
0
16 May 2023
Confidence Trigger Detection: Accelerating Real-time Tracking-by-detection Systems
Zhicheng Ding
Zhixin Lai
Siyang Li
Panfeng Li
Qikai Yang
E. Wong
617
27
0
02 Feb 2019