Zeroth-Order Optimization Meets Human Feedback: Provable Learning via Ranking Oracles

International Conference on Learning Representations (ICLR), 2023

7 March 2023

Papers citing "Zeroth-Order Optimization Meets Human Feedback: Provable Learning via Ranking Oracles"

Instant Preference Alignment for Text-to-Image Diffusion Models

182

25 Aug 2025

Provable Reinforcement Learning from Human Feedback with an Unknown Link Function

Qining Zhang

Lei Ying

369

03 Jun 2025

ElasticZO: A Memory-Efficient On-Device Learning with Combined Zeroth- and First-Order Optimization

Keisuke Sugiura

Hiroki Matsutani

313

08 Jan 2025

Hierarchical Prompt Decision Transformer: Improving Few-Shot Policy Generalization with Global and Adaptive Guidance

The Web Conference (WWW), 2024

458

01 Dec 2024

Ruppert-Polyak averaging for Stochastic Order Oracle

263

24 Nov 2024

Zeroth-Order Policy Gradient for Reinforcement Learning from Human Feedback without Reward Inference

International Conference on Learning Representations (ICLR), 2024

Qining Zhang

Lei Ying

OffRL

564

25 Sep 2024

CoCoG-2: Controllable generation of visual stimuli for understanding human concept representation

315

20 Jul 2024

Gradient Testing and Estimation by Comparisons

Chenyi Zhang

Tongyang Li

Helin Wang

Yexin Zhang

AAML

317

19 May 2024

CoCoG: Controllable Visual Stimuli Generation based on Human Concept Representations

274

25 Apr 2024

Multimodal Large Language Model is a Human-Aligned Annotator for Text-to-Image Generation

Xun Wu

Shaohan Huang

Furu Wei

269

23 Apr 2024

Deep Representation Learning for Multi-functional Degradation Modeling of Community-dwelling Aging Population

346

08 Apr 2024

Accelerating Parallel Sampling of Diffusion Models

Fan Wang

501

15 Feb 2024

Human Aesthetic Preference-Based Large Text-to-Image Model Personalization: Kandinsky Generation as an Example

180

09 Feb 2024

A New Creative Generation Pipeline for Click-Through Rate with Stable Diffusion Model

The Web Conference (WWW), 2024

269

17 Jan 2024

HuTuMotion: Human-Tuned Navigation of Latent Motion Diffusion Models with Minimal Feedback

187

19 Dec 2023

Optimizing Algorithms From Pairwise User Preferences

IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2023

303

08 Aug 2023

FABRIC: Personalizing Diffusion Models with Iterative Feedback

242

19 Jul 2023

Fine-Tuning Language Models with Just Forward Passes

Neural Information Processing Systems (NeurIPS), 2023

695

361

27 May 2023

Prompt-Tuning Decision Transformer with Preference Ranking

Shengchao Hu

Li Shen

Ya Zhang

Dacheng Tao

OffRL

242

16 May 2023

Confidence Trigger Detection: Accelerating Real-time Tracking-by-detection Systems

617

02 Feb 2019