Hierarchical Prompt Decision Transformer: Improving Few-Shot Policy Generalization with Global and Adaptive Guidance | The Web Conference (WWW), 2024
Zeroth-Order Policy Gradient for Reinforcement Learning from Human Feedback without Reward Inference | International Conference on Learning Representations (ICLR), 2024
A New Creative Generation Pipeline for Click-Through Rate with Stable Diffusion Model | The Web Conference (WWW), 2024
Optimizing Algorithms From Pairwise User Preferences | IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2023
Fine-Tuning Language Models with Just Forward Passes | Neural Information Processing Systems (NeurIPS), 2023