Towards Reward Fairness in RLHF: From a Resource Allocation Perspective | Annual Meeting of the Association for Computational Linguistics (ACL), 2025 |
On Transforming Reinforcement Learning by Transformer: The Development Trajectory | IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022 |
Post-processing Networks: Method for Optimizing Pipeline Task-oriented Dialogue Systems using Reinforcement Learning | SIGDIAL Conferences (SIGDIAL), 2022 |
Diaformer: Automatic Diagnosis via Symptoms Sequence Generation | AAAI Conference on Artificial Intelligence (AAAI), 2021 |