HSCR: Hierarchical Self-Contrastive Rewarding for Aligning Medical Vision Language Models. Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Modality-Fair Preference Optimization for Trustworthy MLLM Alignment. International Joint Conference on Artificial Intelligence (IJCAI), 2024
Reward Engineering for Generating Semi-structured Explanation. Findings (Findings), 2023
Leftover Lunch: Advantage-based Offline Reinforcement Learning for Language Models. International Conference on Learning Representations (ICLR), 2023