
![]() Offline Regularised Reinforcement Learning for Large Language Models Alignment. Pierre Harvey Richemond, Yunhao Tang, Daniel Guo, Daniele Calandriello, M. G. Azar, ... Gil Shamir, Rishabh Joshi, Tianqi Liu, Rémi Munos, Bilal Piot |
![]() Pragmatic Feature Preferences: Learning Reward-Relevant Preferences from Human Input. International Conference on Machine Learning (ICML), 2024 |
![]() Similarity-Navigated Conformal Prediction for Graph Neural Networks. Neural Information Processing Systems (NeurIPS), 2024 |
![]() One vs. Many: Comprehending Accurate Information from Multiple Erroneous and Inconsistent AI Generations. Conference on Fairness, Accountability and Transparency (FAccT), 2024 |
![]() Semantic Objective Functions: A distribution-aware method for adding logical constraints in deep learning. International Conference on Agents and Artificial Intelligence (ICAART), 2024 |
![]() Explainable AI (XAI) in Image Segmentation in Medicine, Industry, and Beyond: A Survey. ICT Express (IE), 2024 |