ARES: Alternating Reinforcement Learning and Supervised Fine-Tuning for Enhanced Multi-Modal Chain-of-Thought Reasoning Through Diverse AI Feedback

25 June 2024

Papers citing "ARES: Alternating Reinforcement Learning and Supervised Fine-Tuning for Enhanced Multi-Modal Chain-of-Thought Reasoning Through Diverse AI Feedback"

6 / 6 papers shown

Title
Improving LLM-as-a-Judge Inference with the Judgment Distribution Victor Wang Michael J.Q. Zhang Eunsol Choi 49 0 0 04 Mar 2025
Improving alignment of dialogue agents via targeted human judgements Amelia Glaese Nat McAleese Maja Trkebacz John Aslanides Vlad Firoiu ... John F. J. Mellor Demis Hassabis Koray Kavukcuoglu Lisa Anne Hendricks G. Irving ALM AAML 220 495 0 28 Sep 2022
Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering Pan Lu Swaroop Mishra Tony Xia Liang Qiu Kai-Wei Chang Song-Chun Zhu Oyvind Tafjord Peter Clark A. Kalyan ELM ReLM LRM 198 1,089 0 20 Sep 2022
Large Language Models are Zero-Shot Reasoners Takeshi Kojima S. Gu Machel Reid Yutaka Matsuo Yusuke Iwasawa ReLM LRM 291 2,712 0 24 May 2022
Training language models to follow instructions with human feedback Long Ouyang Jeff Wu Xu Jiang Diogo Almeida Carroll L. Wainwright ... Amanda Askell Peter Welinder Paul Christiano Jan Leike Ryan J. Lowe OSLM ALM 301 11,730 0 04 Mar 2022
Understanding the Capabilities, Limitations, and Societal Impact of Large Language Models Alex Tamkin Miles Brundage Jack Clark Deep Ganguli AILaw ELM 192 248 0 04 Feb 2021