XFT: Unlocking the Power of Code Instruction Tuning by Simply Merging Upcycled Mixture-of-Experts

23 April 2024
Authors: Yifeng Ding, Jiawei Liu, Yuxiang Wei, Terry Yue Zhuo, Lingming Zhang
Tags: ALM, MoE

Papers citing "XFT: Unlocking the Power of Code Instruction Tuning by Simply Merging Upcycled Mixture-of-Experts"

3 papers shown.

Layerwise Recurrent Router for Mixture-of-Experts
Zihan Qiu, Zeyu Huang, Shuang Cheng, Yizhi Zhou, Zili Wang, Ivan Titov, Jie Fu
Tags: MoE
13 Aug 2024

Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation
Jiawei Liu, Chun Xia, Yuyao Wang, Lingming Zhang
Tags: ELM, ALM
02 May 2023

Scaling Laws for Neural Language Models
Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei
23 Jan 2020