Title |
---|
![]() MoEfication: Transformer Feed-forward Layers are Mixtures of Experts Zhengyan Zhang Yankai Lin Zhiyuan Liu Peng Li Maosong Sun Jie Zhou |
![]() Pre-Trained Models: Past, Present and Future Xu Han Zhengyan Zhang Ning Ding Yuxian Gu Xiao Liu ...Jie Tang Ji-Rong Wen Jinhui Yuan Wayne Xin Zhao Jun Zhu |