Title |
---|
![]() MoEfication: Transformer Feed-forward Layers are Mixtures of Experts Zhengyan Zhang Yankai Lin Zhiyuan Liu Peng Li Maosong Sun Jie Zhou |
![]() CPM-2: Large-scale Cost-effective Pre-trained Language Models Zhengyan Zhang Yuxian Gu Xu Han Shengqi Chen Chaojun Xiao ...Minlie Huang Wentao Han Yang Liu Xiaoyan Zhu Maosong Sun |