Title |
---|
![]() Exploring the Benefit of Activation Sparsity in Pre-training Zhengyan Zhang Chaojun Xiao Qiujieli Qin Yankai Lin Zhiyuan Zeng Xu Han Zhiyuan Liu Ruobing Xie Maosong Sun Jie Zhou |
![]() ReLU Wins: Discovering Efficient Activation Functions for Sparse
LLMs Zhengyan Zhang Yixin Song Guanghui Yu Xu Han Yankai Lin Chaojun Xiao Chenyang Song Zhiyuan Liu Zeyu Mi Maosong Sun |