| Title | Authors |
|---|---|
| Exploring the Benefit of Activation Sparsity in Pre-training | Zhengyan Zhang, Chaojun Xiao, Qiujieli Qin, Yankai Lin, Zhiyuan Zeng, Xu Han, Zhiyuan Liu, Ruobing Xie, Maosong Sun, Jie Zhou |
| ReLU Wins: Discovering Efficient Activation Functions for Sparse LLMs | Zhengyan Zhang, Yixin Song, Guanghui Yu, Xu Han, Yankai Lin, Chaojun Xiao, Chenyang Song, Zhiyuan Liu, Zeyu Mi, Maosong Sun |
| A Comprehensive Study of Knowledge Editing for Large Language Models | Ningyu Zhang, Yunzhi Yao, Bo Tian, Peng Wang, Shumin Deng, ..., Lei Liang, Zhiqiang Zhang, Xiao-Jun Zhu, Jun Zhou, Huajun Chen |
| Emergent Modularity in Pre-trained Transformers | Zhengyan Zhang, Zhiyuan Zeng, Yankai Lin, Chaojun Xiao, Xiaozhi Wang, Xu Han, Zhiyuan Liu, Ruobing Xie, Maosong Sun, Jie Zhou |