Title |
---|
ProSparse: Introducing and Enhancing Intrinsic Activation Sparsity within Large Language Models. Chenyang Song, Xu Han, Zhengyan Zhang, Shengding Hu, Xiyu Shi, ... Chen Chen, Zhiyuan Liu, Guanglin Li, Tao Yang, Maosong Sun |
ReLU² Wins: Discovering Efficient Activation Functions for Sparse LLMs. Zhengyan Zhang, Yixin Song, Guanghui Yu, Xu Han, Yankai Lin, Chaojun Xiao, Chenyang Song, Zhiyuan Liu, Zeyu Mi, Maosong Sun |
Investigating Recurrent Transformers with Dynamic Halt. Jishnu Ray Chowdhury, Cornelia Caragea |
OLMo: Accelerating the Science of Language Models. Dirk Groeneveld, Iz Beltagy, Pete Walsh, Akshita Bhagia, Rodney Michael Kinney, ... Jesse Dodge, Kyle Lo, Luca Soldaini, Noah A. Smith, Hanna Hajishirzi |
The Case for Co-Designing Model Architectures with Hardware. Quentin G. Anthony, Jacob Hatef, Deepak Narayanan, Stella Biderman, Stas Bekman, Junqi Yin, A. Shafi, Hari Subramoni, Dhabaleswar Panda |