Title |
---|
ProSparse: Introducing and Enhancing Intrinsic Activation Sparsity within Large Language Models. Chenyang Song, Xu Han, Zhengyan Zhang, Shengding Hu, Xiyu Shi, ... Chen Chen, Zhiyuan Liu, Guanglin Li, Tao Yang, Maosong Sun |
Investigating Recurrent Transformers with Dynamic Halt. Jishnu Ray Chowdhury, Cornelia Caragea |
Recursion in Recursion: Two-Level Nested Recursion for Length Generalization with Scalability. Jishnu Ray Chowdhury, Cornelia Caragea |