On the Representation Collapse of Sparse Mixture of Experts. Neural Information Processing Systems (NeurIPS), 2022.
A Fast Post-Training Pruning Framework for Transformers. Neural Information Processing Systems (NeurIPS), 2022.
Accelerated Sparse Neural Training: A Provable and Efficient Method to Find N:M Transposable Masks. Neural Information Processing Systems (NeurIPS), 2021.
Language Models are Few-Shot Learners. Neural Information Processing Systems (NeurIPS), 2020.
HAWQ-V2: Hessian Aware trace-Weighted Quantization of Neural Networks. Neural Information Processing Systems (NeurIPS), 2020.
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. Journal of Machine Learning Research (JMLR), 2020.
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer. International Conference on Learning Representations (ICLR), 2017.