Skyformer: Remodel Self-Attention with Gaussian Kernel and Nyström Method | Neural Information Processing Systems (NeurIPS), 2021
Saturated Transformers are Constant-Depth Threshold Circuits | Transactions of the Association for Computational Linguistics (TACL), 2021
Modeling Hierarchical Structures with Continuous Recursive Neural Networks | International Conference on Machine Learning (ICML), 2021 | Jishnu Ray Chowdhury, Cornelia Caragea
Staircase Attention for Recurrent Processing of Sequences | Neural Information Processing Systems (NeurIPS), 2021
Consistent Accelerated Inference via Confident Adaptive Transformers | Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Transformer in Transformer | Neural Information Processing Systems (NeurIPS), 2021
Dynamic Neural Networks: A Survey | IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2021
Highway Transformer: Self-Gating Enhanced Self-Attentive Networks | Annual Meeting of the Association for Computational Linguistics (ACL), 2020
DynaBERT: Dynamic BERT with Adaptive Width and Depth | Neural Information Processing Systems (NeurIPS), 2020
Compressive Transformers for Long-Range Sequence Modelling | International Conference on Learning Representations (ICLR), 2019
Ordered Memory | Neural Information Processing Systems (NeurIPS), 2019
Depth-Adaptive Transformer | International Conference on Learning Representations (ICLR), 2019
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations | International Conference on Learning Representations (ICLR), 2019
Deep Equilibrium Models | Neural Information Processing Systems (NeurIPS), 2019
Attention Is All You Need | Neural Information Processing Systems (NeurIPS), 2017
Language Modeling with Gated Convolutional Networks | International Conference on Machine Learning (ICML), 2016