RETVec: Resilient and Efficient Text Vectorizer. Neural Information Processing Systems (NeurIPS), 2023.
Symbolic Discovery of Optimization Algorithms. Neural Information Processing Systems (NeurIPS), 2023.
Efficient Attention via Control Variates. International Conference on Learning Representations (ICLR), 2023.
Efficient Movie Scene Detection using State-Space Transformers. Computer Vision and Pattern Recognition (CVPR), 2023.
Cramming: Training a Language Model on a Single GPU in One Day. International Conference on Machine Learning (ICML), 2023.
Pretraining Without Attention. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023.
Meta-Learning Fast Weight Language Models. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022.
Deciphering RNA Secondary Structure Prediction: A Probabilistic K-Rook Matching Perspective. International Conference on Machine Learning (ICML), 2024.
DBA: Efficient Transformer with Dynamic Bilinear Low-Rank Attention. IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2022.
How Much Does Attention Actually Attend? Questioning the Importance of Attention in Pretrained Transformers. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022.
MogaNet: Multi-order Gated Aggregation Network. International Conference on Learning Representations (ICLR), 2024.
The Devil in Linear Transformer. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022.
Decoupling Features in Hierarchical Propagation for Video Object Segmentation. Zongxin Yang, Yi Yang. Neural Information Processing Systems (NeurIPS), 2022.
CAB: Comprehensive Attention Benchmarking on Long Sequence Modeling. International Conference on Machine Learning (ICML), 2023.
AutoMoE: Heterogeneous Mixture-of-Experts with Adaptive Computation for Efficient Neural Machine Translation. Annual Meeting of the Association for Computational Linguistics (ACL), 2023.
Mega: Moving Average Equipped Gated Attention. International Conference on Learning Representations (ICLR), 2023.
Stateful Memory-Augmented Transformers for Efficient Dialogue Modeling. Findings, 2022.
QSAN: A Near-term Achievable Quantum Self-Attention Network. IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2022.
Long Range Language Modeling via Gated State Spaces. International Conference on Learning Representations (ICLR), 2023.
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness. Neural Information Processing Systems (NeurIPS), 2022.
Simple Baselines for Image Restoration. European Conference on Computer Vision (ECCV), 2022.
Block-Recurrent Transformers. Neural Information Processing Systems (NeurIPS), 2022.