Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation. International Conference on Learning Representations (ICLR), 2022.
LoRA: Low-Rank Adaptation of Large Language Models. International Conference on Learning Representations (ICLR), 2022.
Big Bird: Transformers for Longer Sequences. Neural Information Processing Systems (NeurIPS), 2020.
Compressive Transformers for Long-Range Sequence Modelling. International Conference on Learning Representations (ICLR), 2020.
Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019.
Neural Machine Translation of Rare Words with Subword Units. Annual Meeting of the Association for Computational Linguistics (ACL), 2016.
Adam: A Method for Stochastic Optimization. Diederik P. Kingma, Jimmy Ba. International Conference on Learning Representations (ICLR), 2015.