
| Title | Venue | Year |
|---|---|---|
| Question and Answer Test-Train Overlap in Open-Domain Question Answering Datasets | Conference of the European Chapter of the Association for Computational Linguistics (EACL) | 2020 |
| ConvBERT: Improving BERT with Span-based Dynamic Convolution | Neural Information Processing Systems (NeurIPS) | 2020 |
| Language Models are Few-Shot Learners | Neural Information Processing Systems (NeurIPS) | 2020 |
| DeFormer: Decomposing Pre-trained Transformers for Faster Question Answering | Annual Meeting of the Association for Computational Linguistics (ACL) | 2020 |
| When BERT Plays the Lottery, All Tickets Are Winning | Conference on Empirical Methods in Natural Language Processing (EMNLP) | 2020 |
| HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training | Conference on Empirical Methods in Natural Language Processing (EMNLP) | 2020 |
| DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference | Annual Meeting of the Association for Computational Linguistics (ACL) | 2020 |
| ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT | Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR) | 2020 |
| The Right Tool for the Job: Matching Model and Instance Complexities | Annual Meeting of the Association for Computational Linguistics (ACL) | 2020 |
| Training with Quantization Noise for Extreme Model Compression | International Conference on Learning Representations (ICLR) | 2020 |
| XtremeDistil: Multi-stage Distillation for Massive Multilingual Models | Annual Meeting of the Association for Computational Linguistics (ACL) | 2020 |
| LadaBERT: Lightweight Adaptation of BERT through Hybrid Model Compression | International Conference on Computational Linguistics (COLING) | 2020 |
| DynaBERT: Dynamic BERT with Adaptive Width and Depth | Neural Information Processing Systems (NeurIPS) | 2020 |
| Structure-Level Knowledge Distillation For Multilingual Sequence Labeling | Annual Meeting of the Association for Computational Linguistics (ACL) | 2020 |
| On the Effect of Dropping Layers of Pre-trained Transformer Models | Computer Speech and Language (CSL) | 2020 |
| MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices | Annual Meeting of the Association for Computational Linguistics (ACL) | 2020 |
| FastBERT: a Self-distilling BERT with Adaptive Inference Time | Annual Meeting of the Association for Computational Linguistics (ACL) | 2020 |
| Pre-trained Models for Natural Language Processing: A Survey | Science China Technological Sciences (Sci China Technol Sci) | 2020 |
| TextBrewer: An Open-Source Knowledge Distillation Toolkit for Natural Language Processing | Annual Meeting of the Association for Computational Linguistics (ACL) | 2020 |
| A Primer in BERTology: What we know about how BERT works | Transactions of the Association for Computational Linguistics (TACL) | 2020 |
| Compressing Large-Scale Transformer-Based Models: A Case Study on BERT | Transactions of the Association for Computational Linguistics (TACL) | 2020 |
| MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers | Neural Information Processing Systems (NeurIPS) | 2020 |
| Improving BERT Fine-Tuning via Self-Ensemble and Self-Distillation | Journal of Computer Science and Technology (JCST) | 2020 |
| BERT-of-Theseus: Compressing BERT by Progressive Module Replacing | Conference on Empirical Methods in Natural Language Processing (EMNLP) | 2020 |
| AdaBERT: Task-Adaptive BERT Compression with Differentiable Neural Architecture Search | International Joint Conference on Artificial Intelligence (IJCAI) | 2020 |
| Blockwise Self-Attention for Long Document Understanding | Findings | 2019 |