Erasure Coded Neural Network Inference via Fisher Averaging. International Symposium on Information Theory (ISIT), 2024.
CUED at ProbSum 2023: Hierarchical Ensemble of Summarization Models. Workshop on Biomedical Natural Language Processing (BioNLP), 2023.
Accurate Knowledge Distillation with n-best Reranking. North American Chapter of the Association for Computational Linguistics (NAACL), 2023.
Pseudo-Label Training and Model Inertia in Neural Machine Translation. International Conference on Learning Representations (ICLR), 2023.
Leveraging Synthetic Targets for Machine Translation. Annual Meeting of the Association for Computational Linguistics (ACL), 2023.
Heterogeneous-Branch Collaborative Learning for Dialogue Generation. AAAI Conference on Artificial Intelligence (AAAI), 2023.
Continual Knowledge Distillation for Neural Machine Translation. Annual Meeting of the Association for Computational Linguistics (ACL), 2022.
One Reference Is Not Enough: Diverse Distillation with Reference Selection for Non-Autoregressive Translation. North American Chapter of the Association for Computational Linguistics (NAACL), 2022.
Twist Decoding: Diverse Generators Guide Each Other. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022.
GigaST: A 10,000-hour Pseudo Speech Translation Corpus. Interspeech, 2022.
Selective Knowledge Distillation for Neural Machine Translation. Annual Meeting of the Association for Computational Linguistics (ACL), 2021.
The Volctrans Neural Speech Translation System for IWSLT 2021. International Workshop on Spoken Language Translation (IWSLT), 2021.
Knowledge Distillation as Semiparametric Inference. International Conference on Learning Representations (ICLR), 2021.
Domain Adaptation and Multi-Domain Adaptation for Neural Machine Translation: A Survey. Journal of Artificial Intelligence Research (JAIR), 2021.
Sampling and Filtering of Neural Machine Translation Distillation Data. North American Chapter of the Association for Computational Linguistics (NAACL), 2021.
Text Simplification by Tagging. Workshop on Innovative Use of NLP for Building Educational Applications (BEA), 2021.
ALP-KD: Attention-Based Layer Projection for Knowledge Distillation. AAAI Conference on Artificial Intelligence (AAAI), 2020.
Towards Understanding Ensemble, Knowledge Distillation and Self-Distillation in Deep Learning. International Conference on Learning Representations (ICLR), 2020.
DiDi's Machine Translation System for WMT2020. Conference on Machine Translation (WMT), 2020.
Weight Distillation: Transferring the Knowledge in Neural Network Parameters. Annual Meeting of the Association for Computational Linguistics (ACL), 2020.
Compression of Deep Learning Models for Text: A Survey. ACM Transactions on Knowledge Discovery from Data (TKDD), 2020.
Cross-model Back-translated Distillation for Unsupervised Machine Translation. International Conference on Machine Learning (ICML), 2020.
Building a Multi-domain Neural Machine Translation Model using Knowledge Distillation. European Conference on Artificial Intelligence (ECAI), 2020.
Balancing Cost and Benefit with Tied-Multi Transformers. Workshop on Neural Generation and Translation (WNGT), 2020.
Neural Machine Translation: A Review and Survey. Journal of Artificial Intelligence Research (JAIR), 2019.
Multi-agent Learning for Neural Machine Translation. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019.
Multilingual Neural Machine Translation with Knowledge Distillation. International Conference on Learning Representations (ICLR), 2019.
A Stable and Effective Learning Strategy for Trainable Greedy Decoding. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2018.