Attention-Guided Answer Distillation for Machine Reading Comprehension
arXiv:1808.07644, 23 August 2018
Minghao Hu, Yuxing Peng, Furu Wei, Zhen Huang, Dongsheng Li, Nan Yang, Ming Zhou

Papers citing "Attention-Guided Answer Distillation for Machine Reading Comprehension" (10 papers)

Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Models
  Aviv Bick, Kevin Y. Li, Eric P. Xing, J. Zico Kolter, Albert Gu (19 Aug 2024)

SMaLL-100: Introducing Shallow Multilingual Machine Translation Model for Low-Resource Languages
  Alireza Mohammadshahi, Vassilina Nikoulina, Alexandre Berard, Caroline Brun, James Henderson, Laurent Besacier (20 Oct 2022)

Revisiting Label Smoothing and Knowledge Distillation Compatibility: What was Missing?
  Keshigeyan Chandrasegaran, Ngoc-Trung Tran, Yunqing Zhao, Ngai-Man Cheung (29 Jun 2022)

Knowledge Distillation as Semiparametric Inference
  Tri Dao, G. Kamath, Vasilis Syrgkanis, Lester W. Mackey (20 Apr 2021)

MiniLMv2: Multi-Head Self-Attention Relation Distillation for Compressing Pretrained Transformers
  Wenhui Wang, Hangbo Bao, Shaohan Huang, Li Dong, Furu Wei (31 Dec 2020)

Knowledge Distillation: A Survey
  Jianping Gou, B. Yu, Stephen J. Maybank, Dacheng Tao (09 Jun 2020)

MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers
  Wenhui Wang, Furu Wei, Li Dong, Hangbo Bao, Nan Yang, Ming Zhou (25 Feb 2020)

A Survey on Machine Reading Comprehension Systems
  Razieh Baradaran, Razieh Ghiasi, Hossein Amirkhani (06 Jan 2020)

Hint-Based Training for Non-Autoregressive Machine Translation
  Zhuohan Li, Zi Lin, Di He, Fei Tian, Tao Qin, Liwei Wang, Tie-Yan Liu (15 Sep 2019)

Efficient Video Classification Using Fewer Frames
  S. Bhardwaj, Mukundhan Srinivasan, Mitesh M. Khapra (27 Feb 2019)