1909.11942
Cited By
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
International Conference on Learning Representations (ICLR), 2019
26 September 2019
Zhenzhong Lan
Mingda Chen
Sebastian Goodman
Kevin Gimpel
Piyush Sharma
Radu Soricut
SSL
AIMat
Github (3271★)
Papers citing
"ALBERT: A Lite BERT for Self-supervised Learning of Language Representations"
50 / 3,050 papers shown
A Framework for Evaluation of Machine Reading Comprehension Gold Standards
International Conference on Language Resources and Evaluation (LREC), 2020
Viktor Schlegel
Marco Valentino
André Freitas
Goran Nenadic
Riza Batista-Navarro
155
33
0
10 Mar 2020
What the [MASK]? Making Sense of Language-Specific BERT Models
Debora Nozza
Federico Bianchi
Dirk Hovy
317
121
0
05 Mar 2020
Talking-Heads Attention
Noam M. Shazeer
Zhenzhong Lan
Youlong Cheng
Nan Ding
L. Hou
271
92
0
05 Mar 2020
jiant: A Software Toolkit for Research on General-Purpose Text Understanding Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2020
Yada Pruksachatkun
Philip Yeres
Haokun Liu
Jason Phang
Phu Mon Htut
Alex Jinpeng Wang
Ian Tenney
Samuel R. Bowman
SSeg
246
96
0
04 Mar 2020
AraBERT: Transformer-based Model for Arabic Language Understanding
Wissam Antoun
Fady Baly
Hazem M. Hajj
660
1,200
0
28 Feb 2020
UniLMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training
International Conference on Machine Learning (ICML), 2020
Hangbo Bao
Li Dong
Furu Wei
Wenhui Wang
Nan Yang
...
Yu Wang
Songhao Piao
Jianfeng Gao
Ming Zhou
H. Hon
AI4CE
225
419
0
28 Feb 2020
TextBrewer: An Open-Source Knowledge Distillation Toolkit for Natural Language Processing
Annual Meeting of the Association for Computational Linguistics (ACL), 2020
Ziqing Yang
Yiming Cui
Zhipeng Chen
Wanxiang Che
Ting Liu
Shijin Wang
Guoping Hu
VLM
186
50
0
28 Feb 2020
On Biased Compression for Distributed Learning
Journal of machine learning research (JMLR), 2020
Aleksandr Beznosikov
Samuel Horváth
Peter Richtárik
M. Safaryan
326
221
0
27 Feb 2020
A Primer in BERTology: What we know about how BERT works
Transactions of the Association for Computational Linguistics (TACL), 2020
Anna Rogers
Olga Kovaleva
Anna Rumshisky
OffRL
483
1,744
0
27 Feb 2020
Compressing Large-Scale Transformer-Based Models: A Case Study on BERT
Transactions of the Association for Computational Linguistics (TACL), 2020
Prakhar Ganesh
Yao Chen
Xin Lou
Mohammad Ali Khan
Yifan Yang
Hassan Sajjad
Preslav Nakov
Deming Chen
Marianne Winslett
AI4CE
480
213
0
27 Feb 2020
Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers
Zhuohan Li
Eric Wallace
Sheng Shen
Kevin Lin
Kurt Keutzer
Dan Klein
Joseph E. Gonzalez
310
153
0
26 Feb 2020
Multi-task Learning with Multi-head Attention for Multi-choice Reading Comprehension
H. Wan
193
13
0
26 Feb 2020
KEML: A Knowledge-Enriched Meta-Learning Framework for Lexical Relation Classification
AAAI Conference on Artificial Intelligence (AAAI), 2020
Chengyu Wang
Minghui Qiu
Yanjie Liang
Xiaofeng He
VLM
KELM
240
16
0
25 Feb 2020
Exploring BERT Parameter Efficiency on the Stanford Question Answering Dataset v2.0
Eric Hulburd
129
6
0
25 Feb 2020
Do Multi-Hop Question Answering Systems Know How to Answer the Single-Hop Sub-Questions?
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2020
Yixuan Tang
Hwee Tou Ng
A. Tung
121
38
0
23 Feb 2020
Investigating Typed Syntactic Dependencies for Targeted Sentiment Classification Using Graph Attention Neural Network
Xuefeng Bai
Pengbo Liu
Yue Zhang
GNN
176
6
0
22 Feb 2020
Training Question Answering Models From Synthetic Data
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
Raul Puri
Ryan Spring
M. Patwary
Mohammad Shoeybi
Bryan Catanzaro
ELM
200
170
0
22 Feb 2020
CoLES: Contrastive Learning for Event Sequences with Self-Supervision
Dmitrii Babaev
Ivan Kireev
Nikita Ovsov
Maria Ivanova
Gleb Gusev
Ivan Nazarov
Alexander Tuzhilin
SSL
AI4TS
203
39
0
19 Feb 2020
Convergence of End-to-End Training in Deep Unsupervised Contrastive Learning
Zixin Wen
SSL
185
3
0
17 Feb 2020
SBERT-WK: A Sentence Embedding Method by Dissecting BERT-based Word Models
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2020
Sijin Yu
C.-C. Jay Kuo
219
184
0
16 Feb 2020
Towards Detection of Subjective Bias using Contextualized Word Embeddings
The Web Conference (WWW), 2020
Tanvi Dadu
Kartikey Pant
R. Mamidi
76
25
0
16 Feb 2020
Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping
Jesse Dodge
Gabriel Ilharco
Roy Schwartz
Ali Farhadi
Hannaneh Hajishirzi
Noah A. Smith
294
678
0
15 Feb 2020
TwinBERT: Distilling Knowledge to Twin-Structured BERT Models for Efficient Retrieval
Wenhao Lu
Jian Jiao
Ruofei Zhang
191
53
0
14 Feb 2020
Transformer on a Diet
Chenguang Wang
Zihao Ye
Aston Zhang
Zheng Zhang
Alex Smola
223
9
0
14 Feb 2020
HULK: An Energy Efficiency Benchmark Platform for Responsible Natural Language Processing
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2020
Xiyou Zhou
Zhiyu Zoey Chen
Xiaoyong Jin
Wenjie Wang
199
37
0
14 Feb 2020
How Much Knowledge Can You Pack Into the Parameters of a Language Model?
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
Adam Roberts
Colin Raffel
Noam M. Shazeer
KELM
576
995
0
10 Feb 2020
BERT-of-Theseus: Compressing BERT by Progressive Module Replacing
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
Canwen Xu
Wangchunshu Zhou
Tao Ge
Furu Wei
Ming Zhou
700
219
0
07 Feb 2020
Aligning the Pretraining and Finetuning Objectives of Language Models
Nuo Wang Pierse
Jing Lu
AI4CE
109
2
0
05 Feb 2020
Pseudo-Bidirectional Decoding for Local Sequence Transduction
Findings (Findings), 2020
Wangchunshu Zhou
Tao Ge
Ke Xu
213
3
0
31 Jan 2020
Bringing Stories Alive: Generating Interactive Fiction Worlds
Artificial Intelligence and Interactive Digital Entertainment Conference (AIIDE), 2020
Prithviraj Ammanabrolu
W. Cheung
Dan Tu
William Broniec
Mark O. Riedl
227
55
0
28 Jan 2020
Retrospective Reader for Machine Reading Comprehension
AAAI Conference on Artificial Intelligence (AAAI), 2020
Zhuosheng Zhang
Junjie Yang
Hai Zhao
RALM
373
237
0
27 Jan 2020
DUMA: Reading Comprehension with Transposition Thinking
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2020
Q. Hu
Hai Zhao
Xiaoguang Li
AI4CE
411
37
0
26 Jan 2020
ERNIE-GEN: An Enhanced Multi-Flow Pre-training and Fine-tuning Framework for Natural Language Generation
International Joint Conference on Artificial Intelligence (IJCAI), 2020
Dongling Xiao
Han Zhang
Yukun Li
Yu Sun
Hao Tian
Hua Wu
Haifeng Wang
217
133
0
26 Jan 2020
BERT's output layer recognizes all hidden layers? Some Intriguing Phenomena and a simple way to boost BERT
Wei-Tsung Kao
Tsung-Han Wu
Po-Han Chi
Chun-Cheng Hsieh
Hung-yi Lee
SSL
146
5
0
25 Jan 2020
Multi-task self-supervised learning for Robust Speech Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020
Mirco Ravanelli
Jianyuan Zhong
Santiago Pascual
P. Swietojanski
João Monteiro
J. Trmal
Yoshua Bengio
SSL
477
303
0
25 Jan 2020
PoWER-BERT: Accelerating BERT Inference via Progressive Word-vector Elimination
Saurabh Goyal
Anamitra R. Choudhury
Saurabh Manish Raje
Venkatesan T. Chakaravarthy
Yogish Sabharwal
Ashish Verma
349
19
0
24 Jan 2020
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
1.8K
6,759
0
23 Jan 2020
Normalization of Input-output Shared Embeddings in Text Generation Models
Jinyang Liu
Yujia Zhai
Zizhong Chen
133
0
0
22 Jan 2020
A multimodal deep learning approach for named entity recognition from social media
M. Asgari-Chenaghlu
M. Feizi-Derakhshi
Leili Farzinvash
M. Balafar
C. Motamed
282
36
0
19 Jan 2020
RobBERT: a Dutch RoBERTa-based Language Model
Findings (Findings), 2020
Pieter Delobelle
Thomas Winters
Bettina Berendt
199
263
0
17 Jan 2020
Graph-Bert: Only Attention is Needed for Learning Graph Representations
Jiawei Zhang
Haopeng Zhang
Congying Xia
Li Sun
344
360
0
15 Jan 2020
A BERT based Sentiment Analysis and Key Entity Detection Approach for Online Financial Texts
International Conference on Computer Supported Cooperative Work in Design (CSCWD), 2020
Lin Zhao
Lin Li
Xinhao Zheng
197
76
0
14 Jan 2020
CLUENER2020: Fine-grained Named Entity Recognition Dataset and Benchmark for Chinese
Liang Xu
Yu Tong
Qianqian Dong
Yixuan Liao
Cong Yu
Yin Tian
Weitang Liu
Lu Li
Caiquan Liu
Xuanwei Zhang
308
68
0
13 Jan 2020
AdaBERT: Task-Adaptive BERT Compression with Differentiable Neural Architecture Search
International Joint Conference on Artificial Intelligence (IJCAI), 2020
Daoyuan Chen
Yaliang Li
Minghui Qiu
Zhen Wang
Bofang Li
Bolin Ding
Hongbo Deng
Yanjie Liang
Jialin Li
Jingren Zhou
MQ
219
106
0
13 Jan 2020
ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training
Findings (Findings), 2020
Weizhen Qi
Yu Yan
Yeyun Gong
Dayiheng Liu
Nan Duan
Jiusheng Chen
Ruofei Zhang
Ming Zhou
AI4TS
367
473
0
13 Jan 2020
Assessment Modeling: Fundamental Pre-training Tasks for Interactive Educational Systems
Youngduck Choi
Youngnam Lee
Junghyun Cho
Jineon Baek
Dongmin Shin
...
Seewoo Lee
Youngmin Cha
Chan Bae
Byungsoo Kim
Jaewe Heo
AI4Ed
279
14
0
01 Jan 2020
Clinical XLNet: Modeling Sequential Clinical Notes and Predicting Prolonged Mechanical Ventilation
Clinical Natural Language Processing Workshop (ClinicalNLP), 2019
Kexin Huang
Abhishek Singh
Sitong Chen
E. Moseley
Chih-ying Deng
Naomi George
C. Lindvall
198
64
0
27 Dec 2019
Is Attention All What You Need? -- An Empirical Investigation on Convolution-Based Active Memory and Self-Attention
Thomas D. Dowdell
Hongyu Zhang
154
4
0
27 Dec 2019
BERTje: A Dutch BERT Model
Wietse de Vries
Andreas van Cranenburgh
Arianna Bisazza
Tommaso Caselli
Gertjan van Noord
Malvina Nissim
VLM
SSeg
226
316
0
19 Dec 2019
WaLDORf: Wasteless Language-model Distillation On Reading-comprehension
J. Tian
A. Kreuzer
Pai-Hung Chen
Hans-Martin Will
VLM
169
3
0
13 Dec 2019