ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
International Conference on Learning Representations (ICLR), 2020
26 September 2019
Zhenzhong Lan
Mingda Chen
Sebastian Goodman
Kevin Gimpel
Piyush Sharma
Radu Soricut
SSL, AIMat

Papers citing "ALBERT: A Lite BERT for Self-supervised Learning of Language Representations"

50 / 3,050 papers shown
A Framework for Evaluation of Machine Reading Comprehension Gold Standards
International Conference on Language Resources and Evaluation (LREC), 2020
Viktor Schlegel
Marco Valentino
André Freitas
Goran Nenadic
Riza Batista-Navarro
10 Mar 2020
What the [MASK]? Making Sense of Language-Specific BERT Models
Debora Nozza
Federico Bianchi
Dirk Hovy
05 Mar 2020
Talking-Heads Attention
Noam M. Shazeer
Zhenzhong Lan
Youlong Cheng
Nan Ding
L. Hou
05 Mar 2020
jiant: A Software Toolkit for Research on General-Purpose Text Understanding Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2020
Yada Pruksachatkun
Philip Yeres
Haokun Liu
Jason Phang
Phu Mon Htut
Alex Jinpeng Wang
Ian Tenney
Samuel R. Bowman
SSeg
04 Mar 2020
AraBERT: Transformer-based Model for Arabic Language Understanding
Wissam Antoun
Fady Baly
Hazem M. Hajj
28 Feb 2020
UniLMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training
International Conference on Machine Learning (ICML), 2020
Hangbo Bao
Li Dong
Furu Wei
Wenhui Wang
Nan Yang
...
Yu Wang
Songhao Piao
Jianfeng Gao
Ming Zhou
H. Hon
AI4CE
28 Feb 2020
TextBrewer: An Open-Source Knowledge Distillation Toolkit for Natural Language Processing
Annual Meeting of the Association for Computational Linguistics (ACL), 2020
Ziqing Yang
Yiming Cui
Zhipeng Chen
Wanxiang Che
Ting Liu
Shijin Wang
Guoping Hu
VLM
28 Feb 2020
On Biased Compression for Distributed Learning
Journal of Machine Learning Research (JMLR), 2020
Aleksandr Beznosikov
Samuel Horváth
Peter Richtárik
M. Safaryan
27 Feb 2020
A Primer in BERTology: What we know about how BERT works
Transactions of the Association for Computational Linguistics (TACL), 2020
Anna Rogers
Olga Kovaleva
Anna Rumshisky
OffRL
27 Feb 2020
Compressing Large-Scale Transformer-Based Models: A Case Study on BERT
Transactions of the Association for Computational Linguistics (TACL), 2020
Prakhar Ganesh
Yao Chen
Xin Lou
Mohammad Ali Khan
Yifan Yang
Hassan Sajjad
Preslav Nakov
Deming Chen
Marianne Winslett
AI4CE
27 Feb 2020
Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers
Zhuohan Li
Eric Wallace
Sheng Shen
Kevin Lin
Kurt Keutzer
Dan Klein
Joseph E. Gonzalez
26 Feb 2020
Multi-task Learning with Multi-head Attention for Multi-choice Reading Comprehension
H. Wan
26 Feb 2020
KEML: A Knowledge-Enriched Meta-Learning Framework for Lexical Relation Classification
AAAI Conference on Artificial Intelligence (AAAI), 2020
Chengyu Wang
Minghui Qiu
Yanjie Liang
Xiaofeng He
VLM, KELM
25 Feb 2020
Exploring BERT Parameter Efficiency on the Stanford Question Answering Dataset v2.0
Eric Hulburd
25 Feb 2020
Do Multi-Hop Question Answering Systems Know How to Answer the Single-Hop Sub-Questions?
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2020
Yixuan Tang
Hwee Tou Ng
A. Tung
23 Feb 2020
Investigating Typed Syntactic Dependencies for Targeted Sentiment Classification Using Graph Attention Neural Network
Xuefeng Bai
Pengbo Liu
Yue Zhang
GNN
22 Feb 2020
Training Question Answering Models From Synthetic Data
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
Raul Puri
Ryan Spring
M. Patwary
Mohammad Shoeybi
Bryan Catanzaro
ELM
22 Feb 2020
CoLES: Contrastive Learning for Event Sequences with Self-Supervision
Dmitrii Babaev
Ivan Kireev
Nikita Ovsov
Maria Ivanova
Gleb Gusev
Ivan Nazarov
Alexander Tuzhilin
SSL, AI4TS
19 Feb 2020
Convergence of End-to-End Training in Deep Unsupervised Contrastive Learning
Zixin Wen
SSL
17 Feb 2020
SBERT-WK: A Sentence Embedding Method by Dissecting BERT-based Word Models
IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), 2020
Sijin Yu
C.-C. Jay Kuo
16 Feb 2020
Towards Detection of Subjective Bias using Contextualized Word Embeddings
The Web Conference (WWW), 2020
Tanvi Dadu
Kartikey Pant
R. Mamidi
16 Feb 2020
Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping
Jesse Dodge
Gabriel Ilharco
Roy Schwartz
Ali Farhadi
Hannaneh Hajishirzi
Noah A. Smith
15 Feb 2020
TwinBERT: Distilling Knowledge to Twin-Structured BERT Models for Efficient Retrieval
Wenhao Lu
Jian Jiao
Ruofei Zhang
14 Feb 2020
Transformer on a Diet
Chenguang Wang
Zihao Ye
Aston Zhang
Zheng Zhang
Alex Smola
14 Feb 2020
HULK: An Energy Efficiency Benchmark Platform for Responsible Natural Language Processing
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2020
Xiyou Zhou
Zhiyu Zoey Chen
Xiaoyong Jin
Wenjie Wang
14 Feb 2020
How Much Knowledge Can You Pack Into the Parameters of a Language Model?
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
Adam Roberts
Colin Raffel
Noam M. Shazeer
KELM
10 Feb 2020
BERT-of-Theseus: Compressing BERT by Progressive Module Replacing
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
Canwen Xu
Wangchunshu Zhou
Tao Ge
Furu Wei
Ming Zhou
07 Feb 2020
Aligning the Pretraining and Finetuning Objectives of Language Models
Nuo Wang Pierse
Jing Lu
AI4CE
05 Feb 2020
Pseudo-Bidirectional Decoding for Local Sequence Transduction
Findings, 2020
Wangchunshu Zhou
Tao Ge
Ke Xu
31 Jan 2020
Bringing Stories Alive: Generating Interactive Fiction Worlds
Artificial Intelligence and Interactive Digital Entertainment Conference (AIIDE), 2020
Prithviraj Ammanabrolu
W. Cheung
Dan Tu
William Broniec
Mark O. Riedl
28 Jan 2020
Retrospective Reader for Machine Reading Comprehension
AAAI Conference on Artificial Intelligence (AAAI), 2020
Zhuosheng Zhang
Junjie Yang
Hai Zhao
RALM
27 Jan 2020
DUMA: Reading Comprehension with Transposition Thinking
IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), 2020
Q. Hu
Hai Zhao
Xiaoguang Li
AI4CE
26 Jan 2020
ERNIE-GEN: An Enhanced Multi-Flow Pre-training and Fine-tuning Framework for Natural Language Generation
International Joint Conference on Artificial Intelligence (IJCAI), 2020
Dongling Xiao
Han Zhang
Yukun Li
Yu Sun
Hao Tian
Hua Wu
Haifeng Wang
26 Jan 2020
BERT's output layer recognizes all hidden layers? Some Intriguing Phenomena and a simple way to boost BERT
Wei-Tsung Kao
Tsung-Han Wu
Po-Han Chi
Chun-Cheng Hsieh
Hung-yi Lee
SSL
25 Jan 2020
Multi-task self-supervised learning for Robust Speech Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020
Mirco Ravanelli
Jianyuan Zhong
Santiago Pascual
P. Swietojanski
João Monteiro
J. Trmal
Yoshua Bengio
SSL
25 Jan 2020
PoWER-BERT: Accelerating BERT Inference via Progressive Word-vector Elimination
Saurabh Goyal
Anamitra R. Choudhury
Saurabh ManishRaje
Venkatesan T. Chakaravarthy
Yogish Sabharwal
Ashish Verma
24 Jan 2020
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
23 Jan 2020
Normalization of Input-output Shared Embeddings in Text Generation Models
Jinyang Liu
Yujia Zhai
Zizhong Chen
22 Jan 2020
A multimodal deep learning approach for named entity recognition from social media
M. Asgari-Chenaghlu
M. Feizi-Derakhshi
Leili Farzinvash
M. Balafar
C. Motamed
19 Jan 2020
RobBERT: a Dutch RoBERTa-based Language Model
Findings, 2020
Pieter Delobelle
Thomas Winters
Bettina Berendt
17 Jan 2020
Graph-Bert: Only Attention is Needed for Learning Graph Representations
Jiawei Zhang
Haopeng Zhang
Congying Xia
Li Sun
15 Jan 2020
A BERT based Sentiment Analysis and Key Entity Detection Approach for Online Financial Texts
International Conference on Computer Supported Cooperative Work in Design (CSCWD), 2020
Lin Zhao
Lin Li
Xinhao Zheng
14 Jan 2020
CLUENER2020: Fine-grained Named Entity Recognition Dataset and Benchmark for Chinese
Liang Xu
Yu Tong
Qianqian Dong
Yixuan Liao
Cong Yu
Yin Tian
Weitang Liu
Lu Li
Caiquan Liu
Xuanwei Zhang
13 Jan 2020
AdaBERT: Task-Adaptive BERT Compression with Differentiable Neural Architecture Search
International Joint Conference on Artificial Intelligence (IJCAI), 2020
Daoyuan Chen
Yaliang Li
Minghui Qiu
Zhen Wang
Bofang Li
Bolin Ding
Hongbo Deng
Yanjie Liang
Jialin Li
Jingren Zhou
MQ
13 Jan 2020
ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training
Findings, 2020
Weizhen Qi
Yu Yan
Yeyun Gong
Dayiheng Liu
Nan Duan
Jiusheng Chen
Ruofei Zhang
Ming Zhou
AI4TS
13 Jan 2020
Assessment Modeling: Fundamental Pre-training Tasks for Interactive Educational Systems
Youngduck Choi
Youngnam Lee
Junghyun Cho
Jineon Baek
Dongmin Shin
...
Seewoo Lee
Youngmin Cha
Chan Bae
Byungsoo Kim
Jaewe Heo
AI4Ed
01 Jan 2020
Clinical XLNet: Modeling Sequential Clinical Notes and Predicting Prolonged Mechanical Ventilation
Clinical Natural Language Processing Workshop (ClinicalNLP), 2019
Kexin Huang
Abhishek Singh
Sitong Chen
E. Moseley
Chih-ying Deng
Naomi George
C. Lindvall
27 Dec 2019
Is Attention All What You Need? -- An Empirical Investigation on Convolution-Based Active Memory and Self-Attention
Thomas D. Dowdell
Hongyu Zhang
27 Dec 2019
BERTje: A Dutch BERT Model
Wietse de Vries
Andreas van Cranenburgh
Arianna Bisazza
Tommaso Caselli
Gertjan van Noord
Malvina Nissim
VLM, SSeg
19 Dec 2019
WaLDORf: Wasteless Language-model Distillation On Reading-comprehension
J. Tian
A. Kreuzer
Pai-Hung Chen
Hans-Martin Will
VLM
13 Dec 2019
Page 60 of 61