ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
arXiv:1909.11942 · 26 September 2019
Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut
Tags: SSL, AIMat

Papers citing "ALBERT: A Lite BERT for Self-supervised Learning of Language Representations"

Showing 50 of 2,911 citing papers (title, authors, date, topic tags):
RobBERT: a Dutch RoBERTa-based Language Model
Pieter Delobelle, Thomas Winters, Bettina Berendt · 17 Jan 2020

Graph-Bert: Only Attention is Needed for Learning Graph Representations
Jiawei Zhang, Haopeng Zhang, Congying Xia, Li Sun · 15 Jan 2020

A BERT based Sentiment Analysis and Key Entity Detection Approach for Online Financial Texts
Lin Zhao, Lin Li, Xinhao Zheng · 14 Jan 2020

CLUENER2020: Fine-grained Named Entity Recognition Dataset and Benchmark for Chinese
Liang Xu, Yu Tong, Qianqian Dong, Yixuan Liao, Cong Yu, Yin Tian, Weitang Liu, Lu Li, Caiquan Liu, Xuanwei Zhang · 13 Jan 2020

AdaBERT: Task-Adaptive BERT Compression with Differentiable Neural Architecture Search
Daoyuan Chen, Yaliang Li, Minghui Qiu, Zhen Wang, Bofang Li, Bolin Ding, Hongbo Deng, Jun Huang, Wei Lin, Jingren Zhou · 13 Jan 2020 · tags: MQ
ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training
Weizhen Qi, Yu Yan, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang, Ming Zhou · 13 Jan 2020 · tags: AI4TS

Assessment Modeling: Fundamental Pre-training Tasks for Interactive Educational Systems
Youngduck Choi, Youngnam Lee, Junghyun Cho, Jineon Baek, Dongmin Shin, ..., Seewoo Lee, Youngmin Cha, Chan Bae, Byungsoo Kim, Jaewe Heo · 01 Jan 2020 · tags: AI4Ed

Clinical XLNet: Modeling Sequential Clinical Notes and Predicting Prolonged Mechanical Ventilation
Kexin Huang, Abhishek Singh, Sitong Chen, E. Moseley, Chih-ying Deng, Naomi George, C. Lindvall · 27 Dec 2019

Is Attention All What You Need? -- An Empirical Investigation on Convolution-Based Active Memory and Self-Attention
Thomas D. Dowdell, Hongyu Zhang · 27 Dec 2019
BERTje: A Dutch BERT Model
Wietse de Vries, Andreas van Cranenburgh, Arianna Bisazza, Tommaso Caselli, Gertjan van Noord, Malvina Nissim · 19 Dec 2019 · tags: VLM, SSeg

WaLDORf: Wasteless Language-model Distillation On Reading-comprehension
J. Tian, A. Kreuzer, Pai-Hung Chen, Hans-Martin Will · 13 Dec 2019 · tags: VLM

FlauBERT: Unsupervised Language Model Pre-training for French
Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, A. Allauzen, Benoît Crabbé, Laurent Besacier, D. Schwab · 11 Dec 2019 · tags: AI4CE

MITAS: A Compressed Time-Domain Audio Separation Network with Parameter Sharing
Chao-I Tuan, Yuan-Kuei Wu, Hung-yi Lee, Yu Tsao · 09 Dec 2019

Large-scale Pretraining for Visual Dialog: A Simple State-of-the-Art Baseline
Vishvak Murahari, Dhruv Batra, Devi Parikh, Abhishek Das · 05 Dec 2019 · tags: VLM

Bimodal Speech Emotion Recognition Using Pre-Trained Language Models
Verena Heusser, Niklas Freymuth, Stefan Constantin, A. Waibel · 29 Nov 2019
Low Rank Factorization for Compact Multi-Head Self-Attention
Sneha Mehta, Huzefa Rangwala, Naren Ramakrishnan · 26 Nov 2019

Efficient Attention Mechanism for Visual Dialog that can Handle All the Interactions between Multiple Inputs
Van-Quang Nguyen, Masanori Suganuma, Takayuki Okatani · 26 Nov 2019

Pre-Training of Deep Bidirectional Protein Sequence Representations with Structural Information
Seonwoo Min, Seunghyun Park, Siwon Kim, Hyun-Soo Choi, Byunghan Lee, Sungroh Yoon · 25 Nov 2019 · tags: SSL

Global Greedy Dependency Parsing
Z. Li, Zhao Hai, Kevin Parnow · 20 Nov 2019

Vision-Language Navigation with Self-Supervised Auxiliary Reasoning Tasks
Fengda Zhu, Yi Zhu, Xiaojun Chang, Xiaodan Liang · 18 Nov 2019 · tags: LRM

Unsupervised Pre-training for Natural Language Generation: A Literature Review
Yuanxin Liu, Zheng Lin · 13 Nov 2019 · tags: SSL, AI4CE
ZiMM: a deep learning model for long term and blurry relapses with non-clinical claims data
A. Kabeshova, Yiyang Yu, Bertrand Lukacs, Emmanuel Bacry, Stéphane Gaïffas · 13 Nov 2019 · tags: VLM, MedIm

KEPLER: A Unified Model for Knowledge Embedding and Pre-trained Language Representation
Xiaozhi Wang, Tianyu Gao, Zhaocheng Zhu, Zhengyan Zhang, Zhiyuan Liu, Juan-Zi Li, Jian Tang · 13 Nov 2019

CamemBERT: a Tasty French Language Model
Louis Martin, Benjamin Muller, Pedro Ortiz Suarez, Yoann Dupont, Laurent Romary, Eric Villemonte de la Clergerie, Djamé Seddah, Benoît Sagot · 10 Nov 2019

ConveRT: Efficient and Accurate Conversational Representations from Transformers
Matthew Henderson, I. Casanueva, Nikola Mrkšić, Pei-hao Su, Tsung-Hsien, Ivan Vulić · 09 Nov 2019

Hierarchical Graph Network for Multi-hop Question Answering
Yuwei Fang, S. Sun, Zhe Gan, R. Pillai, Shuohang Wang, Jingjing Liu · 09 Nov 2019
MKD: a Multi-Task Knowledge Distillation Approach for Pretrained Language Models
Linqing Liu, Haiquan Wang, Jimmy J. Lin, R. Socher, Caiming Xiong · 09 Nov 2019

SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization
Haoming Jiang, Pengcheng He, Weizhu Chen, Xiaodong Liu, Jianfeng Gao, T. Zhao · 08 Nov 2019

Transforming Wikipedia into Augmented Data for Query-Focused Summarization
Haichao Zhu, Li Dong, Furu Wei, Bing Qin, Ting Liu · 08 Nov 2019 · tags: RALM

Blockwise Self-Attention for Long Document Understanding
J. Qiu, Hao Ma, Omer Levy, Scott Yih, Sinong Wang, Jie Tang · 07 Nov 2019

Deepening Hidden Representations from Pre-trained Language Models
Junjie Yang, Hai Zhao · 05 Nov 2019

BAS: An Answer Selection Method Using BERT Language Model
Jamshid Mozafari, A. Fatemi, M. Nematbakhsh · 04 Nov 2019

CCNet: Extracting High Quality Monolingual Datasets from Web Crawl Data
Guillaume Wenzek, Marie-Anne Lachaux, Alexis Conneau, Vishrav Chaudhary, Francisco Guzmán, Armand Joulin, Edouard Grave · 01 Nov 2019
A neural document language modeling framework for spoken document retrieval
Li-Phen Yen, Zheng-Yu Wu, Kuan-Yu Chen · 31 Oct 2019 · tags: 3DGS

Parameter Sharing Decoder Pair for Auto Composing
Xu Zhao · 31 Oct 2019 · tags: MoE

Ensembling Strategies for Answering Natural Questions
Anthony Ferritto, Lin Pan, Rishav Chakravarti, Salim Roukos, Radu Florian, J. William Murdock, Avirup Sil · 30 Oct 2019 · tags: ELM

BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
M. Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdel-rahman Mohamed, Omer Levy, Veselin Stoyanov, Luke Zettlemoyer · 29 Oct 2019 · tags: AIMat, VLM

What does BERT Learn from Multiple-Choice Reading Comprehension Datasets?
Chenglei Si, Shuohang Wang, Min-Yen Kan, Jing Jiang · 28 Oct 2019

Mockingjay: Unsupervised Speech Representation Learning with Deep Bidirectional Transformer Encoders
Andy T. Liu, Shu-Wen Yang, Po-Han Chi, Po-Chun Hsu, Hung-yi Lee · 25 Oct 2019 · tags: SSL
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel, Noam M. Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu · 23 Oct 2019 · tags: AIMat

Injecting Hierarchy with U-Net Transformers
David Donahue, Vladislav Lialin, Anna Rumshisky · 16 Oct 2019 · tags: AI4CE

Structured Pruning of a BERT-based Question Answering Model
J. Scott McCarley, Rishav Chakravarti, Avirup Sil · 14 Oct 2019

Structured Pruning of Large Language Models
Ziheng Wang, Jeremy Wohlwend, Tao Lei · 10 Oct 2019

On the adequacy of untuned warmup for adaptive optimization
Jerry Ma, Denis Yarats · 09 Oct 2019

FreeLB: Enhanced Adversarial Training for Natural Language Understanding
Chen Zhu, Yu Cheng, Zhe Gan, S. Sun, Tom Goldstein, Jingjing Liu · 25 Sep 2019 · tags: AAML
UNITER: UNiversal Image-TExt Representation Learning
Yen-Chun Chen, Linjie Li, Licheng Yu, Ahmed El Kholy, Faisal Ahmed, Zhe Gan, Yu Cheng, Jingjing Liu · 25 Sep 2019 · tags: VLM, OT

Portuguese Named Entity Recognition using BERT-CRF
Fábio Souza, Rodrigo Nogueira, R. Lotufo · 23 Sep 2019

TinyBERT: Distilling BERT for Natural Language Understanding
Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, F. Wang, Qun Liu · 23 Sep 2019 · tags: VLM

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro · 17 Sep 2019 · tags: MoE

On Identifiability in Transformers
Gino Brunner, Yang Liu, Damian Pascual, Oliver Richter, Massimiliano Ciaramita, Roger Wattenhofer · 12 Aug 2019 · tags: ViT