ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

International Conference on Learning Representations (ICLR), 2019
26 September 2019
Zhenzhong Lan
Mingda Chen
Sebastian Goodman
Kevin Gimpel
Piyush Sharma
Radu Soricut
SSL, AIMat
arXiv (abs) · PDF · HTML · GitHub (3,271★)

Papers citing "ALBERT: A Lite BERT for Self-supervised Learning of Language Representations"

50 / 3,050 papers shown
FlauBERT: Unsupervised Language Model Pre-training for French
International Conference on Language Resources and Evaluation (LREC), 2019
Hang Le
Loïc Vial
Jibril Frej
Vincent Segonne
Maximin Coavoux
Benjamin Lecouteux
A. Allauzen
Benoît Crabbé
Laurent Besacier
D. Schwab
AI4CE
350
431
0
11 Dec 2019
MITAS: A Compressed Time-Domain Audio Separation Network with Parameter Sharing
Chao-I Tuan
Yuan-Kuei Wu
Hung-yi Lee
Yu Tsao
97
2
0
09 Dec 2019
Large-scale Pretraining for Visual Dialog: A Simple State-of-the-Art Baseline
European Conference on Computer Vision (ECCV), 2019
Vishvak Murahari
Dhruv Batra
Devi Parikh
Abhishek Das
VLM
360
120
0
05 Dec 2019
Bimodal Speech Emotion Recognition Using Pre-Trained Language Models
Verena Heusser
Niklas Freymuth
Stefan Constantin
A. Waibel
164
27
0
29 Nov 2019
Low Rank Factorization for Compact Multi-Head Self-Attention
Sneha Mehta
Huzefa Rangwala
Naren Ramakrishnan
149
7
0
26 Nov 2019
Efficient Attention Mechanism for Visual Dialog that can Handle All the Interactions between Multiple Inputs
Van-Quang Nguyen
Masanori Suganuma
Takayuki Okatani
294
7
0
26 Nov 2019
Pre-Training of Deep Bidirectional Protein Sequence Representations with Structural Information
IEEE Access, 2019
Seonwoo Min
Seunghyun Park
Siwon Kim
Hyun-Soo Choi
Byunghan Lee
Sungroh Yoon
SSL
337
63
0
25 Nov 2019
Global Greedy Dependency Parsing
AAAI Conference on Artificial Intelligence (AAAI), 2019
Z. Li
Zhao Hai
Kevin Parnow
334
32
0
20 Nov 2019
Vision-Language Navigation with Self-Supervised Auxiliary Reasoning Tasks
Computer Vision and Pattern Recognition (CVPR), 2019
Fengda Zhu
Yi Zhu
Xiaojun Chang
Xiaodan Liang
LRM
457
267
0
18 Nov 2019
Unsupervised Pre-training for Natural Language Generation: A Literature Review
Yuanxin Liu
Zheng Lin
SSL, AI4CE
123
5
0
13 Nov 2019
ZiMM: a deep learning model for long term and blurry relapses with non-clinical claims data
A. Kabeshova
Yiyang Yu
Bertrand Lukacs
Emmanuel Bacry
Stéphane Gaïffas
VLM, MedIm
151
2
0
13 Nov 2019
KEPLER: A Unified Model for Knowledge Embedding and Pre-trained Language Representation
Transactions of the Association for Computational Linguistics (TACL), 2019
Xiaozhi Wang
Tianyu Gao
Zhaocheng Zhu
Zhengyan Zhang
Zhiyuan Liu
Juan-Zi Li
Jian Tang
395
771
0
13 Nov 2019
CamemBERT: a Tasty French Language Model
Annual Meeting of the Association for Computational Linguistics (ACL), 2019
Louis Martin
Benjamin Muller
Pedro Ortiz Suarez
Yoann Dupont
Laurent Romary
Eric Villemonte de la Clergerie
Djamé Seddah
Benoît Sagot
543
1,056
0
10 Nov 2019
ConveRT: Efficient and Accurate Conversational Representations from Transformers
Findings, 2019
Matthew Henderson
I. Casanueva
Nikola Mrkšić
Pei-hao Su
Tsung-Hsien Wen
Ivan Vulić
433
207
0
09 Nov 2019
Hierarchical Graph Network for Multi-hop Question Answering
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019
Yuwei Fang
S. Sun
Zhe Gan
R. Pillai
Shuohang Wang
Jingjing Liu
441
184
0
09 Nov 2019
MKD: a Multi-Task Knowledge Distillation Approach for Pretrained Language Models
Linqing Liu
Haiquan Wang
Jimmy J. Lin
R. Socher
Caiming Xiong
203
23
0
09 Nov 2019
SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization
Annual Meeting of the Association for Computational Linguistics (ACL), 2019
Haoming Jiang
Pengcheng He
Weizhu Chen
Xiaodong Liu
Jianfeng Gao
T. Zhao
651
590
0
08 Nov 2019
Transforming Wikipedia into Augmented Data for Query-Focused Summarization
IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), 2019
Haichao Zhu
Li Dong
Furu Wei
Bing Qin
Ting Liu
RALM
233
25
0
08 Nov 2019
Blockwise Self-Attention for Long Document Understanding
Findings, 2019
J. Qiu
Hao Ma
Omer Levy
Scott Yih
Sinong Wang
Jie Tang
309
269
0
07 Nov 2019
Deepening Hidden Representations from Pre-trained Language Models
Junjie Yang
Hai Zhao
128
11
0
05 Nov 2019
BAS: An Answer Selection Method Using BERT Language Model
Jamshid Mozafari
A. Fatemi
M. Nematbakhsh
342
18
0
04 Nov 2019
CCNet: Extracting High Quality Monolingual Datasets from Web Crawl Data
International Conference on Language Resources and Evaluation (LREC), 2019
Guillaume Wenzek
Marie-Anne Lachaux
Alexis Conneau
Vishrav Chaudhary
Francisco Guzmán
Armand Joulin
Edouard Grave
472
756
0
01 Nov 2019
A neural document language modeling framework for spoken document retrieval
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019
Li-Phen Yen
Zheng-Yu Wu
Kuan-Yu Chen
3DGS
106
0
0
31 Oct 2019
Parameter Sharing Decoder Pair for Auto Composing
Xu Zhao
MoE
107
0
0
31 Oct 2019
Ensembling Strategies for Answering Natural Questions
Anthony Ferritto
Lin Pan
Rishav Chakravarti
Salim Roukos
Radu Florian
J. William Murdock
Avirup Sil
ELM
187
0
0
30 Oct 2019
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
Annual Meeting of the Association for Computational Linguistics (ACL), 2019
M. Lewis
Yinhan Liu
Naman Goyal
Marjan Ghazvininejad
Abdel-rahman Mohamed
Omer Levy
Veselin Stoyanov
Luke Zettlemoyer
AIMat, VLM
854
12,171
0
29 Oct 2019
What does BERT Learn from Multiple-Choice Reading Comprehension Datasets?
Chenglei Si
Shuohang Wang
Min-Yen Kan
Jing Jiang
157
55
0
28 Oct 2019
Mockingjay: Unsupervised Speech Representation Learning with Deep Bidirectional Transformer Encoders
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019
Andy T. Liu
Shu-Wen Yang
Po-Han Chi
Po-Chun Hsu
Hung-yi Lee
SSL
492
393
0
25 Oct 2019
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Journal of Machine Learning Research (JMLR), 2019
Colin Raffel
Noam M. Shazeer
Adam Roberts
Katherine Lee
Sharan Narang
Michael Matena
Yanqi Zhou
Wei Li
Peter J. Liu
AIMat
1.6K
23,949
0
23 Oct 2019
Injecting Hierarchy with U-Net Transformers
David Donahue
Vladislav Lialin
Anna Rumshisky
AI4CE
139
2
0
16 Oct 2019
Structured Pruning of a BERT-based Question Answering Model
J. Scott McCarley
Rishav Chakravarti
Avirup Sil
278
54
0
14 Oct 2019
Structured Pruning of Large Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019
Ziheng Wang
Jeremy Wohlwend
Tao Lei
299
329
0
10 Oct 2019
On the adequacy of untuned warmup for adaptive optimization
AAAI Conference on Artificial Intelligence (AAAI), 2019
Jerry Ma
Denis Yarats
292
81
0
09 Oct 2019
HuggingFace's Transformers: State-of-the-art Natural Language Processing
Thomas Wolf
Lysandre Debut
Victor Sanh
Julien Chaumond
Clement Delangue
...
Teven Le Scao
Sylvain Gugger
Mariama Drame
Quentin Lhoest
Alexander M. Rush
AI4CE
442
3,286
0
09 Oct 2019
FreeLB: Enhanced Adversarial Training for Natural Language Understanding
International Conference on Learning Representations (ICLR), 2019
Chen Zhu
Yu Cheng
Zhe Gan
S. Sun
Tom Goldstein
Jingjing Liu
AAML
686
492
0
25 Sep 2019
UNITER: UNiversal Image-TExt Representation Learning
European Conference on Computer Vision (ECCV), 2019
Yen-Chun Chen
Linjie Li
Licheng Yu
Ahmed El Kholy
Faisal Ahmed
Zhe Gan
Yu Cheng
Jingjing Liu
VLM, OT
372
465
0
25 Sep 2019
Portuguese Named Entity Recognition using BERT-CRF
Fábio Souza
Rodrigo Nogueira
R. Lotufo
275
280
0
23 Sep 2019
TinyBERT: Distilling BERT for Natural Language Understanding
Findings, 2019
Xiaoqi Jiao
Yichun Yin
Lifeng Shang
Xin Jiang
Xiao Chen
Linlin Li
F. Wang
Qun Liu
VLM
632
2,161
0
23 Sep 2019
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
Mohammad Shoeybi
M. Patwary
Raul Puri
P. LeGresley
Jared Casper
Bryan Catanzaro
MoE
1.3K
2,442
0
17 Sep 2019
On Identifiability in Transformers
International Conference on Learning Representations (ICLR), 2019
Gino Brunner
Yang Liu
Damian Pascual
Oliver Richter
Massimiliano Ciaramita
Roger Wattenhofer
ViT
331
202
0
12 Aug 2019
Semi-supervised Thai Sentence Segmentation Using Local and Distant Word Representations
ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP), 2019
Chanatip Saetia
Ekapol Chuangsuwanich
Tawunrat Chalothorn
P. Vateekul
242
5
0
04 Aug 2019
DeepCABAC: A Universal Compression Algorithm for Deep Neural Networks
IEEE Journal on Selected Topics in Signal Processing (JSTSP), 2019
Simon Wiedemann
H. Kirchhoffer
Stefan Matlage
Paul Haase
Arturo Marbán
...
Ahmed Osman
D. Marpe
H. Schwarz
Thomas Wiegand
Wojciech Samek
244
107
0
27 Jul 2019
XLNet: Generalized Autoregressive Pretraining for Language Understanding
Neural Information Processing Systems (NeurIPS), 2019
Zhilin Yang
Zihang Dai
Yiming Yang
J. Carbonell
Ruslan Salakhutdinov
Quoc V. Le
AI4CE
928
9,121
0
19 Jun 2019
Pre-Training with Whole Word Masking for Chinese BERT
IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), 2019
Yiming Cui
Wanxiang Che
Ting Liu
Bing Qin
Ziqing Yang
265
233
0
19 Jun 2019
Survey on Evaluation Methods for Dialogue Systems
Artificial Intelligence Review (AIR), 2019
Jan Deriu
Álvaro Rodrigo
Arantxa Otegi
Guillermo Echegoyen
S. Rosset
Eneko Agirre
Mark Cieliebak
278
322
0
10 May 2019
An Attentive Survey of Attention Models
S. Chaudhari
Varun Mithal
Gungor Polatkan
R. Ramanath
444
723
0
05 Apr 2019
Recent Advances in Natural Language Inference: A Survey of Benchmarks, Resources, and Approaches
Shane Storks
Qiaozi Gao
J. Chai
476
142
0
02 Apr 2019
Tensorized Embedding Layers for Efficient Model Compression
Oleksii Hrinchuk
Valentin Khrulkov
L. Mirvakhabova
Elena Orlova
Ivan Oseledets
248
75
0
30 Jan 2019
Sentence transition matrix: An efficient approach that preserves sentence semantics
Myeongjun Jang
Pilsung Kang
103
3
0
16 Jan 2019
Impact of Power System Partitioning on the Efficiency of Distributed Multi-Step Optimization
Dongliang Chen
A. Bucchiarone
Zhihan Lv
133
14
0
31 May 2016