ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

International Conference on Learning Representations (ICLR), 2019
26 September 2019
Zhenzhong Lan
Mingda Chen
Sebastian Goodman
Kevin Gimpel
Piyush Sharma
Radu Soricut
SSL
AIMat

Papers citing "ALBERT: A Lite BERT for Self-supervised Learning of Language Representations"

50 / 3,047 papers shown
Title
Attention is Not Only a Weight: Analyzing Transformers with Vector Norms
Goro Kobayashi
Tatsuki Kuribayashi
Sho Yokoi
Kentaro Inui
183
15
0
21 Apr 2020
A Generic Network Compression Framework for Sequential Recommender Systems
Yang Sun
Fajie Yuan
Ming Yang
Guoao Wei
Zhou Zhao
Duo Liu
203
58
0
21 Apr 2020
Investigating the Effectiveness of Representations Based on Pretrained Transformer-based Language Models in Active Learning for Labelling Text Datasets
Jinghui Lu
B. MacNamee
108
19
0
21 Apr 2020
Fine-tuning Multi-hop Question Answering with Hierarchical Graph Network
Guanming Xiong
364
0
0
20 Apr 2020
The Cost of Training NLP Models: A Concise Overview
Or Sharir
Barak Peleg
Y. Shoham
219
230
0
19 Apr 2020
ETC: Encoding Long and Structured Inputs in Transformers
Joshua Ainslie
Santiago Ontanon
Chris Alberti
Vaclav Cvicek
Zachary Kenneth Fisher
Philip Pham
Anirudh Ravula
Sumit Sanghai
Qifan Wang
Li Yang
291
56
0
17 Apr 2020
Highway Transformer: Self-Gating Enhanced Self-Attentive Networks
Annual Meeting of the Association for Computational Linguistics (ACL), 2020
Yekun Chai
Jin Shuo
Xinwen Hou
252
22
0
17 Apr 2020
Fast and Accurate Deep Bidirectional Language Representations for Unsupervised Learning
Annual Meeting of the Association for Computational Linguistics (ACL), 2020
Joongbo Shin
Yoonhyung Lee
Seunghyun Yoon
Kyomin Jung
OOD
141
12
0
17 Apr 2020
Transform and Tell: Entity-Aware News Image Captioning
Computer Vision and Pattern Recognition (CVPR), 2020
Alasdair Tran
A. Mathews
Lexing Xie
VLM
173
108
0
17 Apr 2020
Training with Quantization Noise for Extreme Model Compression
International Conference on Learning Representations (ICLR), 2020
Angela Fan
Pierre Stock
Benjamin Graham
Edouard Grave
Remi Gribonval
Hervé Jégou
Armand Joulin
MQ
255
256
0
15 Apr 2020
lamBERT: Language and Action Learning Using Multimodal BERT
Kazuki Miyazawa
Tatsuya Aoki
Takato Horii
Takayuki Nagai
SSL
LM&Ro
164
12
0
15 Apr 2020
TOD-BERT: Pre-trained Natural Language Understanding for Task-Oriented Dialogue
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
Chien-Sheng Wu
Steven Hoi
R. Socher
Caiming Xiong
327
337
0
15 Apr 2020
Cascade Neural Ensemble for Identifying Scientifically Sound Articles
Ashwin Karthik Ambalavanan
M. Devarakonda
91
1
0
13 Apr 2020
Robustly Pre-trained Neural Model for Direct Temporal Relation Extraction
IEEE International Conference on Healthcare Informatics (ICHI), 2020
Hong Guan
Jianfu Li
Hua Xu
M. Devarakonda
101
13
0
13 Apr 2020
Pretrained Transformers Improve Out-of-Distribution Robustness
Annual Meeting of the Association for Computational Linguistics (ACL), 2020
Dan Hendrycks
Xiaoyuan Liu
Eric Wallace
Adam Dziedzic
R. Krishnan
Basel Alomair
OOD
453
459
0
13 Apr 2020
CLUE: A Chinese Language Understanding Evaluation Benchmark
International Conference on Computational Linguistics (COLING), 2020
Liang Xu
Hai Hu
Xuanwei Zhang
Lu Li
Chenjie Cao
...
Cong Yue
Xinrui Zhang
Zhen-Yi Yang
Kyle Richardson
Zhenzhong Lan
ELM
308
429
0
13 Apr 2020
Explaining Question Answering Models through Text Generation
Veronica Latcinnik
Jonathan Berant
LRM
220
53
0
12 Apr 2020
Multimodal Categorization of Crisis Events in Social Media
Computer Vision and Pattern Recognition (CVPR), 2020
Mahdi Abavisani
Liwei Wu
Shengli Hu
Joel R. Tetreault
A. Jaimes
252
110
0
10 Apr 2020
Designing Precise and Robust Dialogue Response Evaluators
Annual Meeting of the Association for Computational Linguistics (ACL), 2020
Tianyu Zhao
Divesh Lala
Tatsuya Kawahara
145
55
0
10 Apr 2020
Telling BERT's full story: from Local Attention to Global Aggregation
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2020
Damian Pascual
Gino Brunner
Roger Wattenhofer
183
20
0
10 Apr 2020
Injecting Numerical Reasoning Skills into Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2020
Mor Geva
Ankit Gupta
Jonathan Berant
AIMat
LRM
238
237
0
09 Apr 2020
Generating Counter Narratives against Online Hate Speech: Data and Strategies
Annual Meeting of the Association for Computational Linguistics (ACL), 2020
Serra Sinem Tekiroğlu
Yi-Ling Chung
Marco Guerini
117
123
0
08 Apr 2020
DynaBERT: Dynamic BERT with Adaptive Width and Depth
Neural Information Processing Systems (NeurIPS), 2020
Lu Hou
Zhiqi Huang
Lifeng Shang
Xin Jiang
Xiao Chen
Qun Liu
MQ
242
352
0
08 Apr 2020
Analyzing Redundancy in Pretrained Transformer Models
Fahim Dalvi
Hassan Sajjad
Nadir Durrani
Yonatan Belinkov
156
3
0
08 Apr 2020
On the Effect of Dropping Layers of Pre-trained Transformer Models
Computer Speech and Language (CSL), 2020
Hassan Sajjad
Fahim Dalvi
Nadir Durrani
Preslav Nakov
263
172
0
08 Apr 2020
DialBERT: A Hierarchical Pre-Trained Model for Conversation Disentanglement
Tianda Li
Jia-Chen Gu
Xiao-Dan Zhu
Quan Liu
Zhenhua Ling
Zhiming Su
Si Wei
164
30
0
08 Apr 2020
Towards Evaluating the Robustness of Chinese BERT Classifiers
Wei Ping
Boyuan Pan
Xin Li
Yue Liu
AAML
142
9
0
07 Apr 2020
Byte Pair Encoding is Suboptimal for Language Model Pretraining
Findings (Findings), 2020
Kaj Bostrom
Greg Durrett
215
255
0
07 Apr 2020
Transformers to Learn Hierarchical Contexts in Multiparty Dialogue for Span-based Question Answering
Annual Meeting of the Association for Computational Linguistics (ACL), 2020
Changmao Li
Jinho Choi
164
26
0
07 Apr 2020
A Few Topical Tweets are Enough for Effective User-Level Stance Detection
Younes Samih
Kareem Darwish
123
7
0
07 Apr 2020
Deep Learning Based Text Classification: A Comprehensive Review
ACM Computing Surveys (ACM CSUR), 2020
Shervin Minaee
Nal Kalchbrenner
Xiaoshi Zhong
Narjes Nikzad
M. Asgari-Chenaghlu
Jianfeng Gao
AILaw
VLM
AI4TS
265
1,214
0
06 Apr 2020
Continual Domain-Tuning for Pretrained Language Models
Subendhu Rongali
Abhyuday N. Jagannatha
Bhanu Pratap Singh Rawat
Hong-ye Yu
CLL
KELM
143
7
0
05 Apr 2020
FastBERT: a Self-distilling BERT with Adaptive Inference Time
Annual Meeting of the Association for Computational Linguistics (ACL), 2020
Weijie Liu
Peng Zhou
Zhe Zhao
Zhiruo Wang
Haotang Deng
Qi Ju
228
392
0
05 Apr 2020
Finding Black Cat in a Coal Cellar -- Keyphrase Extraction & Keyphrase-Rubric Relationship Classification from Complex Assignments
Manikandan Ravikiran
183
0
0
03 Apr 2020
Gestalt: a Stacking Ensemble for SQuAD2.0
Mohamed El-Geish
93
5
0
02 Apr 2020
Deep Entity Matching with Pre-Trained Language Models
Proceedings of the VLDB Endowment (PVLDB), 2020
Yuliang Li
Jinfeng Li
Yoshihiko Suhara
A. Doan
W. Tan
VLM
286
441
0
01 Apr 2020
Information Leakage in Embedding Models
Conference on Computer and Communications Security (CCS), 2020
Congzheng Song
A. Raghunathan
MIACV
385
320
0
31 Mar 2020
Meta Fine-Tuning Neural Language Models for Multi-Domain Text Mining
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
Chengyu Wang
Minghui Qiu
Yanjie Liang
Xiaofeng He
AI4CE
214
24
0
29 Mar 2020
Felix: Flexible Text Editing Through Tagging and Insertion
Findings (Findings), 2020
Jonathan Mallinson
Aliaksei Severyn
Eric Malmi
Guillermo Garrido
165
81
0
24 Mar 2020
Data-driven models and computational tools for neurolinguistics: a language technology perspective
Ekaterina Artemova
Amir Bakarov
A. Artemov
Evgeny Burnaev
M. Sharaev
116
4
0
23 Mar 2020
Pre-trained Models for Natural Language Processing: A Survey
Science China Technological Sciences (Sci China Technol Sci), 2020
Xipeng Qiu
Tianxiang Sun
Yige Xu
Yunfan Shao
Ning Dai
Xuanjing Huang
LM&MA
VLM
965
1,609
0
18 Mar 2020
Calibration of Pre-trained Transformers
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
Shrey Desai
Greg Durrett
UQLM
577
353
0
17 Mar 2020
A Survey on Contextual Embeddings
Qi Liu
Matt J. Kusner
Phil Blunsom
433
169
0
16 Mar 2020
TRANS-BLSTM: Transformer with Bidirectional LSTM for Language Understanding
Zhiheng Huang
Peng Xu
Davis Liang
Ajay K. Mishra
Bing Xiang
149
33
0
16 Mar 2020
A Survey of End-to-End Driving: Architectures and Training Methods
IEEE Transactions on Neural Networks and Learning Systems (IEEE TNNLS), 2020
Ardi Tampuu
Maksym Semikin
Naveed Muhammad
D. Fishman
Tambet Matiisen
3DV
324
276
0
13 Mar 2020
Learning to Encode Position for Transformer with Continuous Dynamical Model
International Conference on Machine Learning (ICML), 2020
Xuanqing Liu
Hsiang-Fu Yu
Inderjit Dhillon
Cho-Jui Hsieh
169
131
0
13 Mar 2020
Multi-SimLex: A Large-Scale Evaluation of Multilingual and Cross-Lingual Lexical Semantic Similarity
Computational Linguistics (CL), 2020
Ivan Vulić
Simon Baker
Edoardo Ponti
Ulla Petti
Ira Leviant
...
Eden Bar
Matt Malone
Thierry Poibeau
Roi Reichart
Anna Korhonen
196
87
0
10 Mar 2020
A Framework for Evaluation of Machine Reading Comprehension Gold Standards
International Conference on Language Resources and Evaluation (LREC), 2020
Viktor Schlegel
Marco Valentino
André Freitas
Goran Nenadic
Riza Batista-Navarro
134
34
0
10 Mar 2020
What the [MASK]? Making Sense of Language-Specific BERT Models
Debora Nozza
Federico Bianchi
Dirk Hovy
282
119
0
05 Mar 2020
Talking-Heads Attention
Noam M. Shazeer
Zhenzhong Lan
Youlong Cheng
Nan Ding
L. Hou
247
91
0
05 Mar 2020