Don't Stop Pretraining: Adapt Language Models to Domains and Tasks

23 April 2020
Suchin Gururangan
Ana Marasović
Swabha Swayamdipta
Kyle Lo
Iz Beltagy
Doug Downey
Noah A. Smith
    VLM, AI4CE, CLL

Papers citing "Don't Stop Pretraining: Adapt Language Models to Domains and Tasks"

50 / 1,369 papers shown
Reference-based Weak Supervision for Answer Sentence Selection using Web Data
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Vivek Krishnamurthy
Thuy Vu
Alessandro Moschitti
175
1
0
18 Apr 2021
On the Influence of Masking Policies in Intermediate Pre-training
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Qinyuan Ye
Belinda Z. Li
Sinong Wang
Benjamin Bolte
Hao Ma
Anuj Kumar
Xiang Ren
Madian Khabsa
218
13
0
18 Apr 2021
SciCo: Hierarchical Cross-Document Coreference for Scientific Concepts
Conference on Automated Knowledge Base Construction (AKBC), 2021
Arie Cattan
Sophie Johnson
Daniel S. Weld
Ido Dagan
Iz Beltagy
Doug Downey
Kyle Lo
328
25
0
18 Apr 2021
Documenting Large Webtext Corpora: A Case Study on the Colossal Clean Crawled Corpus
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Jesse Dodge
Maarten Sap
Ana Marasović
William Agnew
Gabriel Ilharco
Dirk Groeneveld
Margaret Mitchell
Matt Gardner
AILaw
297
557
0
18 Apr 2021
Transductive Learning for Abstractive News Summarization
Arthur Bražinskas
Mengwen Liu
Ramesh Nallapati
Sujith Ravi
Markus Dreyer
AI4TS
214
1
0
17 Apr 2021
Crossing the Conversational Chasm: A Primer on Natural Language Processing for Multilingual Task-Oriented Dialogue Systems
Journal of Artificial Intelligence Research (JAIR), 2021
E. Razumovskaia
Goran Glavaš
Olga Majewska
Edoardo Ponti
Anna Korhonen
Ivan Vulić
481
38
0
17 Apr 2021
The challenges of temporal alignment on Twitter during crises
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Aniket Pramanick
Tilman Beck
Kevin Stowe
Iryna Gurevych
217
4
0
17 Apr 2021
Moving on from OntoNotes: Coreference Resolution Model Transfer
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Patrick Xia
Benjamin Van Durme
225
32
0
17 Apr 2021
Sequential Cross-Document Coreference Resolution
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Emily Allaway
Shuai Wang
Miguel Ballesteros
149
19
0
17 Apr 2021
On the Importance of Effectively Adapting Pretrained Language Models for Active Learning
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
Katerina Margatina
Loïc Barrault
Nikolaos Aletras
245
43
0
16 Apr 2021
Capturing Row and Column Semantics in Transformer Based Question Answering over Tables
North American Chapter of the Association for Computational Linguistics (NAACL), 2021
Michael R. Glass
Mustafa Canim
A. Gliozzo
Saneem A. Chemmengath
Vishwajeet Kumar
Rishav Chakravarti
Avirup Sil
FeiFei Pan
Samarth Bharadwaj
Nicolas Rodolfo Fauceglia
LMTD
284
59
0
16 Apr 2021
AMMU : A Survey of Transformer-based Biomedical Pretrained Language Models
Journal of Biomedical Informatics (JBI), 2021
Katikapalli Subramanyam Kalyan
A. Rajasekharan
S. Sangeetha
LM&MA, MedIm
379
190
0
16 Apr 2021
What to Pre-Train on? Efficient Intermediate Task Selection
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Clifton A. Poth
Jonas Pfeiffer
Andreas Rücklé
Iryna Gurevych
246
106
0
16 Apr 2021
Temporal Adaptation of BERT and Performance on Downstream Document Classification: Insights from Social Media
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Paul Röttger
J. Pierrehumbert
203
72
0
16 Apr 2021
To Share or not to Share: Predicting Sets of Sources for Model Transfer Learning
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Lukas Lange
Jannik Strötgen
Heike Adel
Dietrich Klakow
158
13
0
16 Apr 2021
A Million Tweets Are Worth a Few Points: Tuning Transformers for Customer Service Tasks
North American Chapter of the Association for Computational Linguistics (NAACL), 2021
Amir Hadifar
Sofie Labat
Véronique Hoste
Chris Develder
Thomas Demeester
143
6
0
16 Apr 2021
Probing Across Time: What Does RoBERTa Know and When?
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Leo Z. Liu
Yizhong Wang
Jungo Kasai
Hannaneh Hajishirzi
Noah A. Smith
KELM
318
96
0
16 Apr 2021
Towards Robust Neural Retrieval Models with Synthetic Pre-Training
Revanth Reddy Gangi Reddy
Vikas Yadav
Md Arafat Sultan
M. Franz
Vittorio Castelli
Heng Ji
Avirup Sil
133
14
0
15 Apr 2021
Cross-Domain Label-Adaptive Stance Detection
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Momchil Hardalov
Arnav Arora
Preslav Nakov
Isabelle Augenstein
281
83
0
15 Apr 2021
Pseudo Zero Pronoun Resolution Improves Zero Anaphora Resolution
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Ryuto Konno
Shun Kiyono
Yuichiroh Matsubayashi
Hiroki Ouchi
Kentaro Inui
146
11
0
15 Apr 2021
Multitasking Inhibits Semantic Drift
North American Chapter of the Association for Computational Linguistics (NAACL), 2021
Athul Paul Jacob
M. Lewis
Jacob Andreas
182
13
0
15 Apr 2021
Modeling Human Mental States with an Entity-based Narrative Graph
North American Chapter of the Association for Computational Linguistics (NAACL), 2021
I-Ta Lee
Maria Leonor Pacheco
Dan Goldwasser
126
6
0
14 Apr 2021
UDALM: Unsupervised Domain Adaptation through Language Modeling
North American Chapter of the Association for Computational Linguistics (NAACL), 2021
Constantinos F. Karouzos
Georgios Paraskevopoulos
Alexandros Potamianos
163
60
0
14 Apr 2021
TSDAE: Using Transformer-based Sequential Denoising Auto-Encoder for Unsupervised Sentence Embedding Learning
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Kexin Wang
Nils Reimers
Iryna Gurevych
358
214
0
14 Apr 2021
Detoxifying Language Models Risks Marginalizing Minority Voices
North American Chapter of the Association for Computational Linguistics (NAACL), 2021
Albert Xu
Eshaan Pathak
Eric Wallace
Suchin Gururangan
Maarten Sap
Dan Klein
246
136
0
13 Apr 2021
Semantic maps and metrics for science using deep transformer encoders
Brendan Chambers
James A. Evans
MedIm
175
0
0
13 Apr 2021
SpartQA: A Textual Question Answering Benchmark for Spatial Reasoning
North American Chapter of the Association for Computational Linguistics (NAACL), 2021
Roshanak Mirzaee
Hossein Rajaby Faghihi
Qiang Ning
Parisa Kordjamshidi
179
101
0
12 Apr 2021
On the Inductive Bias of Masked Language Modeling: From Statistical to Syntactic Dependencies
North American Chapter of the Association for Computational Linguistics (NAACL), 2021
Tianyi Zhang
Tatsunori Hashimoto
AI4CE
196
30
0
12 Apr 2021
Fine-tuning Encoders for Improved Monolingual and Zero-shot Polylingual Neural Topic Modeling
North American Chapter of the Association for Computational Linguistics (NAACL), 2021
Aaron Mueller
Mark Dredze
152
15
0
11 Apr 2021
TAPAS at SemEval-2021 Task 9: Reasoning over tables with intermediate pre-training
International Workshop on Semantic Evaluation (SemEval), 2021
Thomas Müller
Julian Martin Eisenschlos
Syrine Krichene
LMTD
255
15
0
02 Apr 2021
CURIE: An Iterative Querying Approach for Reasoning About Situations
Dheeraj Rajagopal
Aman Madaan
Niket Tandon
Yiming Yang
Shrimai Prabhumoye
Abhilasha Ravichander
Peter Clark
Eduard H. Hovy
ReLM, LRM
209
6
0
01 Apr 2021
CUPID: Adaptive Curation of Pre-training Data for Video-and-Language Representation Learning
Luowei Zhou
Jingjing Liu
Yu Cheng
Zhe Gan
Lei Zhang
189
7
0
01 Apr 2021
Self-Supervised Pretraining Improves Self-Supervised Pretraining
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2021
Colorado Reed
Xiangyu Yue
Aniruddha Nrusimha
Sayna Ebrahimi
Vivek Vijaykumar
...
Shanghang Zhang
Devin Guillory
Sean L. Metzger
Kurt Keutzer
Trevor Darrell
308
124
0
23 Mar 2021
Improving and Simplifying Pattern Exploiting Training
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Derek Tam
Rakesh R Menon
Joey Tianyi Zhou
Shashank Srivastava
Colin Raffel
243
154
0
22 Mar 2021
MasakhaNER: Named Entity Recognition for African Languages
Transactions of the Association for Computational Linguistics (TACL), 2021
David Ifeoluwa Adelani
Jade Z. Abbott
Graham Neubig
Daniel D'souza
Julia Kreutzer
...
T. Diop
A. Diallo
Adewale Akinfaderin
T. Marengereke
Salomey Osei
305
226
0
22 Mar 2021
AdaptSum: Towards Low-Resource Domain Adaptation for Abstractive Summarization
North American Chapter of the Association for Computational Linguistics (NAACL), 2021
Tiezheng Yu
Zihan Liu
Pascale Fung
CLL
310
83
0
21 Mar 2021
Self-Supervised Test-Time Learning for Reading Comprehension
North American Chapter of the Association for Computational Linguistics (NAACL), 2021
Pratyay Banerjee
Tejas Gokhale
Chitta Baral
SSL
194
31
0
20 Mar 2021
Structure Inducing Pre-Training
Nature Machine Intelligence (Nat. Mach. Intell.), 2021
Matthew B. A. McDermott
Brendan Yap
Peter Szolovits
Marinka Zitnik
337
28
0
18 Mar 2021
Modeling the Second Player in Distributionally Robust Optimization
International Conference on Learning Representations (ICLR), 2021
Paul Michel
Tatsunori Hashimoto
Graham Neubig
227
36
0
18 Mar 2021
Does the Magic of BERT Apply to Medical Code Assignment? A Quantitative Study
Shaoxiong Ji
M. Holtta
Pekka Marttinen
284
82
0
11 Mar 2021
Self-supervised Text-to-SQL Learning with Header Alignment Training
Donggyu Kim
Seanie Lee
SSL, LMTD
117
1
0
11 Mar 2021
CUAD: An Expert-Annotated NLP Dataset for Legal Contract Review
Dan Hendrycks
Collin Burns
Anya Chen
Spencer Ball
ELM, AILaw
334
252
0
10 Mar 2021
Self-supervised Regularization for Text Classification
Transactions of the Association for Computational Linguistics (TACL), 2021
Meng Zhou
Zechen Li
P. Xie
158
18
0
09 Mar 2021
Large Pre-trained Language Models Contain Human-like Biases of What is Right and Wrong to Do
Nature Machine Intelligence (Nat. Mach. Intell.), 2021
P. Schramowski
Cigdem Turan
Nico Andersen
Constantin Rothkopf
Kristian Kersting
289
358
0
08 Mar 2021
"Sharks are not the threat humans are": Argument Component Segmentation
  in School Student Essays
Workshop on Innovative Use of NLP for Building Educational Applications (UNBEA), 2021
Tariq Alhindi
Debanjan Ghosh
115
16
0
08 Mar 2021
Measuring Mathematical Problem Solving With the MATH Dataset
Dan Hendrycks
Collin Burns
Saurav Kadavath
Akul Arora
Steven Basart
Eric Tang
Basel Alomair
Jacob Steinhardt
ReLM, FaML
899
3,885
0
05 Mar 2021
OAG-BERT: Towards A Unified Backbone Language Model For Academic Knowledge Services
Knowledge Discovery and Data Mining (KDD), 2021
Xiao Liu
Da Yin
Jingnan Zheng
Xingjian Zhang
Peng Zhang
Hongxia Yang
Yuxiao Dong
Jie Tang
VLM
230
37
0
03 Mar 2021
Gradual Fine-Tuning for Low-Resource Domain Adaptation
Haoran Xu
Seth Ebner
M. Yarmohammadi
A. White
Benjamin Van Durme
Kenton W. Murray
CLL
168
39
0
03 Mar 2021
ToxCCIn: Toxic Content Classification with Interpretability
Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis (WASSA), 2021
Tong Xiang
Sean MacAvaney
Eugene Yang
Nazli Goharian
239
19
0
01 Mar 2021
Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP
Transactions of the Association for Computational Linguistics (TACL), 2021
Timo Schick
Sahana Udupa
Hinrich Schütze
694
438
0
28 Feb 2021