Don't Stop Pretraining: Adapt Language Models to Domains and Tasks
23 April 2020
Suchin Gururangan
Ana Marasović
Swabha Swayamdipta
Kyle Lo
Iz Beltagy
Doug Downey
Noah A. Smith
    VLM, AI4CE, CLL

Papers citing "Don't Stop Pretraining: Adapt Language Models to Domains and Tasks"

50 / 1,369 papers shown
Harnessing Diversity for Important Data Selection in Pretraining Large Language Models
International Conference on Learning Representations (ICLR), 2024
Chi Zhang
Huaping Zhong
Kuan Zhang
Chengliang Chai
Rui Wang
...
Lei Cao
Ju Fan
Ye Yuan
Guoren Wang
Conghui He
TDI
252
28
0
25 Sep 2024
Decoding Large-Language Models: A Systematic Overview of Socio-Technical Impacts, Constraints, and Emerging Questions
Zeyneb N. Kaya
Souvick Ghosh
129
0
0
25 Sep 2024
OSINT Clinic: Co-designing AI-Augmented Collaborative OSINT Investigations for Vulnerability Assessment
International Conference on Human Factors in Computing Systems (CHI), 2024
Anirban Mukhopadhyay
Kurt Luther
238
3
0
18 Sep 2024
MindGuard: Towards Accessible and Sitgma-free Mental Health First Aid via Edge LLM
Sijie Ji
Xinzhe Zheng
Jiawei Sun
Renqi Chen
Wei Gao
Mani Srivastava
AI4MH
244
10
0
16 Sep 2024
Gaps or Hallucinations? Gazing into Machine-Generated Legal Analysis for Fine-grained Text Evaluations
Abe Bohan Hou
William Jurayj
Nils Holzenberger
Andrew Blair-Stanek
Benjamin Van Durme
ELM
252
2
0
16 Sep 2024
Towards understanding evolution of science through language model series
Junjie Dong
Zhuoqi Lyu
Qing Ke
AI4TS
409
1
0
15 Sep 2024
DomURLs_BERT: Pre-trained BERT-based Model for Malicious Domains and URLs Detection and Classification
Abdelkader El Mahdaouy
Salima Lamsiyah
Meryem Janati Idrissi
H. Alami
Zakaria Yartaoui
Ismail Berrada
142
9
0
13 Sep 2024
Self-Masking Networks for Unsupervised Adaptation
German Conference on Pattern Recognition (DAGM), 2024
Alfonso Taboada Warmerdam
Mathilde Caron
Yuki M. Asano
305
2
0
11 Sep 2024
Synthetic continued pretraining
International Conference on Learning Representations (ICLR), 2024
Zitong Yang
Neil Band
Shuangping Li
Emmanuel Candès
Tatsunori Hashimoto
CLL, SyDa
355
35
0
11 Sep 2024
A Practice of Post-Training on Llama-3 70B with Optimal Selection of Additional Language Mixture Ratio
Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), 2024
Ningyuan Xi
Yetao Wu
Kun Fan
Teng Chen
Qingqing Gu
Peng Yu
ALM
191
0
0
10 Sep 2024
A Comparative Study of Pre-training and Self-training
Yiheng Wang
Jiayu Lin
Zuoquan Lin
SSL
323
1
0
04 Sep 2024
LUK: Empowering Log Understanding with Expert Knowledge from Large Language Models
IEEE Transactions on Software Engineering (TSE), 2024
Lipeng Ma
Weidong Yang
Sihang Jiang
Ben Fei
Mingjie Zhou
Shuhao Li
Bo Xu
Bo Xu
Yanghua Xiao
397
4
0
03 Sep 2024
Pre-Trained Language Models for Keyphrase Prediction: A Review
ICT Express (IE), 2024
Muhammad Umair
Tangina Sultana
Young-Koo Lee
313
8
0
02 Sep 2024
From Prediction to Application: Language Model-based Code Knowledge Tracing with Domain Adaptive Pre-Training and Automatic Feedback System with Pedagogical Prompting for Comprehensive Programming Education
Unggi Lee
Jiyeong Bae
Yeonji Jung
Minji Kang
Gyuri Byun
...
Sookbun Lee
Jaekwon Park
Taekyung Ahn
Gunho Lee
Hyeoncheol Kim
AI4Ed, KELM
249
2
0
31 Aug 2024
Nexus: Specialization meets Adaptability for Efficiently Training Mixture of Experts
Nikolas Gritsch
Qizhen Zhang
Acyr Locatelli
Sara Hooker
Ahmet Üstün
MoE
213
7
0
28 Aug 2024
Language Adaptation on a Tight Academic Compute Budget: Tokenizer Swapping Works and Pure bfloat16 Is Enough
Konstantin Dobler
Gerard de Melo
204
4
0
28 Aug 2024
Prior-free Balanced Replay: Uncertainty-guided Reservoir Sampling for Long-Tailed Continual Learning
ACM Multimedia (MM), 2024
Lei Liu
Li Liu
Yawen Cui
CLL
222
1
0
27 Aug 2024
CIPHER: Cybersecurity Intelligent Penetration-testing Helper for Ethical Researcher
Italian National Conference on Sensors (INS), 2024
Derry Pratama
Naufal Suryanto
Andro Aprila Adiputra
Thi-Thu-Huong Le
Ahmada Yusril Kadiptya
Muhammad Iqbal
Howon Kim
210
18
0
21 Aug 2024
Predicting Rewards Alongside Tokens: Non-disruptive Parameter Insertion for Efficient Inference Intervention in Large Language Model
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Chenhan Yuan
Fei Huang
Ru Peng
Keming Lu
Bowen Yu
Chang Zhou
Jingren Zhou
KELM
217
0
0
20 Aug 2024
Summarizing long regulatory documents with a multi-step pipeline
Mika Sie
Ruby Beek
Michiel Bots
S. Brinkkemper
Albert Gatt
AILaw, ELM
180
5
0
19 Aug 2024
NoRA: Nested Low-Rank Adaptation for Efficient Fine-Tuning Large Models
Cheng Lin
Lujun Li
Dezhi Li
Jie Zou
Wei Xue
Yike Guo
AI4TS
265
14
0
18 Aug 2024
Diffusion Guided Language Modeling
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Justin Lovelace
Varsha Kishore
Yiwei Chen
Kilian Q. Weinberger
294
10
0
08 Aug 2024
Automated Review Generation Method Based on Large Language Models
National Science Review (NSR), 2024
Shican Wu
Xiao Ma
Dehui Luo
Lulu Li
Xiangcheng Shi
...
Ran Luo
Chunlei Pei
Zhijian Zhao
Zhi-Jian Zhao
Jinlong Gong
558
9
0
30 Jul 2024
Do LLMs Really Adapt to Domains? An Ontology Learning Perspective
International Workshop on the Semantic Web (SW), 2024
Huu Tan Mai
Cuong Xuan Chu
Heiko Paulheim
173
23
0
29 Jul 2024
Knowledge Graph Structure as Prompt: Improving Small Language Models Capabilities for Knowledge-based Causal Discovery
Yuni Susanti
Michael Färber
224
9
0
26 Jul 2024
ChipExpert: The Open-Source Integrated-Circuit-Design-Specific Large Language Model
Ning Xu
Zhaoyang Zhang
Lei Qi
Wensuo Wang
Chao Zhang
...
Mengyao Zhao
Junbo Liu
Yufan Song
Xin Geng
Jun Yang
125
3
0
26 Jul 2024
CMR Scaling Law: Predicting Critical Mixture Ratios for Continual Pre-training of Language Models
Jiawei Gu
Zacc Yang
Chuanghao Ding
Rui Zhao
Fei Tan
CLL
309
17
0
24 Jul 2024
A Novel Two-Step Fine-Tuning Pipeline for Cold-Start Active Learning in Text Classification Tasks
F. Belém
Washington Cunha
Celso França
Claudio Andrade
Leonardo Rocha
M. A. Gonçalves
138
3
0
24 Jul 2024
Towards Aligning Language Models with Textual Feedback
Sauc Abadal Lloret
Shehzaad Dhuliawala
K. Murugesan
Mrinmaya Sachan
VLM
374
1
0
24 Jul 2024
Structure-aware Domain Knowledge Injection for Large Language Models
Kai-Chun Liu
Ze Chen
Zhihang Fu
Rongxin Jiang
Fan Zhou
Yao-Shen Chen
Yue-bo Wu
Yue Wu
Jieping Ye
178
0
0
23 Jul 2024
Domain-Specific Pretraining of Language Models: A Comparative Study in the Medical Field
Tobias Kerner
ELM, LM&MA
306
6
0
19 Jul 2024
ChipXplore: Natural Language Exploration of Hardware Designs and Libraries
Manar Abdelatty
Sherief Reda
Sherief Reda
257
0
0
17 Jul 2024
On Large Language Model Continual Unlearning
Chongyang Gao
Lixu Wang
Chenkai Weng
Tianlin Li
Qi Zhu
Qi Zhu
MU
271
0
0
14 Jul 2024
The Sociolinguistic Foundations of Language Modeling
Jack Grieve
Sara Bartl
Matteo Fuoli
Jason Grafmiller
Weihang Huang
A. Jawerbaum
Akira Murakami
Marcus Perlman
Dana Roemling
Bodo Winter
309
26
0
12 Jul 2024
Grounding and Evaluation for Large Language Models: Practical Challenges and Lessons Learned (Survey)
K. Kenthapadi
M. Sameki
Ankur Taly
HILM, ELM, AILaw
220
33
0
10 Jul 2024
Reuse, Don't Retrain: A Recipe for Continued Pretraining of Language Models
Jupinder Parmar
Sanjev Satheesh
M. Patwary
Mohammad Shoeybi
Bryan Catanzaro
286
53
0
09 Jul 2024
CodeUpdateArena: Benchmarking Knowledge Editing on API Updates
Zeyu Leo Liu
Shrey Pandit
Xi Ye
Eunsol Choi
Greg Durrett
KELM, ALM
405
13
0
08 Jul 2024
BadCLM: Backdoor Attack in Clinical Language Models for Electronic Health Records
Weimin Lyu
Zexin Bi
Fusheng Wang
Chao Chen
253
10
0
06 Jul 2024
Using LLMs to label medical papers according to the CIViC evidence model
Markus Hisch
Xing David Wang
196
0
0
05 Jul 2024
Multi-Task Domain Adaptation for Language Grounding with 3D Objects
Yixiang Chen
Yaoxian Song
Xinglin Pan
Peijie Dong
Xiaofei Yang
Qiang-qiang Wang
Zhixu Li
Tiefeng Li
Xiaowen Chu
286
2
0
03 Jul 2024
Sociocultural Considerations in Monitoring Anti-LGBTQ+ Content on Social Media
Sidney G. -J. Wong
149
0
0
01 Jul 2024
M2QA: Multi-domain Multilingual Question Answering
Leon Arne Engländer
Hannah Sterz
Clifton A. Poth
Jonas Pfeiffer
Ilia Kuznetsov
Iryna Gurevych
VLM
254
5
0
01 Jul 2024
Locate&Edit: Energy-based Text Editing for Efficient, Flexible, and Faithful Controlled Text Generation
Hye Ryung Son
Jay-Yoon Lee
168
3
0
30 Jun 2024
KPC-cF: Aspect-Based Sentiment Analysis via Implicit-Feature Alignment with Corpus Filtering
Kibeom Nam
408
0
0
29 Jun 2024
SMLT-MUGC: Small, Medium, and Large Texts -- Machine versus User-Generated Content Detection and Comparison
Anjali Rawal
Hui Wang
Youjia Zheng
Yu-Hsuan Lin
Shanu Sushmita
DeLMO
189
0
0
28 Jun 2024
ProgressGym: Alignment with a Millennium of Moral Progress
Tianyi Qiu
Yang Zhang
Xuchuan Huang
Jasmine Xinze Li
Yalan Qin
Yaodong Yang
AI4TS
278
9
0
28 Jun 2024
CHEW: A Dataset of CHanging Events in Wikipedia
Hsuvas Borkakoty
Luis Espinosa-Anke
234
2
0
27 Jun 2024
MPCODER: Multi-user Personalized Code Generator with Explicit and Implicit Style Representation Learning
Zhenlong Dai
Chang Yao
Wenkang Han
Ying Yuan
Zhipeng Gao
Jingyuan Chen
185
22
0
25 Jun 2024
Task Oriented In-Domain Data Augmentation
Xiao Liang
Xinyu Hu
Simiao Zuo
Yeyun Gong
Qiang Lou
Yi Liu
Shao-Lun Huang
Jian Jiao
194
8
0
24 Jun 2024
Evaluating the Effectiveness of the Foundational Models for Q&A Classification in Mental Health care
Hassan Alhuzali
Ashwag Alasmari
AI4MH
262
4
0
23 Jun 2024