Don't Stop Pretraining: Adapt Language Models to Domains and Tasks

arXiv: 2004.10964 (versions v1, v2, v3; v3 is the latest)

23 April 2020

Suchin Gururangan

Swabha Swayamdipta

Kyle Lo

Papers citing "Don't Stop Pretraining: Adapt Language Models to Domains and Tasks"

50 / 1,369 papers shown

Gradient Descent with Provably Tuned Learning-rate Schedules

Dravyansh Sharma

157

0

0

04 Dec 2025

Adapting Large Language Models to Low-Resource Tibetan: A Two-Stage Continual and Supervised Fine-Tuning Study

143

0

0

03 Dec 2025

Comparative Analysis of 47 Context-Based Question Answer Models Across 8 Diverse Datasets

Muhammad Muneeb

David B. Ascher

Ahsan Baidar Bakht

86

0

0

29 Nov 2025

Mortgage Language Model: Domain-Adaptive Pretraining with Residual Instruction, Alignment Tuning, and Task-Specific Routing

Satheesh Kumar Ponnambalam

Chandrakanth Lns

647

0

0

26 Nov 2025

Building Domain-Specific Small Language Models via Guided Data Generation

Ekant Muljibhai Amin

Lasitha Vidyaratne

Ahmed K. Farahat

179

0

0

23 Nov 2025

Bridging VLMs and Embodied Intelligence with Deliberate Practice Policy Optimization

...

160

0

0

20 Nov 2025

Classification of Hope in Textual Data using Transformer-Based Models

Chukwuebuka Ijezue

Tania-Amanda Nkoyo Fredrick Eneye

167

0

0

17 Nov 2025

Tokenize Once, Recommend Anywhere: Unified Item Tokenization for Multi-domain LLM-based Recommendation

74

0

0

17 Nov 2025

NeuroLex: A Lightweight Domain Language Model for EEG Report Understanding and Generation

141

0

0

17 Nov 2025

Evaluating the Ability of Large Language Models to Identify Adherence to CONSORT Reporting Guidelines in Randomized Controlled Trials: A Methodological Evaluation Study

75

0

0

17 Nov 2025

Concept-Based Interpretability for Toxicity Detection

Deeksha Varshney

Mamta

105

0

0

15 Nov 2025

Kunlun Anomaly Troubleshooter: Enabling Kernel-Level Anomaly Detection and Causal Reasoning for Large Model Distributed Inference

121

0

0

08 Nov 2025

ManufactuBERT: Efficient Continual Pretraining for Manufacturing

Robin Armingaud

Romaric Besançon

81

0

0

07 Nov 2025

MIDI-LLM: Adapting Large Language Models for Text-to-MIDI Music Generation

Cheng-Zhi Anna Huang

369

0

0

06 Nov 2025

BIRD: Bronze Inscription Restoration and Dating

Hoang H. Nguyen

184

0

0

03 Nov 2025

Exploring and Mitigating Gender Bias in Encoder-Based Transformer Models

Khondokar Mohammad Ahanaf Hannan

Nowreen Tarannum Rafa

Humayra Musarrat

Shoaib Ahmed Dipu

Farig Yousuf Sadeque

127

0

0

01 Nov 2025

Multilingual BERT language model for medical tasks: Evaluation on domain-specific adaptation and cross-linguality

Amrish Jhingoer

Klaske Vliegenthart-Jongbloed

Carlijn Jordans

199

0

0

31 Oct 2025

From Amateur to Master: Infusing Knowledge into LLMs via Automated Curriculum Learning

Srinjoy Mukherjee

Gokul Ramakrishnan

Ganesh Venkatesh

263

0

0

30 Oct 2025

Beyond One-Size-Fits-All: Personalized Harmful Content Detection with In-Context Learning

85

0

0

29 Oct 2025

A Survey on LLM Mid-Training

241

2

0

27 Oct 2025

Generating Auxiliary Tasks with Reinforcement Learning

Judah Goldfeder

242

0

0

27 Oct 2025

Network Intrusion Detection: Evolution from Conventional Approaches to LLM Collaboration and Emerging Risks

Kouichi Sakurai

217

1

0

27 Oct 2025

Robust Uncertainty Quantification for Self-Evolving Large Language Models via Continual Domain Pretraining

387

0

0

27 Oct 2025

Low-Resource Dialect Adaptation of Large Language Models: A French Dialect Case-Study

Owen Van Esbroeck

141

0

0

26 Oct 2025

From Slides to Chatbots: Enhancing Large Language Models with University Course Materials

Philipp Nicolas Schumacher

81

0

0

25 Oct 2025

PatenTEB: A Comprehensive Benchmark and Model Family for Patent Text Embedding

Denis Cavallucci

102

0

0

25 Oct 2025

VESSA: Video-based objEct-centric Self-Supervised Adaptation for Visual Foundation Models

Jesimon Barreto

William Robson Schwartz

132

0

0

23 Oct 2025

IKnow: Instruction-Knowledge-Aware Continual Pretraining for Effective Domain Adaptation

130

0

0

23 Oct 2025

Adapting Multilingual Models to Code-Mixed Tasks via Model Merging

Prashant Kodali

Vaishnavi Shivkumar

Monojit Choudhary

Ponnurangam Kumaraguru

Manish Shrivastava

356

1

0

22 Oct 2025

AtlasKV: Augmenting LLMs with Billion-Scale Knowledge Graphs in 20GB VRAM

Hong Ting Tsang

177

0

0

20 Oct 2025

Qomhra: A Bilingual Irish and English Large Language Model

Joseph McInerney

Khanh-Tung Tran

Liam Lonergan

Ailbhe Ní Chasaide

Neasa Ní Chiaráin

Barry Devereux

174

0

0

20 Oct 2025

Midtraining Bridges Pretraining and Posttraining Distributions

209

2

0

16 Oct 2025

Cognitive-Aligned Spatio-Temporal Large Language Models For Next Point-of-Interest Prediction

...

151

0

0

16 Oct 2025

First Attentions Last: Better Exploiting First Attentions for Efficient Transformer Training

112

0

0

16 Oct 2025

Sparse Subnetwork Enhancement for Underrepresented Languages in Large Language Models

Daniil Gurgurov

Josef van Genabith

Simon Ostermann

201

0

0

15 Oct 2025

A-IPO: Adaptive Intent-driven Preference Optimization

Muhammad Asif Ali

95

0

0

11 Oct 2025

Autonomous Agents for Scientific Discovery: Orchestrating Scientists, Language, Code, and Physics

...

LLMAG LM&Ro AI4CE

188

3

0

10 Oct 2025

Understanding the Effects of Domain Finetuning on LLMs

William Yang Wang

Tanmoy Chakraborty

133

0

0

10 Oct 2025

SkipSR: Faster Super Resolution with Token Skipping

Rohan Choudhury

László A. Jeni

223

0

0

09 Oct 2025

DACIP-RC: Domain Adaptive Continual Instruction Pre-Training via Reading Comprehension on Business Conversations

Elena Khasanova

Md Tahmid Rahman Laskar

Shashi Bhushan TN

116

0

0

09 Oct 2025

SliceFine: The Universal Winning-Slice Hypothesis for Pretrained Networks

Ehsan Mohammady Ardehaly

Prasanth Murali

189

2

0

09 Oct 2025

Beyond Monolingual Assumptions: A Survey of Code-Switched NLP in the Era of Large Language Models across Modalities

Samridhi Raj Sinha

Himanshu Beniwal

313

0

0

08 Oct 2025

Reward Model Perspectives: Whose Opinions Do Reward Models Reward?

149

1

0

07 Oct 2025

Contrastive Learning Using Graph Embeddings for Domain Adaptation of Language Models in the Process Industry

Anastasia Zhukova

Christian E. Lobmüller

179

0

0

06 Oct 2025

AWARE, Beyond Sentence Boundaries: A Contextual Transformer Framework for Identifying Cultural Capital in STEM Narratives

Khalid Mehtab Khan

Anagha Kulkarni

89

0

0

06 Oct 2025

Train on Validation (ToV): Fast data selection with applications to fine-tuning

Andrea Montanari

184

1

0

01 Oct 2025

CustomIR: Unsupervised Fine-Tuning of Dense Embeddings for Known Document Corpora

99

0

0

30 Sep 2025

Metaphor identification using large language models: A comparison of RAG, prompt engineering, and fine-tuning

Jeannette Littlemore

189

0

0

29 Sep 2025

Predicting Training Re-evaluation Curves Enables Effective Data Curriculums for LLMs

166

0

0

29 Sep 2025

WirelessMathLM: Teaching Mathematical Reasoning for LLMs in Wireless Communications with Reinforcement Learning

72

0

0

27 Sep 2025
