Continual Pre-Training of Large Language Models: How to (re)warm your model?

8 August 2023
Kshitij Gupta
Benjamin Thérien
Adam Ibrahim
Mats L. Richter
Quentin G. Anthony
Eugene Belilovsky
Irina Rish
Timothée Lesort
    KELM

Papers citing "Continual Pre-Training of Large Language Models: How to (re)warm your model?"

50 / 85 papers shown
Learning Dynamics in Continual Pre-Training for Large Language Models
Xingjin Wang
Howe Tissue
Lu Wang
Linjing Li
D. Zeng
CLL
16
0
0
12 May 2025
Towards Artificial General or Personalized Intelligence? A Survey on Foundation Models for Personalized Federated Intelligence
Yu Qiao
Huy Q. Le
Avi Deb Raha
Phuong-Nam Tran
Apurba Adhikary
Mengchun Zhang
Loc X. Nguyen
Eui-nam Huh
Dusit Niyato
C. Hong
AI4CE
21
0
0
11 May 2025
SEFE: Superficial and Essential Forgetting Eliminator for Multimodal Continual Instruction Tuning
Jinpeng Chen
Runmin Cong
Yuzhi Zhao
Hongzheng Yang
Guangneng Hu
H. Ip
Sam Kwong
CLL
KELM
59
0
0
05 May 2025
EnronQA: Towards Personalized RAG over Private Documents
Michael J. Ryan
Danmei Xu
Chris Nivera
Daniel Campos
SILM
55
0
0
01 May 2025
WenyanGPT: A Large Language Model for Classical Chinese Tasks
Xinyu Yao
Mengdi Wang
Bo Chen
Xiaobing Zhao
67
0
0
29 Apr 2025
DIMT25@ICDAR2025: HW-TSC's End-to-End Document Image Machine Translation System Leveraging Large Vision-Language Model
Zhanglin Wu
Tengfei Song
Ning Xie
W. Zhang
Pengfei Li
Shuang Wu
C. Li
Junhao Zhu
Hao-Yu Yang
28
0
0
24 Apr 2025
Kuwain 1.5B: An Arabic SLM via Language Injection
Khalil Hennara
Sara Chrouf
Mohamed Motaism Hamed
Zeina Aldallal
Omar Hadid
Safwan AlModhayan
29
1
0
21 Apr 2025
Memorization vs. Reasoning: Updating LLMs with New Knowledge
Aochong Oliver Li
Tanya Goyal
KELM
50
0
0
16 Apr 2025
Domain-Adaptive Continued Pre-Training of Small Language Models
Salman Faroz
CLL
30
0
0
13 Apr 2025
Playpen: An Environment for Exploring Learning Through Conversational Interaction
Nicola Horst
Davide Mazzaccara
Antonia Schmidt
Michael Sullivan
Filippo Momentè
...
Alexander Koller
Oliver Lemon
David Schlangen
Mario Giulianelli
Alessandro Suglia
OffRL
32
0
0
11 Apr 2025
Large Language Model Empowered Recommendation Meets All-domain Continual Pre-Training
Haokai Ma
Yunshan Ma
Ruobing Xie
Lei Meng
Jialie Shen
X. Sun
Zhanhui Kang
Tat-Seng Chua
CLL
LRM
32
0
0
11 Apr 2025
CAReDiO: Cultural Alignment of LLM via Representativeness and Distinctiveness Guided Data Optimization
Jing Yao
Xiaoyuan Yi
Jindong Wang
Zhicheng Dou
Xing Xie
23
0
0
09 Apr 2025
TiC-LM: A Web-Scale Benchmark for Time-Continual LLM Pretraining
Jeffrey Li
Mohammadreza Armandpour
Iman Mirzadeh
Sachin Mehta
Vaishaal Shankar
...
Samy Bengio
Oncel Tuzel
Mehrdad Farajtabar
Hadi Pouransari
Fartash Faghri
CLL
KELM
59
0
0
02 Apr 2025
Using LLMs for Automated Privacy Policy Analysis: Prompt Engineering, Fine-Tuning and Explainability
Yuxin Chen
Peng Tang
Weidong Qiu
Shujun Li
36
0
0
16 Mar 2025
Continual Pre-training of MoEs: How robust is your router?
Benjamin Thérien
Charles-Étienne Joseph
Zain Sarwar
Ashwinee Panda
Anirban Das
Shi-Xiong Zhang
Stephen Rawls
S.
Eugene Belilovsky
Irina Rish
MoE
66
0
0
06 Mar 2025
Beyond Cosine Decay: On the effectiveness of Infinite Learning Rate Schedule for Continual Pre-training
Paul Janson
Vaibhav Singh
Paria Mehrbod
Adam Ibrahim
Irina Rish
Eugene Belilovsky
Benjamin Thérien
CLL
73
0
0
04 Mar 2025
LoRA-Null: Low-Rank Adaptation via Null Space for Large Language Models
Pengwei Tang
Y. Liu
Dongjie Zhang
Xing Wu
Debing Zhang
57
0
0
04 Mar 2025
Same accuracy, twice as fast: continuous training surpasses retraining from scratch
Eli Verwimp
Guy Hacohen
Tinne Tuytelaars
OnRL
39
0
0
28 Feb 2025
BayLing 2: A Multilingual Large Language Model with Efficient Language Alignment
Shaolei Zhang
Kehao Zhang
Qingkai Fang
Shoutao Guo
Yan Zhou
Xiaodong Liu
Yang Feng
ALM
64
0
0
25 Nov 2024
Sparse Upcycling: Inference Inefficient Finetuning
Sasha Doubov
Nikhil Sardana
Vitaliy Chiley
MoE
39
0
0
13 Nov 2024
Best Practices for Distilling Large Language Models into BERT for Web Search Ranking
Dezhi Ye
Junwei Hu
Jiabin Fan
Bowen Tian
Jie Liu
Haijin Liang
Jin Ma
36
0
0
07 Nov 2024
Exploring Forgetting in Large Language Model Pre-Training
Chonghua Liao
Ruobing Xie
X. Sun
Haowen Sun
Zhanhui Kang
CLL
27
0
0
22 Oct 2024
Exploring Continual Fine-Tuning for Enhancing Language Ability in Large Language Model
Divyanshu Aggarwal
Sankarshan Damle
Navin Goyal
Satya Lokam
Sunayana Sitaram
CLL
18
0
0
21 Oct 2024
Scalable Data Ablation Approximations for Language Models through Modular Training and Merging
Clara Na
Ian H. Magnusson
A. Jha
Tom Sherborne
Emma Strubell
Jesse Dodge
Pradeep Dasigi
MoMe
36
4
0
21 Oct 2024
A Learning Rate Path Switching Training Paradigm for Version Updates of Large Language Models
Zhihao Wang
Shiyu Liu
Jianheng Huang
Zheng Wang
Yixuan Liao
Xiaoxin Chen
Junfeng Yao
Jinsong Su
16
0
0
05 Oct 2024
Using Deep Autoregressive Models as Causal Inference Engines
Daniel Jiwoong Im
Kevin Zhang
Nakul Verma
Kyunghyun Cho
CML
14
1
0
27 Sep 2024
BeanCounter: A low-toxicity, large-scale, and open dataset of business-oriented text
Siyan Wang
Bradford Levy
18
2
0
26 Sep 2024
Beyond Fine-tuning: Unleashing the Potential of Continuous Pretraining for Clinical LLMs
Clément Christophe
Tathagata Raha
Svetlana Maslenkova
Muhammad Umar Salman
Praveen K Kanithi
Marco AF Pimentel
Shadab Khan
LM&MA
30
1
0
23 Sep 2024
Synthetic continued pretraining
Zitong Yang
Neil Band
Shuangping Li
Emmanuel Candès
Tatsunori Hashimoto
CLL
SyDa
30
4
0
11 Sep 2024
An Investigation of Warning Erroneous Chat Translations in Cross-lingual Communication
Yunmeng Li
Jun Suzuki
Makoto Morishita
Kaori Abe
Kentaro Inui
46
1
0
28 Aug 2024
Scaling Law with Learning Rate Annealing
Howe Tissue
Venus Wang
Lu Wang
21
4
0
20 Aug 2024
Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications
Qianqian Xie
Dong Li
Mengxi Xiao
Zihao Jiang
Ruoyu Xiang
...
Benyou Wang
Alejandro Lopez-Lira
Qianqian Xie
Sophia Ananiadou
Junichi Tsujii
AIFin
AI4TS
28
13
0
20 Aug 2024
SLCA++: Unleash the Power of Sequential Fine-tuning for Continual Learning with Pre-training
Gengwei Zhang
Liyuan Wang
Guoliang Kang
Ling Chen
Yunchao Wei
VLM
CLL
29
2
0
15 Aug 2024
Building Decision Making Models Through Language Model Regime
Yu Zhang
Haoxiang Liu
Feijun Jiang
Weihua Luo
Kaifu Zhang
28
0
0
12 Aug 2024
Towards Effective and Efficient Continual Pre-training of Large Language Models
Jie Chen
Zhipeng Chen
Jiapeng Wang
Kun Zhou
Yutao Zhu
...
Rui Yan
Zhewei Wei
Di Hu
Wenbing Huang
Ji-Rong Wen
KELM
ALM
CLL
ELM
LRM
32
4
0
26 Jul 2024
Bilingual Adaptation of Monolingual Foundation Models
Gurpreet Gosal
Yishi Xu
Gokul Ramakrishnan
Rituraj Joshi
Avraham Sheinin
...
Rahul Pal
Parvez Mullah
Soundar Doraiswamy
Mohamed El Karim Chami
Preslav Nakov
CLL
21
2
0
13 Jul 2024
Reuse, Don't Retrain: A Recipe for Continued Pretraining of Language Models
Jupinder Parmar
Sanjev Satheesh
M. Patwary
M. Shoeybi
Bryan Catanzaro
37
11
0
09 Jul 2024
LLaMAX: Scaling Linguistic Horizons of LLM by Enhancing Translation Capabilities Beyond 100 Languages
Yinquan Lu
Wenhao Zhu
Lei Li
Yu Qiao
Fei Yuan
42
24
0
08 Jul 2024
Breaking Language Barriers: Cross-Lingual Continual Pre-Training at Scale
Wenzhen Zheng
Wenbo Pan
Xu Xu
Libo Qin
Li Yue
Ming Zhou
CLL
24
6
0
02 Jul 2024
Banishing LLM Hallucinations Requires Rethinking Generalization
Johnny Li
Saksham Consul
Eda Zhou
James Wong
Naila Farooqui
...
Zhuxiaona Wei
Tian Wu
Ben Echols
Sharon Zhou
Gregory Diamos
LRM
20
10
0
25 Jun 2024
A Three-Pronged Approach to Cross-Lingual Adaptation with Multilingual LLMs
Vaibhav Singh
Amrith Krishna
Karthika NJ
Ganesh Ramakrishnan
24
4
0
25 Jun 2024
Domain Adaptation of Llama3-70B-Instruct through Continual Pre-Training and Model Merging: A Comprehensive Evaluation
Shamane Siriwardhana
Mark McQuade
Thomas Gauthier
Lucas Atkins
Fernando Fernandes Neto
...
Anneketh Vij
Tyler Odenthal
Charles Goddard
Mary MacCarthy
Jacob Solawetz
CLL
MoMe
ALM
25
8
0
21 Jun 2024
Efficient Continual Pre-training by Mitigating the Stability Gap
Yiduo Guo
Jie Fu
Huishuai Zhang
Dongyan Zhao
Yikang Shen
30
12
0
21 Jun 2024
Word Matters: What Influences Domain Adaptation in Summarization?
Yinghao Li
Siyu Miao
Heyan Huang
Yang Gao
32
3
0
21 Jun 2024
Open Generative Large Language Models for Galician
Pablo Gamallo
Pablo Rodríguez
Iria de-Dios-Flores
Susana Sotelo
Silvia Paniagua
Daniel Bardanca
José Ramom Pichel
Marcos Garcia
29
3
0
19 Jun 2024
Towards Lifelong Learning of Large Language Models: A Survey
Junhao Zheng
Shengjie Qiu
Chengming Shi
Qianli Ma
KELM
CLL
28
14
0
10 Jun 2024
CorDA: Context-Oriented Decomposition Adaptation of Large Language Models for Task-Aware Parameter-Efficient Fine-tuning
Yibo Yang
Xiaojie Li
Zhongzhu Zhou
S. Song
Jianlong Wu
Liqiang Nie
Bernard Ghanem
43
6
0
07 Jun 2024
Conditional Language Learning with Context
X. Zhang
Miao Li
Ji Wu
36
1
0
04 Jun 2024
Sparsity-Accelerated Training for Large Language Models
Da Ma
Lu Chen
Pengyu Wang
Hongshen Xu
Hanqi Li
Liangtai Sun
Su Zhu
Shuai Fan
Kai Yu
LRM
23
0
0
03 Jun 2024
D-CPT Law: Domain-specific Continual Pre-Training Scaling Law for Large Language Models
Haoran Que
Jiaheng Liu
Ge Zhang
Chenchen Zhang
Xingwei Qu
...
Jie Fu
Wenbo Su
Jiamang Wang
Lin Qu
Bo Zheng
CLL
36
11
0
03 Jun 2024