Analysing The Impact of Sequence Composition on Language Model Pre-Training
arXiv: 2402.13991
21 February 2024
Yu Zhao, Yuanbin Qu, Konrad Staniszewski, Szymon Tworkowski, Wei Liu, Piotr Miłoś, Yuxiang Wu, Pasquale Minervini

Papers citing "Analysing The Impact of Sequence Composition on Language Model Pre-Training" (14 of 14 papers shown):

Trillion 7B Technical Report. Sungjun Han, Juyoung Suk, Suyeong An, Hyungguk Kim, Kyuseok Kim, Wonsuk Yang, Seungtaek Choi, Jamin Shin. 21 Apr 2025.
SkyLadder: Better and Faster Pretraining via Context Window Scheduling. Tongyao Zhu, Qian Liu, Haonan Wang, Shiqi Chen, Xiangming Gu, Tianyu Pang, Min-Yen Kan. 19 Mar 2025.
Sailor2: Sailing in South-East Asia with Inclusive Multilingual LLMs. Longxu Dou, Qian Liu, Fan Zhou, Changyu Chen, Zili Wang, ..., Tianyu Pang, Chao Du, Xinyi Wan, Wei Lu, Min Lin. 18 Feb 2025.
Do Large Language Models Perform Latent Multi-Hop Reasoning without Exploiting Shortcuts? Sohee Yang, Nora Kassner, E. Gribovskaya, Sebastian Riedel, Mor Geva. 25 Nov 2024.
When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training. Haonan Wang, Qian Liu, Chao Du, Tongyao Zhu, Cunxiao Du, Kenji Kawaguchi, Tianyu Pang. 20 Nov 2024.
Shaking Up VLMs: Comparing Transformers and Structured State Space Models for Vision & Language Modeling. Georgios Pantazopoulos, Malvina Nikandrou, Alessandro Suglia, Oliver Lemon, Arash Eshghi. 09 Sep 2024.
Refining Packing and Shuffling Strategies for Enhanced Performance in Generative Language Models. Yanbing Chen, Ruilin Wang, Zihao Yang, L. Jiang, E. Oermann. 19 Aug 2024.
Threshold Filtering Packing for Supervised Fine-Tuning: Training Related Samples within Packs. Jiancheng Dong, Lei Jiang, Wei Jin, Lu Cheng. 18 Aug 2024.
Enhancing Training Efficiency Using Packing with Flash Attention. Achintya Kundu, Rhui Dih Lee, L. Wynter, R. Ganti, Mayank Mishra. 12 Jul 2024.
Structured Packing in LLM Training Improves Long Context Utilization. Konrad Staniszewski, Szymon Tworkowski, Sebastian Jaszczur, Yu Zhao, Henryk Michalewski, Łukasz Kuciński, Piotr Miłoś. 28 Dec 2023.
Pre-Training to Learn in Context. Yuxian Gu, Li Dong, Furu Wei, Minlie Huang. 16 May 2023.
Deduplicating Training Data Makes Language Models Better. Katherine Lee, Daphne Ippolito, A. Nystrom, Chiyuan Zhang, Douglas Eck, Chris Callison-Burch, Nicholas Carlini. 14 Jul 2021.
The Pile: An 800GB Dataset of Diverse Text for Language Modeling. Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, ..., Horace He, Anish Thite, Noa Nabeshima, Shawn Presser, Connor Leahy. 31 Dec 2020.
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism. M. Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro. 17 Sep 2019.