arXiv: 2410.02660
How to Train Long-Context Language Models (Effectively)
3 October 2024
Tianyu Gao
Alexander Wettig
Howard Yen
Danqi Chen
RALM
Papers citing "How to Train Long-Context Language Models (Effectively)"
27 / 27 papers shown
Title
RWKV-X: A Linear Complexity Hybrid Language Model
Haowen Hou
Zhiyi Huang
Kaifeng Tan
Rongchang Lu
Fei Richard Yu
VLM
78
0
0
30 Apr 2025
Trillion 7B Technical Report
Sungjun Han
Juyoung Suk
Suyeong An
Hyungguk Kim
Kyuseok Kim
Wonsuk Yang
Seungtaek Choi
Jamin Shin
25
0
0
21 Apr 2025
Estimating Optimal Context Length for Hybrid Retrieval-augmented Multi-document Summarization
Adithya Pratapa
Teruko Mitamura
RALM
26
0
0
17 Apr 2025
From 128K to 4M: Efficient Training of Ultra-Long Context Large Language Models
C. Xu
Wei Ping
P. Xu
Z. Liu
Boxin Wang
M. Shoeybi
Bo Li
Bryan Catanzaro
17
1
0
08 Apr 2025
Steering off Course: Reliability Challenges in Steering Language Models
Patrick Queiroz Da Silva
Hari Sethuraman
Dheeraj Rajagopal
Hannaneh Hajishirzi
Sachin Kumar
LLMSV
26
1
0
06 Apr 2025
PromptDistill: Query-based Selective Token Retention in Intermediate Layers for Efficient Large Language Model Inference
Weisheng Jin
Maojia Song
Tej Deep Pala
Yew Ken Chia
Amir Zadeh
Chuan Li
Soujanya Poria
VLM
47
0
0
30 Mar 2025
SkyLadder: Better and Faster Pretraining via Context Window Scheduling
Tongyao Zhu
Qian Liu
Haonan Wang
Shiqi Chen
Xiangming Gu
Tianyu Pang
Min-Yen Kan
36
0
0
19 Mar 2025
Token Weighting for Long-Range Language Modeling
Falko Helm
Nico Daheim
Iryna Gurevych
52
1
0
12 Mar 2025
LLMs Know What to Drop: Self-Attention Guided KV Cache Eviction for Efficient Long-Context Inference
G. Wang
Shubhangi Upasani
Chen Henry Wu
Darshan Gandhi
Jonathan Li
Changran Hu
Bo Li
Urmish Thakker
67
0
0
11 Mar 2025
EuroBERT: Scaling Multilingual Encoders for European Languages
Nicolas Boizard
Hippolyte Gisserot-Boukhlef
Duarte M. Alves
André F. T. Martins
Ayoub Hammal
...
Maxime Peyrard
Nuno M. Guerreiro
Patrick Fernandes
Ricardo Rei
Pierre Colombo
36
0
0
07 Mar 2025
Chain-of-Thought Matters: Improving Long-Context Language Models with Reasoning Path Supervision
Dawei Zhu
Xiyu Wei
Guangxiang Zhao
Wenhao Wu
Haosheng Zou
Junfeng Ran
Xun Wang
Lin Sun
Xiangzheng Zhang
Sujian Li
LRM
54
0
0
28 Feb 2025
ByteScale: Efficient Scaling of LLM Training with a 2048K Context Length on More Than 12,000 GPUs
Hao Ge
Junda Feng
Qi Huang
Fangcheng Fu
Xiaonan Nie
Lei Zuo
Haibin Lin
Bin Cui
Xin Liu
29
2
0
28 Feb 2025
LongAttn: Selecting Long-context Training Data via Token-level Attention
Longyun Wu
Dawei Zhu
Guangxiang Zhao
Zhuocheng Yu
Junfeng Ran
Xiangyu Wong
Lin Sun
Sujian Li
31
0
0
24 Feb 2025
Generalizing From Short to Long: Effective Data Synthesis for Long-Context Instruction Tuning
Wenhao Zhu
Pinzhen Chen
Hanxu Hu
Shujian Huang
Fei Yuan
Jiajun Chen
Alexandra Birch
SyDa
49
0
0
24 Feb 2025
LongSpec: Long-Context Speculative Decoding with Efficient Drafting and Verification
Penghui Yang
Cunxiao Du
Fengzhuo Zhang
Haonan Wang
Tianyu Pang
Chao Du
Bo An
RALM
42
0
0
24 Feb 2025
WildLong: Synthesizing Realistic Long-Context Instruction Data at Scale
Jiaxi Li
Xingxing Zhang
Xun Wang
Xiaolong Huang
Li Dong
Liang Wang
Si-Qing Chen
Wei Lu
Furu Wei
SyDa
55
0
0
23 Feb 2025
Self-Taught Agentic Long Context Understanding
Yufan Zhuang
Xiaodong Yu
Jialian Wu
X. Sun
Z. Wang
Jiang Liu
Yusheng Su
Jingbo Shang
Zicheng Liu
Emad Barsoum
LRM
31
0
0
21 Feb 2025
CLIPPER: Compression enables long-context synthetic data generation
Chau Minh Pham
Yapei Chang
Mohit Iyyer
SyDa
70
1
0
21 Feb 2025
LongReD: Mitigating Short-Text Degradation of Long-Context Large Language Models via Restoration Distillation
Zican Dong
Junyi Li
Jinhao Jiang
Mingyu Xu
Wayne Xin Zhao
B. Wang
Weipeng Chen
VLM
87
0
0
20 Feb 2025
NExtLong: Toward Effective Long-Context Training without Long Documents
Chaochen Gao
Xing Wu
Zijia Lin
Debing Zhang
Songlin Hu
SyDa
64
1
0
22 Jan 2025
A Silver Bullet or a Compromise for Full Attention? A Comprehensive Study of Gist Token-based Context Compression
Chenlong Deng
Zhisong Zhang
Kelong Mao
Shuaiyi Li
Xinting Huang
Dong Yu
Zhicheng Dou
36
1
0
23 Dec 2024
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference
Benjamin Warner
Antoine Chaffin
Benjamin Clavié
Orion Weller
Oskar Hallström
...
Tom Aarsen
Nathan Cooper
Griffin Adams
Jeremy Howard
Iacopo Poli
77
51
0
18 Dec 2024
When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training
Haonan Wang
Qian Liu
Chao Du
Tongyao Zhu
Cunxiao Du
Kenji Kawaguchi
Tianyu Pang
73
5
0
20 Nov 2024
Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent
X. Sun
Yanfeng Chen
Y. Huang
Ruobing Xie
Jiaqi Zhu
...
Zhanhui Kang
Yong Yang
Yuhong Liu
Di Wang
Jie Jiang
MoE
ALM
ELM
65
24
0
04 Nov 2024
MDCure: A Scalable Pipeline for Multi-Document Instruction-Following
Gabrielle Kaili-May Liu
Bowen Shi
Avi Caciularu
Idan Szpektor
Arman Cohan
58
3
0
30 Oct 2024
How much do contextualized representations encode long-range context?
Simeng Sun
Cheng-Ping Hsieh
39
0
0
16 Oct 2024
HELMET: How to Evaluate Long-Context Language Models Effectively and Thoroughly
Howard Yen
Tianyu Gao
Minmin Hou
Ke Ding
Daniel Fleischer
Peter Izsak
Moshe Wasserblat
Danqi Chen
ALM
ELM
41
24
0
03 Oct 2024