ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

LongNet: Scaling Transformers to 1,000,000,000 Tokens

5 July 2023
Jiayu Ding
Shuming Ma
Li Dong
Xingxing Zhang
Shaohan Huang
Wenhui Wang
Nanning Zheng
Furu Wei
    CLL
arXiv:2307.02486

Papers citing "LongNet: Scaling Transformers to 1,000,000,000 Tokens"

50 / 114 papers shown
Small Clips, Big Gains: Learning Long-Range Refocused Temporal Information for Video Super-Resolution
Xingyu Zhou
Wei Long
Jingbo Lu
Shiyin Jiang
Weiyi You
Haifeng Wu
Shuhang Gu
31
0
0
04 May 2025
Token-Shuffle: Towards High-Resolution Image Generation with Autoregressive Models
Xu Ma
Peize Sun
Haoyu Ma
Hao Tang
Chih-Yao Ma
...
Matt Feiszli
Peizhao Zhang
Peter Vajda
Sam S. Tsai
Y. Fu
68
1
0
24 Apr 2025
Efficient Pretraining Length Scaling
Bohong Wu
Shen Yan
Sijun Zhang
Jianqiao Lu
Yutao Zeng
Ya Wang
Xun Zhou
49
0
0
21 Apr 2025
Reasoning on Multiple Needles In A Haystack
Yidong Wang
LRM
31
0
0
05 Apr 2025
A Survey of Pathology Foundation Model: Progress and Future Directions
Conghao Xiong
Hao Chen
Joseph J. Y. Sung
LM&MA
AI4CE
51
0
0
05 Apr 2025
Cognitive Memory in Large Language Models
Lianlei Shan
Shixian Luo
Zezhou Zhu
Yu Yuan
Yong Wu
LLMAG
KELM
69
1
0
03 Apr 2025
M-DocSum: Do LVLMs Genuinely Comprehend Interleaved Image-Text in Document Summarization?
Haolong Yan
Kaijun Tan
Yeqing Shen
Xin Huang
Zheng Ge
Xiangyu Zhang
Si Li
Daxin Jiang
VLM
35
0
0
27 Mar 2025
HOT: Hadamard-based Optimized Training
Seonggon Kim
Juncheol Shin
Seung-taek Woo
Eunhyeok Park
43
0
0
27 Mar 2025
Atlas: Multi-Scale Attention Improves Long Context Image Modeling
Kumar Krishna Agrawal
Long Lian
L. Liu
Natalia Harguindeguy
Boyi Li
Alexander Bick
Maggie Chung
Trevor Darrell
Adam Yala
ViT
50
0
0
16 Mar 2025
L²M: Mutual Information Scaling Law for Long-Context Language Modeling
Zhuo Chen
Oriol Mayné i Comas
Zhuotao Jin
Di Luo
Marin Soljacic
62
0
0
06 Mar 2025
FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference
Xunhao Lai
Jianqiao Lu
Yao Luo
Yiyuan Ma
Xun Zhou
63
5
0
28 Feb 2025
WildLong: Synthesizing Realistic Long-Context Instruction Data at Scale
Jiaxi Li
Xingxing Zhang
Xun Wang
Xiaolong Huang
Li Dong
Liang Wang
Si-Qing Chen
Wei Lu
Furu Wei
SyDa
73
0
0
23 Feb 2025
HeadInfer: Memory-Efficient LLM Inference by Head-wise Offloading
Cheng Luo
Zefan Cai
Hanshi Sun
Jinqi Xiao
Bo Yuan
Wen Xiao
Junjie Hu
Jiawei Zhao
Beidi Chen
Anima Anandkumar
59
1
0
18 Feb 2025
LV-XAttn: Distributed Cross-Attention for Long Visual Inputs in Multimodal Large Language Models
Tzu-Tao Chang
Shivaram Venkataraman
VLM
87
0
0
04 Feb 2025
Longer Attention Span: Increasing Transformer Context Length with Sparse Graph Processing Techniques
Nathaniel Tomczak
Sanmukh Kuppannagari
89
0
0
31 Jan 2025
Episodic Memories Generation and Evaluation Benchmark for Large Language Models
Alexis Huet
Zied Ben-Houidi
Dario Rossi
LLMAG
52
0
0
21 Jan 2025
Membership Inference Attack against Long-Context Large Language Models
Zixiong Wang
Gaoyang Liu
Yang Yang
Chen Wang
76
1
0
18 Nov 2024
Towards a Comprehensive Benchmark for Pathological Lymph Node Metastasis in Breast Cancer Sections
Xitong Ling
Yuanyuan Lei
Jiawen Li
Junru Cheng
Wenting Huang
Tian Guan
Jian Guan
Yonghong He
20
4
0
16 Nov 2024
Efficient Adaptive Optimization via Subset-Norm and Subspace-Momentum: Fast, Memory-Reduced Training with Convergence Guarantees
T. Nguyen
Huy Le Nguyen
ODL
28
0
0
11 Nov 2024
What is Wrong with Perplexity for Long-context Language Modeling?
Lizhe Fang
Yifei Wang
Zhaoyang Liu
Chenheng Zhang
Stefanie Jegelka
Jinyang Gao
Bolin Ding
Yisen Wang
58
4
0
31 Oct 2024
Two are better than one: Context window extension with multi-grained self-injection
Wei Han
Pan Zhou
Soujanya Poria
Shuicheng Yan
24
0
0
25 Oct 2024
Taipan: Efficient and Expressive State Space Language Models with Selective Attention
Chien Van Nguyen
Huy Huu Nguyen
Thang M. Pham
Ruiyi Zhang
Hanieh Deilamsalehy
...
Ryan A. Rossi
Trung Bui
Viet Dac Lai
Franck Dernoncourt
Thien Huu Nguyen
Mamba
RALM
29
1
0
24 Oct 2024
SlideChat: A Large Vision-Language Assistant for Whole-Slide Pathology Image Understanding
Ying Chen
Guoan Wang
Yuanfeng Ji
Yanjun Li
Jin Ye
Tianbin Li
Bin Zhang
Nana Pei
Rongshan Yu
Yu Qiao
VLM
LM&MA
51
2
0
15 Oct 2024
On Efficient Variants of Segment Anything Model: A Survey
Xiaorui Sun
J. Liu
H. Shen
Xiaofeng Zhu
Ping Hu
VLM
43
4
0
07 Oct 2024
Selective Attention Improves Transformer
Yaniv Leviathan
Matan Kalman
Yossi Matias
46
8
0
03 Oct 2024
Perception Compressor: A Training-Free Prompt Compression Framework in Long Context Scenarios
Jiwei Tang
Jin Xu
Tingwei Lu
Hai Lin
Yiming Zhao
Lin Hai
Hai-Tao Zheng
VLM
52
0
0
28 Sep 2024
dnaGrinder: a lightweight and high-capacity genomic foundation model
Qihang Zhao
Chi Zhang
Weixiong Zhang
16
0
0
24 Sep 2024
CSPS: A Communication-Efficient Sequence-Parallelism based Serving System for Transformer based Models with Long Prompts
Zeyu Zhang
Haiying Shen
VLM
19
0
0
23 Sep 2024
More Effective LLM Compressed Tokens with Uniformly Spread Position Identifiers and Compression Loss
Runsong Zhao
Pengcheng Huang
Xinyu Liu
Chunyang Xiao
Tong Xiao
Jingbo Zhu
16
0
0
22 Sep 2024
Towards LifeSpan Cognitive Systems
Yu Wang
Chi Han
Tongtong Wu
Xiaoxin He
Wangchunshu Zhou
...
Zexue He
Wei Wang
Gholamreza Haffari
Heng Ji
Julian McAuley
KELM
CLL
83
1
0
20 Sep 2024
Bio-Inspired Mamba: Temporal Locality and Bioplausible Learning in Selective State Space Models
Jiahao Qin
Mamba
AI4CE
11
1
0
17 Sep 2024
Mamba-ST: State Space Model for Efficient Style Transfer
Filippo Botti
Alex Ergasti
Leonardo Rossi
Tomaso Fontanini
Claudio Ferrari
Massimo Bertozzi
Andrea Prati
Mamba
35
2
0
16 Sep 2024
A New Era in Computational Pathology: A Survey on Foundation and Vision-Language Models
Dibaloke Chanda
Milan Aryal
Nasim Yahya Soltani
Masoud Ganji
AI4CE
VLM
34
7
0
23 Aug 2024
FactorLLM: Factorizing Knowledge via Mixture of Experts for Large Language Models
Zhongyu Zhao
Menghang Dong
Rongyu Zhang
Wenzhao Zheng
Yunpeng Zhang
Huanrui Yang
Dalong Du
Kurt Keutzer
Shanghang Zhang
46
0
0
15 Aug 2024
Instruct Large Language Models to Generate Scientific Literature Survey Step by Step
Yuxuan Lai
Yupeng Wu
Yidan Wang
Wenpeng Hu
Chen Zheng
44
3
0
15 Aug 2024
Post-Training Sparse Attention with Double Sparsity
Shuo Yang
Ying Sheng
Joseph E. Gonzalez
Ion Stoica
Lianmin Zheng
23
7
0
11 Aug 2024
A Multimodal Knowledge-enhanced Whole-slide Pathology Foundation Model
Yingxue Xu
Yihui Wang
Fengtao Zhou
Jiabo Ma
Shu Yang
...
Anjia Han
Ronald Cheong Kin Chan
Li Liang
Xiuming Zhang
Hao Chen
29
13
0
22 Jul 2024
Speculative RAG: Enhancing Retrieval Augmented Generation through Drafting
Zilong Wang
Zifeng Wang
Long Le
Huaixiu Steven Zheng
Swaroop Mishra
...
Anush Mattapalli
Ankur Taly
Jingbo Shang
Chen-Yu Lee
Tomas Pfister
RALM
70
30
0
11 Jul 2024
Pan-cancer Histopathology WSI Pre-training with Position-aware Masked Autoencoder
Kun-Hsuan Wu
Zhiguo Jiang
Kunming Tang
Jun Shi
Fengying Xie
Wei Wang
Haibo Wu
Yushan Zheng
21
1
0
10 Jul 2024
A Clinical Benchmark of Public Self-Supervised Pathology Foundation Models
Gabriele Campanella
Shengjia Chen
Ruchika Verma
Jennifer Zeng
A. Stock
...
Kuan-lin Huang
Ricky Kwan
Jane Houldsworth
Adam J. Schoenfeld
Chad M. Vanderbilt
AI4MH
OOD
LM&MA
27
16
0
09 Jul 2024
MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention
Huiqiang Jiang
Yucheng Li
Chengruidong Zhang
Qianhui Wu
Xufang Luo
...
Amir H. Abdi
Dongsheng Li
Chin-Yew Lin
Yuqing Yang
L. Qiu
67
1
0
02 Jul 2024
Meta Large Language Model Compiler: Foundation Models of Compiler Optimization
Chris Cummins
Volker Seeker
Dejan Grubisic
Baptiste Roziere
Jonas Gehring
Gabriel Synnaeve
Hugh Leather
29
15
0
27 Jun 2024
Fibottention: Inceptive Visual Representation Learning with Diverse Attention Across Heads
Ali Khaleghi Rahimian
Manish Kumar Govind
Subhajit Maity
Dominick Reilly
Christian Kummerle
Srijan Das
A. Dutta
31
1
0
27 Jun 2024
LoongTrain: Efficient Training of Long-Sequence LLMs with Head-Context Parallelism
Diandian Gu
Peng Sun
Qinghao Hu
Ting Huang
Xun Chen
...
Jiarui Fang
Yonggang Wen
Tianwei Zhang
Xin Jin
Xuanzhe Liu
LRM
28
7
0
26 Jun 2024
Sparser is Faster and Less is More: Efficient Sparse Attention for Long-Range Transformers
Chao Lou
Zixia Jia
Zilong Zheng
Kewei Tu
ODL
26
18
0
24 Jun 2024
SampleAttention: Near-Lossless Acceleration of Long Context LLM Inference with Adaptive Structured Sparse Attention
Qianchao Zhu
Jiangfei Duan
Chang Chen
Siran Liu
Xiuhong Li
...
Huanqi Cao
Xiao Chuanfu
Xingcheng Zhang
Dahua Lin
Chao Yang
25
15
0
17 Jun 2024
What Kinds of Tokens Benefit from Distant Text? An Analysis on Long Context Language Modeling
Yutong Hu
Quzhe Huang
Kangcheng Luo
Yansong Feng
43
1
0
17 Jun 2024
BABILong: Testing the Limits of LLMs with Long Context Reasoning-in-a-Haystack
Yuri Kuratov
Aydar Bulatov
Petr Anokhin
Ivan Rodkin
Dmitry Sorokin
Artyom Sorokin
Mikhail Burtsev
RALM
ALM
LRM
ReLM
ELM
42
57
0
14 Jun 2024
Seq1F1B: Efficient Sequence-Level Pipeline Parallelism for Large Language Model Training
Ao Sun
Weilin Zhao
Xu Han
Cheng Yang
Zhiyuan Liu
Chuan Shi
Maosong Sun
24
7
0
05 Jun 2024
Story Generation from Visual Inputs: Techniques, Related Tasks, and Challenges
Daniel A. P. Oliveira
Eugénio Ribeiro
David Martins de Matos
VGen
15
2
0
04 Jun 2024