LongNet: Scaling Transformers to 1,000,000,000 Tokens

5 July 2023
Jiayu Ding
Shuming Ma
Li Dong
Xingxing Zhang
Shaohan Huang
Wenhui Wang
Nanning Zheng
Furu Wei
    CLL
ArXiv (abs), PDF, HTML, HuggingFace (80 upvotes), GitHub (17840★)

Papers citing "LongNet: Scaling Transformers to 1,000,000,000 Tokens"

50 / 80 papers shown
Autonomous labeling of surgical resection margins using a foundation model
Xilin Yang
Musa Aydin
Yuhong Lu
Şahan Yoruç Selçuk
Bijie Bai
...
Katjana Ehrlich
Julien Bec
Laura Marcu
N. Pillar
Aydogan Ozcan
116
0
0
27 Nov 2025
KVSwap: Disk-aware KV Cache Offloading for Long-Context On-device Inference
H. Zhang
Chunwei Xia
Zheng Wang
SyDa
419
2
0
14 Nov 2025
How Particle-System Random Batch Methods Enhance Graph Transformer: Memory Efficiency and Parallel Computing Strategy
Hanwen Liu
Yixuan Ma
Shi Jin
Yuguang Wang
164
0
0
08 Nov 2025
Zero-RAG: Towards Retrieval-Augmented Generation with Zero Redundant Knowledge
Qi Luo
X. Li
Junqi Dai
Shuang Cheng
Xipeng Qiu
RALM
400
1
0
01 Nov 2025
Kimi Linear: An Expressive, Efficient Attention Architecture
Kimi Team
Yu Zhang
Zongyu Lin
Xingcheng Yao
J. Hu
...
Guokun Lai
Yuxin Wu
Xinyu Zhou
Zhilin Yang
Yulun Du
180
41
0
30 Oct 2025
From Masks to Worlds: A Hitchhiker's Guide to World Models
Jinbin Bai
Yu Lei
H. Wu
Yuchen Zhu
Shufan Li
Yi Xin
Xiangtai Li
Molei Tao
Aditya Grover
Ming-Hsuan Yang
VGen, SyDa
242
3
0
23 Oct 2025
Abstain Mask Retain Core: Time Series Prediction by Adaptive Masking Loss with Representation Consistency
Renzhao Liang
Sizhe Xu
Chenggang Xie
Jingru Chen
Feiyang Ren
Shu Yang
Takahiro Yabe
AI4TS
203
0
0
22 Oct 2025
GAS-MIL: Group-Aggregative Selection Multi-Instance Learning for Ensemble of Foundation Models in Digital Pathology Image Analysis
Peiran Quan
Zifan Gu
Zhuo Zhao
Qin Zhou
Peifeng Ruan
Ruichen Rong
Yang Xie
Tao Wang
AI4CE
139
0
0
03 Oct 2025
Positional Encoding via Token-Aware Phase Attention
Wang
Sheng Shen
Rémi Munos
Hongyuan Zhan
Yuandong Tian
250
1
0
16 Sep 2025
Bidirectional Sparse Attention for Faster Video Diffusion Training
Chenlu Zhan
W. Li
Chuyu Shen
J. Zhang
Suhui Wu
H. Zhang
VGen
351
8
0
01 Sep 2025
From slides to AI-ready maps: Standardized multi-layer tissue maps as metadata for artificial intelligence in digital pathology
Gernot Fiala
M. Plass
Robert Harb
P. Regitnig
Kristijan Skok
...
Roman Stoklasa
Rudolf Nenutil
N. Zerbe
Andreas Holzinger
Petr Holub
186
0
0
29 Aug 2025
Computational Economics in Large Language Models: Exploring Model Behavior and Incentive Design under Resource Constraints
Sandeep Reddy
Kabir Khan
Rohit Patil
Ananya Chakraborty
Faizan A. Khan
Swati Kulkarni
Arjun Verma
Neha Singh
236
1
0
14 Aug 2025
Benchmarking Foundation Models for Mitotic Figure Classification
Jonas Ammeling
J. Ganz
Emely Rosbach
Ludwig Lausser
C. Bertram
Katharina Breininger
Marc Aubreville
OOD
169
1
0
06 Aug 2025
Efficient Attention Mechanisms for Large Language Models: A Survey
Yutao Sun
Zhenyu Li
Yike Zhang
Tengyu Pan
Bowen Dong
Yuyi Guo
Jianyong Wang
379
18
0
25 Jul 2025
Overcoming Long-Context Limitations of State-Space Models via Context-Dependent Sparse Attention
Zhihao Zhan
Jianan Zhao
Zhaocheng Zhu
Jian Tang
295
3
0
01 Jul 2025
Do Multiple Instance Learning Models Transfer?
Daniel Shao
Richard J. Chen
Andrew H. Song
Joel Runevic
Ming Y. Lu
Tong Ding
Faisal Mahmood
MedIm
371
18
0
10 Jun 2025
From Raw Corpora to Domain Benchmarks: Automated Evaluation of LLM Domain Expertise
Nitin Sharma
Thomas Wolfers
Çağatay Yıldız
ALM
235
0
0
09 Jun 2025
Spark Transformer: Reactivating Sparsity in FFN and Attention
Chong You
Kan Wu
Zhipeng Jia
Lin Chen
Srinadh Bhojanapalli
...
Felix X. Yu
Prateek Jain
David Culler
Henry M. Levy
Sanjiv Kumar
290
4
0
07 Jun 2025
Beyond Chunking: Discourse-Aware Hierarchical Retrieval for Long Document Question Answering
H. Chen
Yi Yang
Yinghui Li
Meishan Zhang
Xiaoshi Zhong
Min Zhang
RALM
488
2
0
26 May 2025
How Does Sequence Modeling Architecture Influence Base Capabilities of Pre-trained Language Models? Exploring Key Architecture Design Principles to Avoid Base Capabilities Degradation
Xin Lu
Yanyan Zhao
Si Wei
Shijin Wang
Bing Qin
Ting Liu
263
0
0
24 May 2025
UNet with Self-Adaptive Mamba-Like Attention and Causal-Resonance Learning for Medical Image Segmentation
Saqib Qamar
Mohd Fazil
Parvez Ahmad
Ghulam Muhammad
Abu Taha Zamani
Mamba
403
1
0
21 May 2025
Scale-invariant Attention
Ben Anson
Xi Wang
Laurence Aitchison
LRM
557
2
0
20 May 2025
Small Clips, Big Gains: Learning Long-Range Refocused Temporal Information for Video Super-Resolution
Xingyu Zhou
Wei Long
Jingbo Lu
Shiyin Jiang
Weiyi You
Haifeng Wu
Shuhang Gu
363
0
0
04 May 2025
Token-Shuffle: Towards High-Resolution Image Generation with Autoregressive Models
Xu Ma
Peize Sun
Haoyu Ma
Hao Tang
Chih-Yao Ma
...
Matt Feiszli
Peizhao Zhang
Peter Vajda
Sam S. Tsai
Y. Fu
644
17
0
24 Apr 2025
Efficient Pretraining Length Scaling
Bohong Wu
Shen Yan
Sijun Zhang
Jianqiao Lu
Yutao Zeng
Ya Wang
Xun Zhou
1.2K
2
0
21 Apr 2025
A Survey of Pathology Foundation Model: Progress and Future Directions
International Joint Conference on Artificial Intelligence (IJCAI), 2024
Conghao Xiong
Hao Chen
Joseph J. Y. Sung
LM&MA, AI4CE
574
13
0
05 Apr 2025
Cognitive Memory in Large Language Models
Lianlei Shan
Shixian Luo
Zezhou Zhu
Yu Yuan
Yong Wu
LLMAG, KELM
1.3K
28
0
03 Apr 2025
HOT: Hadamard-based Optimized Training
Computer Vision and Pattern Recognition (CVPR), 2025
Seonggon Kim
Juncheol Shin
Seung-taek Woo
Eunhyeok Park
301
1
0
27 Mar 2025
Atlas: Multi-Scale Attention Improves Long Context Image Modeling
Kumar Krishna Agrawal
Long Lian
Lu Liu
Natalia Harguindeguy
Boyi Li
Alexander Bick
Maggie Chung
Trevor Darrell
Adam Yala
ViT
228
2
0
16 Mar 2025
Multi-Modal Foundation Models for Computational Pathology: A Survey
Dong Li
Guihong Wan
Xintao Wu
Xinyu Wu
Xiaohui Chen
Yi He
Christine G. Lian
Peter K. Sorger
Yevgeniy R. Semenov
Chen Zhao
MedIm
497
8
0
12 Mar 2025
L$^2$M: Mutual Information Scaling Law for Long-Context Language Modeling
Zhuo Chen
Oriol Mayné i Comas
Zhuotao Jin
Di Luo
Marin Soljacic
427
6
0
06 Mar 2025
FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference
International Conference on Learning Representations (ICLR), 2025
Xunhao Lai
Jianqiao Lu
Yao Luo
Yiyuan Ma
Xun Zhou
358
82
0
28 Feb 2025
WildLong: Synthesizing Realistic Long-Context Instruction Data at Scale
Jiaxi Li
Xingxing Zhang
Xun Wang
Xiaolong Huang
Li Dong
Liang Wang
Si-Qing Chen
Wei Lu
Furu Wei
SyDa
1.1K
6
0
23 Feb 2025
HeadInfer: Memory-Efficient LLM Inference by Head-wise Offloading
Cheng Luo
Zefan Cai
Hanshi Sun
Jinqi Xiao
Bo Yuan
Wen Xiao
Junjie Hu
Jiawei Zhao
Beidi Chen
Julius Berner
396
7
0
18 Feb 2025
LV-XAttn: Distributed Cross-Attention for Long Visual Inputs in Multimodal Large Language Models
Tzu-Tao Chang
Shivaram Venkataraman
VLM
1.4K
1
0
04 Feb 2025
Longer Attention Span: Increasing Transformer Context Length with Sparse Graph Processing Techniques
IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2025
Nathaniel Tomczak
Sanmukh Kuppannagari
661
1
0
31 Jan 2025
Episodic Memories Generation and Evaluation Benchmark for Large Language Models
International Conference on Learning Representations (ICLR), 2025
Alexis Huet
Zied Ben-Houidi
Dario Rossi
LLMAG
325
13
0
21 Jan 2025
Towards a Comprehensive Benchmark for Pathological Lymph Node Metastasis in Breast Cancer Sections
Xitong Ling
Yuanyuan Lei
Jiawen Li
Junru Cheng
Wenting Huang
Tian Guan
Jian Guan
Yonghong He
212
4
0
16 Nov 2024
What is Wrong with Perplexity for Long-context Language Modeling?
International Conference on Learning Representations (ICLR), 2024
Lizhe Fang
Yifei Wang
Zhaoyang Liu
Chenheng Zhang
Stefanie Jegelka
Jinyang Gao
Bolin Ding
Yisen Wang
799
43
0
31 Oct 2024
SlideChat: A Large Vision-Language Assistant for Whole-Slide Pathology Image Understanding
Computer Vision and Pattern Recognition (CVPR), 2024
Ying Chen
Guoan Wang
Yuanfeng Ji
Yanjun Li
Jin Ye
Tianbin Li
Bin Zhang
Nana Pei
Rongshan Yu
Yu Qiao
VLM, LM&MA
428
40
0
15 Oct 2024
On Efficient Variants of Segment Anything Model: A Survey
International Journal of Computer Vision (IJCV), 2024
Xiaorui Sun
Jing Liu
Mengqi Li
Xiaofeng Zhu
Ping Hu
VLM
587
29
0
07 Oct 2024
Selective Attention Improves Transformer
International Conference on Learning Representations (ICLR), 2024
Yaniv Leviathan
Matan Kalman
Yossi Matias
461
25
0
03 Oct 2024
dnaGrinder: a lightweight and high-capacity genomic foundation model
Qihang Zhao
Chi Zhang
Weixiong Zhang
279
3
0
24 Sep 2024
PecSched: Preemptive and Efficient Cluster Scheduling for LLM Inference
Zeyu Zhang
Haiying Shen
VLM
412
1
0
23 Sep 2024
Towards LifeSpan Cognitive Systems
Yu Wang
Chi Han
Tongtong Wu
Xiaoxin He
Wangchunshu Zhou
...
Zexue He
Wei Wang
Gholamreza Haffari
Heng Ji
Julian McAuley
KELM, CLL
1.1K
11
0
20 Sep 2024
A New Era in Computational Pathology: A Survey on Foundation and Vision-Language Models
Dibaloke Chanda
Milan Aryal
Nasim Yahya Soltani
Masoud Ganji
AI4CE, VLM
480
11
0
23 Aug 2024
A Multimodal Knowledge-enhanced Whole-slide Pathology Foundation Model
Yingxue Xu
Yihui Wang
Fengtao Zhou
Jiabo Ma
Shu Yang
...
Anjia Han
Ronald Cheong Kin Chan
Li Liang
Xiuming Zhang
Hao Chen
537
67
0
22 Jul 2024
Speculative RAG: Enhancing Retrieval Augmented Generation through Drafting
Zilong Wang
Zifeng Wang
Long Le
Huaixiu Steven Zheng
Swaroop Mishra
...
Anush Mattapalli
Ankur Taly
Jingbo Shang
Zifeng Wang
Tomas Pfister
RALM
378
89
0
11 Jul 2024
Pan-cancer Histopathology WSI Pre-training with Position-aware Masked Autoencoder
Kun-Hsuan Wu
Zhiguo Jiang
Kunming Tang
Jun Shi
Fengying Xie
Wei Wang
Haibo Wu
Yushan Zheng
335
4
0
10 Jul 2024
Meta Large Language Model Compiler: Foundation Models of Compiler Optimization
Chris Cummins
Volker Seeker
Dejan Grubisic
Baptiste Roziere
Jonas Gehring
Gabriel Synnaeve
Hugh Leather
286
62
0
27 Jun 2024