
LongNet: Scaling Transformers to 1,000,000,000 Tokens

5 July 2023

ArXiv (abs) · PDF · HTML · HuggingFace (80 upvotes) · Github (17840★)

Papers citing "LongNet: Scaling Transformers to 1,000,000,000 Tokens"

50 / 80 papers shown

Autonomous labeling of surgical resection margins using a foundation model

...

116

27 Nov 2025

KVSwap: Disk-aware KV Cache Offloading for Long-Context On-device Inference

419

14 Nov 2025

How Particle-System Random Batch Methods Enhance Graph Transformer: Memory Efficiency and Parallel Computing Strategy

164

08 Nov 2025

Zero-RAG: Towards Retrieval-Augmented Generation with Zero Redundant Knowledge

400

01 Nov 2025

Kimi Linear: An Expressive, Efficient Attention Architecture

...

180

30 Oct 2025

From Masks to Worlds: A Hitchhiker's Guide to World Models

242

23 Oct 2025

Abstain Mask Retain Core: Time Series Prediction by Adaptive Masking Loss with Representation Consistency

203

22 Oct 2025

GAS-MIL: Group-Aggregative Selection Multi-Instance Learning for Ensemble of Foundation Models in Digital Pathology Image Analysis

139

03 Oct 2025

Positional Encoding via Token-Aware Phase Attention

250

16 Sep 2025

Bidirectional Sparse Attention for Faster Video Diffusion Training

351

01 Sep 2025

From slides to AI-ready maps: Standardized multi-layer tissue maps as metadata for artificial intelligence in digital pathology

...

186

29 Aug 2025

Computational Economics in Large Language Models: Exploring Model Behavior and Incentive Design under Resource Constraints

236

14 Aug 2025

Benchmarking Foundation Models for Mitotic Figure Classification

169

06 Aug 2025

Efficient Attention Mechanisms for Large Language Models: A Survey

379

25 Jul 2025

Overcoming Long-Context Limitations of State-Space Models via Context-Dependent Sparse Attention

295

01 Jul 2025

Do Multiple Instance Learning Models Transfer?

371

10 Jun 2025

From Raw Corpora to Domain Benchmarks: Automated Evaluation of LLM Domain Expertise

235

09 Jun 2025

Spark Transformer: Reactivating Sparsity in FFN and Attention

...

290

07 Jun 2025

Beyond Chunking: Discourse-Aware Hierarchical Retrieval for Long Document Question Answering

Min Zhang

488

26 May 2025

How Does Sequence Modeling Architecture Influence Base Capabilities of Pre-trained Language Models? Exploring Key Architecture Design Principles to Avoid Base Capabilities Degradation

263

24 May 2025

UNet with Self-Adaptive Mamba-Like Attention and Causal-Resonance Learning for Medical Image Segmentation

403

21 May 2025

Scale-invariant Attention

557

20 May 2025

Small Clips, Big Gains: Learning Long-Range Refocused Temporal Information for Video Super-Resolution

363

04 May 2025

Token-Shuffle: Towards High-Resolution Image Generation with Autoregressive Models

...

644

24 Apr 2025

Efficient Pretraining Length Scaling

1.2K

21 Apr 2025

A Survey of Pathology Foundation Model: Progress and Future Directions

International Joint Conference on Artificial Intelligence (IJCAI), 2024

574

05 Apr 2025

Cognitive Memory in Large Language Models

1.3K

03 Apr 2025

HOT: Hadamard-based Optimized Training

Computer Vision and Pattern Recognition (CVPR), 2025

301

27 Mar 2025

Atlas: Multi-Scale Attention Improves Long Context Image Modeling

Kumar Krishna Agrawal

228

16 Mar 2025

Multi-Modal Foundation Models for Computational Pathology: A Survey

497

12 Mar 2025

L²M: Mutual Information Scaling Law for Long-Context Language Modeling

427

06 Mar 2025

FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference

International Conference on Learning Representations (ICLR), 2025

358

28 Feb 2025

WildLong: Synthesizing Realistic Long-Context Instruction Data at Scale

1.1K

23 Feb 2025

HeadInfer: Memory-Efficient LLM Inference by Head-wise Offloading

396

18 Feb 2025

LV-XAttn: Distributed Cross-Attention for Long Visual Inputs in Multimodal Large Language Models

Tzu-Tao Chang

Shivaram Venkataraman

VLM

1.4K

04 Feb 2025

Longer Attention Span: Increasing Transformer Context Length with Sparse Graph Processing Techniques

IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2025

Nathaniel Tomczak

Sanmukh Kuppannagari

661

31 Jan 2025

Episodic Memories Generation and Evaluation Benchmark for Large Language Models

International Conference on Learning Representations (ICLR), 2025

325

21 Jan 2025

Towards a Comprehensive Benchmark for Pathological Lymph Node Metastasis in Breast Cancer Sections

212

16 Nov 2024

What is Wrong with Perplexity for Long-context Language Modeling?

International Conference on Learning Representations (ICLR), 2024

799

31 Oct 2024

SlideChat: A Large Vision-Language Assistant for Whole-Slide Pathology Image Understanding

Computer Vision and Pattern Recognition (CVPR), 2024

428

15 Oct 2024

On Efficient Variants of Segment Anything Model: A Survey

International Journal of Computer Vision (IJCV), 2024

587

07 Oct 2024

Selective Attention Improves Transformer

International Conference on Learning Representations (ICLR), 2024

Yaniv Leviathan

Matan Kalman

Yossi Matias

461

03 Oct 2024

dnaGrinder: a lightweight and high-capacity genomic foundation model

Qihang Zhao

Chi Zhang

Weixiong Zhang

279

24 Sep 2024

PecSched: Preemptive and Efficient Cluster Scheduling for LLM Inference

Zeyu Zhang

Haiying Shen

VLM

412

23 Sep 2024

Towards LifeSpan Cognitive Systems

Yu Wang

...

Wei Wang

Heng Ji

Julian McAuley

KELM CLL

1.1K

20 Sep 2024

A New Era in Computational Pathology: A Survey on Foundation and Vision-Language Models

Nasim Yahya Soltani

480

23 Aug 2024

A Multimodal Knowledge-enhanced Whole-slide Pathology Foundation Model

...

Anjia Han

Ronald Cheong Kin Chan

Li Liang

Xiuming Zhang

Hao Chen

537

22 Jul 2024

Speculative RAG: Enhancing Retrieval Augmented Generation through Drafting

Zilong Wang

Zifeng Wang

Long Le

Huaixiu Steven Zheng

...

378

11 Jul 2024

Pan-cancer Histopathology WSI Pre-training with Position-aware Masked Autoencoder

Fengying Xie

335

10 Jul 2024

Meta Large Language Model Compiler: Foundation Models of Compiler Optimization

Gabriel Synnaeve

Hugh Leather

286

27 Jun 2024