ResearchTrend.AI
Atlas: Multi-Scale Attention Improves Long Context Image Modeling

16 March 2025
Kumar Krishna Agrawal
Long Lian
Longchao Liu
Natalia Harguindeguy
Boyi Li
Alexander Bick
Maggie Chung
Trevor Darrell
Adam Yala
    ViT
Abstract

Efficiently modeling massive images is a long-standing challenge in machine learning. To this end, we introduce Multi-Scale Attention (MSA). MSA relies on two key ideas: (i) multi-scale representations and (ii) bi-directional cross-scale communication. MSA creates O(log N) scales to represent the image across progressively coarser features and leverages cross-attention to propagate information across scales. We then introduce Atlas, a novel neural network architecture based on MSA. We demonstrate that Atlas significantly improves the compute-performance tradeoff of long-context image modeling on a high-resolution variant of ImageNet-100. At 1024px resolution, Atlas-B achieves 91.04% accuracy, comparable to ConvNext-B (91.92%), while being 4.3x faster. Atlas is 2.95x faster and 7.38% better than FasterViT, and 2.25x faster and 4.96% better than LongViT. In comparisons against MambaVision-S, we find Atlas-S achieves 5%, 16% and 32% higher accuracy at 1024px, 2048px and 4096px respectively, while obtaining similar runtimes. Code for reproducing our experiments and pretrained models is available at this https URL.
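As a rough illustration of the two ideas named in the abstract — this is not the authors' implementation, and all function names here are hypothetical — a minimal NumPy sketch can build O(log N) progressively coarser token scales by repeated 2x average pooling (idea i) and propagate information between scales with a single cross-attention step (idea ii):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def build_scales(tokens, min_len=1):
    """Coarsen an (N, d) token sequence by 2x average pooling until
    min_len tokens remain, yielding O(log N) scales (idea i)."""
    scales = [tokens]
    while scales[-1].shape[0] > min_len:
        t = scales[-1]
        n = t.shape[0] - t.shape[0] % 2  # drop a trailing odd token
        scales.append(t[:n].reshape(-1, 2, t.shape[1]).mean(axis=1))
    return scales

def cross_attend(queries, keys_values):
    """One cross-attention step: tokens at one scale attend to
    tokens at another scale (idea ii: cross-scale communication)."""
    d = queries.shape[1]
    attn = softmax(queries @ keys_values.T / np.sqrt(d))
    return attn @ keys_values

rng = np.random.default_rng(0)
x = rng.normal(size=(16, 8))       # 16 tokens, 8-dim features
scales = build_scales(x)           # scale lengths: 16, 8, 4, 2, 1
# fine-scale tokens query the coarsest summary (top-down direction);
# the bi-directional case would also run the coarse-queries-fine pass
out = cross_attend(scales[0], scales[-1])
print(len(scales), out.shape)      # 5 (16, 8)
```

Real multi-scale attention would use learned query/key/value projections and interleave both communication directions across layers; this sketch only shows why the number of scales grows logarithmically in the token count.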

@article{agrawal2025_2503.12355,
  title={Atlas: Multi-Scale Attention Improves Long Context Image Modeling},
  author={Kumar Krishna Agrawal and Long Lian and Longchao Liu and Natalia Harguindeguy and Boyi Li and Alexander Bick and Maggie Chung and Trevor Darrell and Adam Yala},
  journal={arXiv preprint arXiv:2503.12355},
  year={2025}
}