arXiv:2406.03459

LW-DETR: A Transformer Replacement to YOLO for Real-Time Detection

5 June 2024
Qiang Chen
Xiangbo Su
Xinyu Zhang
Jian Wang
Jiahui Chen
Yunpeng Shen
Chuchu Han
Ziliang Chen
Weixiang Xu
Fanrong Li
Shan Zhang
Kun Yao
Errui Ding
Gang Zhang
Jingdong Wang
Abstract

In this paper, we present a light-weight detection transformer, LW-DETR, which outperforms YOLOs for real-time object detection. The architecture is a simple stack of a ViT encoder, a projector, and a shallow DETR decoder. Our approach leverages recent advances: training-effective techniques, e.g., improved loss and pretraining, and interleaved window and global attention to reduce the ViT encoder complexity. We improve the ViT encoder by aggregating multi-level feature maps, i.e., the intermediate and final feature maps of the ViT encoder, to form richer feature maps, and we introduce a window-major feature map organization to improve the efficiency of interleaved attention computation. Experimental results demonstrate that the proposed approach is superior to existing real-time detectors, e.g., YOLO and its variants, on COCO and other benchmark datasets. Code and models are available at https://github.com/Atten4Vis/LW-DETR.
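
The window-major organization mentioned in the abstract can be illustrated with a small index-level sketch. This is only one plausible reading of the idea, not the authors' implementation: tokens of an H x W feature map are stored so that each attention window occupies a contiguous run of the sequence, which lets window attention slice the sequence directly while global attention uses the same sequence unchanged. The function names (`window_major_permutation`, `invert`, `split_windows`) and the 4 x 4 grid with 2 x 2 windows are hypothetical choices made for illustration.

```python
def window_major_permutation(height, width, window):
    """List row-major token indices in window-major order.

    Assumption: square, non-overlapping windows that tile the grid exactly.
    """
    if height % window or width % window:
        raise ValueError("grid must be divisible by the window size")
    order = []
    for win_row in range(0, height, window):        # top row of each window
        for win_col in range(0, width, window):     # left column of each window
            for r in range(win_row, win_row + window):
                for c in range(win_col, win_col + window):
                    order.append(r * width + c)     # row-major index of (r, c)
    return order


def invert(permutation):
    """Return the permutation that maps window-major order back to row-major."""
    inverse = [0] * len(permutation)
    for new_pos, old_pos in enumerate(permutation):
        inverse[old_pos] = new_pos
    return inverse


def split_windows(tokens, window):
    """Cut a window-major sequence into per-window chunks by plain slicing."""
    size = window * window
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


if __name__ == "__main__":
    perm = window_major_permutation(4, 4, 2)
    row_major_tokens = list(range(16))              # stand-ins for feature vectors
    window_major_tokens = [row_major_tokens[i] for i in perm]
    # Each inner list is one 2 x 2 window, contiguous in the sequence.
    print(split_windows(window_major_tokens, 2))
    # Restore the row-major layout once, e.g. before the projector.
    inv = invert(perm)
    print([window_major_tokens[i] for i in inv] == row_major_tokens)
```

Under this reading, the reordering is paid once on entry and once on exit, instead of before and after every window-attention layer; whether the released code does exactly this should be checked against the repository.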
