Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures

4 March 2024
Yuchen Duan
Weiyun Wang
Zhe Chen
Xizhou Zhu
Lewei Lu
Tong Lu
Yu Qiao
Hongsheng Li
Jifeng Dai
Wenhai Wang
Abstract

Transformers have revolutionized computer vision and natural language processing, but their high computational complexity limits their application in high-resolution image processing and long-context analysis. This paper introduces Vision-RWKV (VRWKV), a model adapted from the RWKV model used in the NLP field with necessary modifications for vision tasks. Similar to the Vision Transformer (ViT), our model is designed to efficiently handle sparse inputs and demonstrate robust global processing capabilities, while also scaling up effectively, accommodating both large-scale parameters and extensive datasets. Its distinctive advantage lies in its reduced spatial aggregation complexity, which renders it exceptionally adept at processing high-resolution images seamlessly, eliminating the necessity for windowing operations. Our evaluations demonstrate that VRWKV surpasses ViT's performance in image classification and has significantly faster speeds and lower memory usage when processing high-resolution inputs. In dense prediction tasks, it outperforms window-based models while maintaining comparable speeds. These results highlight VRWKV's potential as a more efficient alternative for visual perception tasks. Code is released at this https URL.
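To make the "reduced spatial aggregation complexity" claim concrete, below is a minimal, unidirectional sketch of an RWKV-style WKV recurrence over flattened patch tokens, written in NumPy. It is not the paper's exact Bi-WKV formulation (the paper adapts RWKV with vision-specific, bidirectional modifications); the function name wkv_mix and its parameters (k, v, w, u) are illustrative and follow common RWKV notation. The point is that each token's aggregation is computed from a running state of size O(C), so a full pass is linear in the number of tokens rather than quadratic, which is what removes the need for windowing at high resolution.

import numpy as np

def wkv_mix(k, v, w, u):
    """Linear-complexity token mixing in the spirit of RWKV's WKV operator.

    A minimal unidirectional sketch, not the paper's Bi-WKV formulation.
    k, v : (T, C) keys / values for T flattened patch tokens, C channels
    w    : (C,)   per-channel decay rate (positive)
    u    : (C,)   per-channel bonus applied to the current token
    Returns a (T, C) array computed in O(T * C) time with O(C) state.
    """
    T, C = k.shape
    out = np.empty_like(v, dtype=float)
    num = np.zeros(C)  # decayed running sum of exp(k_i) * v_i over past tokens
    den = np.zeros(C)  # decayed running sum of exp(k_i) over past tokens
    for t in range(T):
        e_k = np.exp(k[t])
        e_u = np.exp(u + k[t])
        # current token contributes with an extra bonus weight exp(u + k_t)
        out[t] = (num + e_u * v[t]) / (den + e_u + 1e-8)
        # decay the running state by exp(-w), then fold in token t
        num = np.exp(-w) * num + e_k * v[t]
        den = np.exp(-w) * den + e_k
    return out

# Toy usage on 16 patch tokens with 8 channels.
rng = np.random.default_rng(0)
T, C = 16, 8
y = wkv_mix(rng.normal(size=(T, C)), rng.normal(size=(T, C)),
            w=np.full(C, 0.5), u=np.zeros(C))
print(y.shape)  # (16, 8)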

@article{duan2025_2403.02308,
  title={Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures},
  author={Yuchen Duan and Weiyun Wang and Zhe Chen and Xizhou Zhu and Lewei Lu and Tong Lu and Yu Qiao and Hongsheng Li and Jifeng Dai and Wenhai Wang},
  journal={arXiv preprint arXiv:2403.02308},
  year={2025}
}