ResearchTrend.AI
ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders


2 January 2023
Sanghyun Woo
Shoubhik Debnath
Ronghang Hu
Xinlei Chen
Zhuang Liu
In So Kweon
Saining Xie
    SyDa

Papers citing "ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders"

50 / 325 papers shown
A Simple Detector with Frame Dynamics is a Strong Tracker
Chenxu Peng
C. Wang
Minrui Zou
Danyang Li
Z. Yang
Yimian Dai
Ming-Ming Cheng
Xiang Li
42
0
0
08 May 2025
ORXE: Orchestrating Experts for Dynamically Configurable Efficiency
Qingyuan Wang
Guoxin Wang
B. Cardiff
Deepu John
38
0
0
07 May 2025
Vision-Language-Action Models: Concepts, Progress, Applications and Challenges
Ranjan Sapkota
Yang Cao
Konstantinos I Roumeliotis
Manoj Karkee
LM&Ro
128
1
0
07 May 2025
Hyb-KAN ViT: Hybrid Kolmogorov-Arnold Networks Augmented Vision Transformer
Sainath Dey
Mitul Goswami
Jashika Sethi
Prasant Kumar Pattnaik
ViT
28
0
0
07 May 2025
AlignDiT: Multimodal Aligned Diffusion Transformer for Synchronized Speech Generation
J. Choi
Ji-Hoon Kim
Kim Sung-Bin
Tae-Hyun Oh
Joon Son Chung
DiffM
48
0
0
29 Apr 2025
A BERT-Style Self-Supervised Learning CNN for Disease Identification from Retinal Images
Xin Li
Wenhui Zhu
Peijie Qiu
Oana Dumitrascu
Amal Youssef
Y. Wang
SSL
MedIm
89
0
0
25 Apr 2025
MSAD-Net: Multiscale and Spatial Attention-based Dense Network for Lung Cancer Classification
Santanu Roy
Shweta Singh
Palak Sahu
Ashvath Suresh
Debashish Das
30
0
0
20 Apr 2025
DAM-Net: Domain Adaptation Network with Micro-Labeled Fine-Tuning for Change Detection
H. Chen
Xin Xu
Fangling Pu
30
0
0
18 Apr 2025
Learning from Noisy Pseudo-labels for All-Weather Land Cover Mapping
Wang Liu
Zhiyu Wang
Xin Guo
Puhong Duan
Xudong Kang
Shutao Li
22
0
0
18 Apr 2025
CoMotion: Concurrent Multi-person 3D Motion
Alejandro Newell
Peiyun Hu
Lahav Lipson
Stephan R. Richter
V. Koltun
3DH
VOT
69
0
0
16 Apr 2025
Real-World Depth Recovery via Structure Uncertainty Modeling and Inaccurate GT Depth Fitting
Delong Suzhang
Meng Yang
24
0
0
16 Apr 2025
Pseudo-Autoregressive Neural Codec Language Models for Efficient Zero-Shot Text-to-Speech Synthesis
Yifan Yang
S. Liu
J. Li
Yuxuan Hu
Haibin Wu
...
Haiyang Sun
Yanqing Liu
Yan Lu
Kai Yu
Xie Chen
23
0
0
14 Apr 2025
Novel Pooling-based VGG-Lite for Pneumonia and Covid-19 Detection from Imbalanced Chest X-Ray Datasets
Santanu Roy
Ashvath Suresh
Palak Sahu
Tulika Rudra Gupta
29
0
0
10 Apr 2025
Attributes-aware Visual Emotion Representation Learning
R. S. Maharjan
Marta Romeo
Angelo Cangelosi
30
0
0
09 Apr 2025
Gaze-Guided Learning: Avoiding Shortcut Bias in Visual Classification
Jiahang Li
Shibo Xue
Yong Su
28
0
0
08 Apr 2025
Reinforced Multi-teacher Knowledge Distillation for Efficient General Image Forgery Detection and Localization
Zeqin Yu
Jiangqun Ni
Jian Zhang
Haoyi Deng
Yuzhen Lin
23
0
0
07 Apr 2025
APSeg: Auto-Prompt Model with Acquired and Injected Knowledge for Nuclear Instance Segmentation and Classification
Liying Xu
Hongliang He
Wei Han
Hanbin Huang
Siwei Feng
Guohong Fu
VLM
62
0
0
03 Apr 2025
OmniTalker: Real-Time Text-Driven Talking Head Generation with In-Context Audio-Visual Style Replication
Zhongjian Wang
Peng Zhang
Jinwei Qi
Guangyuan Wang
Sheng Xu
Bang Zhang
Liefeng Bo
DiffM
VGen
36
0
0
03 Apr 2025
Scaling Language-Free Visual Representation Learning
David Fan
Shengbang Tong
Jiachen Zhu
Koustuv Sinha
Zhuang Liu
...
Michael G. Rabbat
Nicolas Ballas
Yann LeCun
Amir Bar
Saining Xie
CLIP
VLM
58
2
0
01 Apr 2025
Generalized Tensor-based Parameter-Efficient Fine-Tuning via Lie Group Transformations
Chongjie Si
Zhiyi Shi
Xuehui Wang
Yichen Xiao
Xiaokang Yang
Wei-Ming Shen
AI4CE
60
0
0
01 Apr 2025
Self-Supervised Pretraining for Aerial Road Extraction
Rupert Polley
Sai Vignesh Abishek Deenadayalan
Johann Marius Zöllner
SSL
68
0
0
31 Mar 2025
DeepDubber-V1: Towards High Quality and Dialogue, Narration, Monologue Adaptive Movie Dubbing Via Multi-Modal Chain-of-Thoughts Reasoning Guidance
Junjie Zheng
Zihao Chen
Chaofan Ding
Xinhan Di
VGen
67
1
0
31 Mar 2025
A Survey on Remote Sensing Foundation Models: From Vision to Multimodality
Ziyue Huang
Hongxi Yan
Qiqi Zhan
Shuai Yang
Mingming Zhang
Chenkai Zhang
Yiming Lei
Zeming Liu
Qingjie Liu
Y. Wang
44
0
0
28 Mar 2025
DeepAudio-V1: Towards Multi-Modal Multi-Stage End-to-End Video to Speech and Audio Generation
Haomin Zhang
Chang Liu
Junjie Zheng
Zihao Chen
Chaofan Ding
Xinhan Di
DiffM
VGen
83
0
0
28 Mar 2025
An improved EfficientNetV2 for garbage classification
Wenxuan Qiu
Chengxin Xie
Jingui Huang
40
0
0
27 Mar 2025
A Spatial-temporal Deep Probabilistic Diffusion Model for Reliable Hail Nowcasting with Radar Echo Extrapolation
Haonan Shi
Long Tian
Jie Tao
Yufei Li
Liming Wang
Xiyang Liu
AI4Cl
30
0
0
26 Mar 2025
STFTCodec: High-Fidelity Audio Compression through Time-Frequency Domain Representation
Tao Feng
Zhiyuan Zhao
Yifan Xie
Yuqi Ye
Xiangyang Luo
Xun Guan
Y. Li
55
0
0
21 Mar 2025
Beyond Accuracy: What Matters in Designing Well-Behaved Models?
Robin Hesse
Doğukan Bağcı
Bernt Schiele
Simone Schaub-Meyer
Stefan Roth
VLM
57
0
0
21 Mar 2025
From Faces to Voices: Learning Hierarchical Representations for High-quality Video-to-Speech
Ji-Hoon Kim
Jeongsoo Choi
Jaehun Kim
Chaeyoung Jung
Joon Son Chung
CVBM
48
1
0
21 Mar 2025
Depth-Aware Range Image-Based Model for Point Cloud Segmentation
Bike Chen
Antti Tikanmäki
Juha Röning
3DPC
3DV
47
0
0
19 Mar 2025
Fibonacci-Net: A Lightweight CNN model for Automatic Brain Tumor Classification
Santanu Roy
Ashvath Suresh
Archit Gupta
Shubhi Tiwari
Palak Sahu
Prashant Adhikari
Yuvraj S. Shekhawat
48
0
0
18 Mar 2025
AV-Surf: Surface-Enhanced Geometry-Aware Novel-View Acoustic Synthesis
Hadam Baek
Hannie Shin
Jiyoung Seo
Chanwoo Kim
Saerom Kim
Hyeongbok Kim
Sangpil Kim
41
0
0
17 Mar 2025
8-Calves Image dataset
Xuyang Fang
S. Hannuna
Neill D. F. Campbell
110
0
0
17 Mar 2025
Unlocking Open-Set Language Accessibility in Vision Models
Fawaz Sammani
Jonas Fischer
Nikos Deligiannis
VLM
53
0
0
14 Mar 2025
Solution for 8th Competition on Affective & Behavior Analysis in-the-wild
Jun-chen Yu
Yunxiang Zhang
Xilong Lu
Yang Zheng
Yongqi Wang
Lingsi Zhu
44
0
0
14 Mar 2025
CoStoDet-DDPM: Collaborative Training of Stochastic and Deterministic Models Improves Surgical Workflow Anticipation and Recognition
Kaixiang Yang
Xin Li
Qiang Li
Zhiwei Wang
48
0
0
13 Mar 2025
Context-guided Responsible Data Augmentation with Diffusion Models
Khawar Islam
Naveed Akhtar
46
1
0
12 Mar 2025
A Good Start Matters: Enhancing Continual Learning with Data-Driven Weight Initialization
Md Yousuf Harun
Christopher Kanan
AI4CE
48
0
0
09 Mar 2025
Segment Anything, Even Occluded
Wei-En Tai
Yu-Lin Shih
Cheng Sun
Y. Wang
Hwann-Tzong Chen
VLM
62
0
0
08 Mar 2025
Automatic Drywall Analysis for Progress Tracking and Quality Control in Construction
Mariusz Trzeciakiewicz
Aleixo Cambeiro Barreiro
Niklas Gard
A. Hilsmann
Peter Eisert
45
0
0
05 Mar 2025
Partial Convolution Meets Visual Attention
Haiduo Huang
Fuwei Yang
D. Li
Ji Liu
Lu Tian
Jinzhang Peng
Pengju Ren
E. Barsoum
3DH
151
0
0
05 Mar 2025
JamMa: Ultra-lightweight Local Feature Matching with Joint Mamba
Xiaoyong Lu
Songlin Du
Mamba
68
0
0
05 Mar 2025
Is Pre-training Applicable to the Decoder for Dense Prediction?
Chao Ning
Wanshui Gan
Weihao Xuan
Naoto Yokoya
48
0
0
05 Mar 2025
Unsupervised Waste Classification By Dual-Encoder Contrastive Learning and Multi-Clustering Voting (DECMCV)
Kui Huang
Mengke Song
Shuo Ba
Ling An
Huajie Liang
Huanxi Deng
Yang Liu
Zhenyu Zhang
Chichun Zhou
48
0
0
04 Mar 2025
Stealthy Backdoor Attack in Self-Supervised Learning Vision Encoders for Large Vision Language Models
Zhaoyi Liu
Huan Zhang
AAML
72
0
0
25 Feb 2025
MaxGlaViT: A novel lightweight vision transformer-based approach for early diagnosis of glaucoma stages from fundus images
Mustafa Yurdakul
Kubra Uyar
Şakir Taşdemir
53
1
0
24 Feb 2025
Simpler Fast Vision Transformers with a Jumbo CLS Token
A. Fuller
Yousef Yassin
Daniel G. Kyrollos
Evan Shelhamer
James R. Green
67
0
0
24 Feb 2025
Multimodal Large Language Models for Text-rich Image Understanding: A Comprehensive Review
Pei Fu
Tongkun Guan
Zining Wang
Zhentao Guo
Chen Duan
...
Boming Chen
Jiayao Ma
Qianyi Jiang
Kai Zhou
Junfeng Luo
VLM
53
0
0
23 Feb 2025
CFIRSTNET: Comprehensive Features for Static IR Drop Estimation with Neural Network
Yu-Tung Liu
Yu-Hao Cheng
Shao-Yu Wu
Hung-Ming Chen
58
0
0
13 Feb 2025
Do we really have to filter out random noise in pre-training data for language models?
Jinghan Ru
Yuxin Xie
Xianwei Zhuang
Yuguo Yin
Yuexian Zou
83
2
0
10 Feb 2025