Group DETR v2: Strong Object Detector with Encoder-Decoder Pretraining
Qiang Chen
Jian Wang
Chuchu Han
Shangang Zhang
Zexian Li
Xiaokang Chen
Jiahui Chen
Xiaodi Wang
Shumin Han
Gang Zhang
Haocheng Feng
Kun Yao
Junyu Han
Errui Ding
Jingdong Wang

Abstract
We present a strong object detector with encoder-decoder pretraining and finetuning. Our method, called Group DETR v2, is built upon a vision transformer encoder ViT-Huge~\cite{dosovitskiy2020image}, a DETR variant DINO~\cite{zhang2022dino}, and an efficient DETR training method Group DETR~\cite{chen2022group}. The training process consists of self-supervised pretraining and finetuning a ViT-Huge encoder on ImageNet-1K, pretraining the detector on Objects365, and finally finetuning it on COCO. Group DETR v2 achieves mAP on COCO test-dev and establishes a new SoTA on the COCO leaderboard: https://paperswithcode.com/sota/object-detection-on-coco
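The staged training process described in the abstract can be laid out as a small data structure. This is only an illustrative sketch: the `Stage` class, the stage labels, and the `describe` helper are hypothetical names introduced here, not part of any released Group DETR v2 code, and no hyperparameters are implied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One training stage (hypothetical structure, for illustration only)."""
    name: str       # hypothetical label for the stage
    trains: str     # which part of the model is updated
    dataset: str    # dataset named in the abstract
    objective: str  # kind of training named in the abstract


# Stage order as stated in the abstract; labels are assumptions.
PIPELINE = (
    Stage("encoder_pretrain", "ViT-Huge encoder", "ImageNet-1K", "self-supervised pretraining"),
    Stage("encoder_finetune", "ViT-Huge encoder", "ImageNet-1K", "finetuning"),
    Stage("detector_pretrain", "detector", "Objects365", "pretraining"),
    Stage("detector_finetune", "detector", "COCO", "finetuning"),
)


def describe(pipeline):
    """Return one human-readable line per stage, numbered in order."""
    return [
        f"{i}. {s.name}: {s.objective} of the {s.trains} on {s.dataset}"
        for i, s in enumerate(pipeline, start=1)
    ]


if __name__ == "__main__":
    for line in describe(PIPELINE):
        print(line)
```

Running the script prints the four stages in order, which makes explicit that the encoder is trained twice on ImageNet-1K before the detector sees any detection data.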