Group Pose: A Simple Baseline for End-to-End Multi-person Pose Estimation

14 August 2023
Huan Liu
Qiang Chen
Zichang Tan
Jiangjiang Liu
Jian Wang
Xiangbo Su
Xiaolong Li
Kun Yao
Junyu Han
Errui Ding
Yao-Min Zhao
Jingdong Wang
Abstract

In this paper, we study the problem of end-to-end multi-person pose estimation. State-of-the-art solutions adopt the DETR-like framework and mainly develop complex decoders, e.g., ED-Pose regards pose estimation as keypoint box detection and combines it with human detection, while PETR predicts hierarchically with a pose decoder and a joint (keypoint) decoder. We present a simple yet effective transformer approach, named Group Pose. We simply regard $K$-keypoint pose estimation as predicting a set of $N \times K$ keypoint positions, each from a keypoint query, and represent each pose with an instance query for scoring the $N$ pose predictions. Motivated by the intuition that interaction among across-instance queries of different types is not directly helpful, we make a simple modification to decoder self-attention. We replace the single self-attention over all $N \times (K+1)$ queries with two subsequent group self-attentions: (i) $N$ within-instance self-attentions, each over $K$ keypoint queries and one instance query, and (ii) $(K+1)$ same-type across-instance self-attentions, each over $N$ queries of the same type. The resulting decoder removes the interaction among across-instance type-different queries, easing optimization and thus improving performance. Experimental results on MS COCO and CrowdPose show that our approach, without human box supervision, is superior to previous methods with complex decoders, and is even slightly better than ED-Pose, which uses human box supervision. Paddle (https://github.com/Michel-liu/GroupPose-Paddle) and PyTorch (https://github.com/Michel-liu/GroupPose) code are available.
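
The two group self-attentions described in the abstract can be pictured as two boolean attention masks over the flattened query set. The following is a minimal sketch, not the authors' implementation: the query layout (index = instance * (K + 1) + type, with the instance query stored last in each group), the function names, and the toy sizes N = 3, K = 2 are all assumptions made for illustration.

```python
def group_attention_masks(num_instances, num_keypoints):
    """Build boolean masks (True = attention allowed) for the two group self-attentions.

    Assumed layout (hypothetical, chosen for this sketch): query index
    q = n * (K + 1) + t, where n is the instance index and t is the query
    type (0..K-1 are keypoint queries, K is the instance query).
    """
    group = num_keypoints + 1          # K keypoint queries + 1 instance query
    total = num_instances * group      # N * (K + 1) queries overall

    # (i) within-instance: queries attend only to queries of the same instance.
    within_instance = [
        [i // group == j // group for j in range(total)] for i in range(total)
    ]
    # (ii) same-type across-instance: queries attend only to queries of the same type.
    same_type = [
        [i % group == j % group for j in range(total)] for i in range(total)
    ]
    return within_instance, same_type


def count_allowed(mask):
    """Number of (query, key) pairs the mask lets through."""
    return sum(sum(row) for row in mask)


N, K = 3, 2                            # toy sizes, not taken from the paper
within, same = group_attention_masks(N, K)
total = N * (K + 1)

full_pairs = total * total             # single self-attention over all queries
within_pairs = count_allowed(within)   # N blocks of size (K + 1) x (K + 1)
same_pairs = count_allowed(same)       # (K + 1) blocks of size N x N

# Pairs that differ in both instance and type are blocked by both masks.
cross_pairs_allowed = sum(
    1
    for i in range(total)
    for j in range(total)
    if i // (K + 1) != j // (K + 1)
    and i % (K + 1) != j % (K + 1)
    and (within[i][j] or same[i][j])
)

print(full_pairs, within_pairs, same_pairs, cross_pairs_allowed)
```

In this sketch, neither mask permits a pair of queries that differ in both instance and type, which is the interaction the abstract says the decoder removes. In an actual decoder, the same effect could also be obtained by reshaping the queries into groups rather than masking; which route the released code takes is not stated in the abstract.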
