ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2202.11094
  4. Cited By
GroupViT: Semantic Segmentation Emerges from Text Supervision

GroupViT: Semantic Segmentation Emerges from Text Supervision

22 February 2022
Jiarui Xu
Shalini De Mello
Sifei Liu
Wonmin Byeon
Thomas Breuel
Jan Kautz
X. Wang
    ViT
    VLM
ArXivPDFHTML

Papers citing "GroupViT: Semantic Segmentation Emerges from Text Supervision"

11 / 11 papers shown
Title
OpenFusion++: An Open-vocabulary Real-time Scene Understanding System
OpenFusion++: An Open-vocabulary Real-time Scene Understanding System
Xiaofeng Jin
Matteo Frosi
Matteo Matteucci
27
26
0
27 Apr 2025
Simpler is Better: Few-shot Semantic Segmentation with Classifier Weight
  Transformer
Simpler is Better: Few-shot Semantic Segmentation with Classifier Weight Transformer
Zhihe Lu
Sen He
Xiatian Zhu
Li Zhang
Yi-Zhe Song
Tao Xiang
ViT
145
146
0
06 Aug 2021
MLP-Mixer: An all-MLP Architecture for Vision
MLP-Mixer: An all-MLP Architecture for Vision
Ilya O. Tolstikhin
N. Houlsby
Alexander Kolesnikov
Lucas Beyer
Xiaohua Zhai
...
Andreas Steiner
Daniel Keysers
Jakob Uszkoreit
Mario Lucic
Alexey Dosovitskiy
219
2,132
0
04 May 2021
Emerging Properties in Self-Supervised Vision Transformers
Emerging Properties in Self-Supervised Vision Transformers
Mathilde Caron
Hugo Touvron
Ishan Misra
Hervé Jégou
Julien Mairal
Piotr Bojanowski
Armand Joulin
260
4,299
0
29 Apr 2021
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction
  without Convolutions
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions
Wenhai Wang
Enze Xie
Xiang Li
Deng-Ping Fan
Kaitao Song
Ding Liang
Tong Lu
Ping Luo
Ling Shao
ViT
247
2,898
0
24 Feb 2021
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize
  Long-Tail Visual Concepts
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts
Soravit Changpinyo
P. Sharma
Nan Ding
Radu Soricut
VLM
257
845
0
17 Feb 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy
  Text Supervision
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
268
2,875
0
11 Feb 2021
Is Space-Time Attention All You Need for Video Understanding?
Is Space-Time Attention All You Need for Video Understanding?
Gedas Bertasius
Heng Wang
Lorenzo Torresani
ViT
264
1,486
0
09 Feb 2021
DEAL: Difficulty-aware Active Learning for Semantic Segmentation
DEAL: Difficulty-aware Active Learning for Semantic Segmentation
Shuai Xie
Zunlei Feng
Ying Chen
Songtao Sun
Chao Ma
Mingli Song
VLM
99
43
0
17 Oct 2020
Unified Vision-Language Pre-Training for Image Captioning and VQA
Unified Vision-Language Pre-Training for Image Captioning and VQA
Luowei Zhou
Hamid Palangi
Lei Zhang
Houdong Hu
Jason J. Corso
Jianfeng Gao
MLLM
VLM
234
815
0
24 Sep 2019
Learning Pixel-level Semantic Affinity with Image-level Supervision for
  Weakly Supervised Semantic Segmentation
Learning Pixel-level Semantic Affinity with Image-level Supervision for Weakly Supervised Semantic Segmentation
Jiwoon Ahn
Suha Kwak
200
678
0
28 Mar 2018
1