ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2201.02605
  4. Cited By
Detecting Twenty-thousand Classes using Image-level Supervision

Detecting Twenty-thousand Classes using Image-level Supervision

7 January 2022
Xingyi Zhou
Rohit Girdhar
Armand Joulin
Phillip Krahenbuhl
Ishan Misra
    CLIP
    VLM
ArXivPDFHTML

Papers citing "Detecting Twenty-thousand Classes using Image-level Supervision"

50 / 103 papers shown
Title
EgoPCA: A New Framework for Egocentric Hand-Object Interaction
  Understanding
EgoPCA: A New Framework for Egocentric Hand-Object Interaction Understanding
Yue Xu
Yong-Lu Li
Zhemin Huang
Michael Xu Liu
Cewu Lu
Yu-Wing Tai
Chi-Keung Tang
EgoV
18
9
0
05 Sep 2023
OpenIns3D: Snap and Lookup for 3D Open-vocabulary Instance Segmentation
OpenIns3D: Snap and Lookup for 3D Open-vocabulary Instance Segmentation
Zhening Huang
Xiaoyang Wu
Xi Chen
Hengshuang Zhao
Lei Zhu
Joan Lasenby
ISeg
3DPC
VLM
39
46
0
01 Sep 2023
Mobile Foundation Model as Firmware
Mobile Foundation Model as Firmware
Jinliang Yuan
Chenchen Yang
Dongqi Cai
Shihe Wang
Xin Yuan
...
Di Zhang
Hanzi Mei
Xianqing Jia
Shangguang Wang
Mengwei Xu
32
19
0
28 Aug 2023
Masked Feature Modelling: Feature Masking for the Unsupervised
  Pre-training of a Graph Attention Network Block for Bottom-up Video Event
  Recognition
Masked Feature Modelling: Feature Masking for the Unsupervised Pre-training of a Graph Attention Network Block for Bottom-up Video Event Recognition
Dimitrios Daskalakis
Nikolaos Gkalelis
Vasileios Mezaris
32
0
0
24 Aug 2023
Structured World Models from Human Videos
Structured World Models from Human Videos
Russell Mendonca
Shikhar Bahl
Deepak Pathak
LM&Ro
21
85
0
21 Aug 2023
An Examination of the Compositionality of Large Generative
  Vision-Language Models
An Examination of the Compositionality of Large Generative Vision-Language Models
Teli Ma
Rong Li
Junwei Liang
CoGe
19
2
0
21 Aug 2023
ARGUS: Visualization of AI-Assisted Task Guidance in AR
ARGUS: Visualization of AI-Assisted Task Guidance in AR
Sonia Castelo
Joao Rulff
Erin McGowan
Bea Steers
Guande Wu
...
Qinghong Sun
Huy Q. Vo
J. P. Bello
M. Krone
Claudio Silva
29
18
0
11 Aug 2023
Foundation Model based Open Vocabulary Task Planning and Executive
  System for General Purpose Service Robots
Foundation Model based Open Vocabulary Task Planning and Executive System for General Purpose Service Robots
Yoshiki Obinata
Naoaki Kanazawa
Kento Kawaharazuka
Iori Yanokura
Soon-Hyeob Kim
K. Okada
Masayuki Inaba
LM&Ro
19
7
0
07 Aug 2023
GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
Shilong Zhang
Pei Sun
Shoufa Chen
Min Xiao
Wenqi Shao
Wenwei Zhang
Yu Liu
Kai-xiang Chen
Ping Luo
VLM
MLLM
83
223
0
07 Jul 2023
How can objects help action recognition?
How can objects help action recognition?
Xingyi Zhou
Anurag Arnab
Chen Sun
Cordelia Schmid
30
14
0
20 Jun 2023
HomeRobot: Open-Vocabulary Mobile Manipulation
HomeRobot: Open-Vocabulary Mobile Manipulation
Sriram Yenamandra
A. Ramachandran
Karmesh Yadav
Austin S. Wang
Mukul Khanna
...
Devendra Singh Chaplot
Dhruv Batra
Roozbeh Mottaghi
Yonatan Bisk
Chris Paxton
LM&Ro
33
79
0
20 Jun 2023
GeneCIS: A Benchmark for General Conditional Image Similarity
GeneCIS: A Benchmark for General Conditional Image Similarity
S. Vaze
Nicolas Carion
Ishan Misra
VLM
DiffM
22
26
0
13 Jun 2023
Multi-modal Queried Object Detection in the Wild
Multi-modal Queried Object Detection in the Wild
Yifan Xu
Mengdan Zhang
Chaoyou Fu
Peixian Chen
Xiaoshan Yang
Ke Li
Changsheng Xu
ObjD
VLM
26
29
0
30 May 2023
Mobile User Interface Element Detection Via Adaptively Prompt Tuning
Mobile User Interface Element Detection Via Adaptively Prompt Tuning
Zhangxuan Gu
Zhuoer Xu
Haoxing Chen
Jun Lan
Changhua Meng
Weiqiang Wang
14
4
0
16 May 2023
Affordances from Human Videos as a Versatile Representation for Robotics
Affordances from Human Videos as a Versatile Representation for Robotics
Shikhar Bahl
Russell Mendonca
Lili Chen
Unnat Jain
Deepak Pathak
30
160
0
17 Apr 2023
Prompt Pre-Training with Twenty-Thousand Classes for Open-Vocabulary
  Visual Recognition
Prompt Pre-Training with Twenty-Thousand Classes for Open-Vocabulary Visual Recognition
Shuhuai Ren
Aston Zhang
Yi Zhu
Shuai Zhang
Shuai Zheng
Mu Li
Alexander J. Smola
Xu Sun
VPVLM
VLM
17
28
0
10 Apr 2023
V3Det: Vast Vocabulary Visual Detection Dataset
V3Det: Vast Vocabulary Visual Detection Dataset
Jiaqi Wang
Pan Zhang
Tao Chu
Yuhang Cao
Yujie Zhou
Tong Wu
Bin Wang
Conghui He
Dahua Lin
VLM
ObjD
18
51
0
07 Apr 2023
Navigating to Objects Specified by Images
Navigating to Objects Specified by Images
Jacob Krantz
Théophile Gervet
Karmesh Yadav
Austin S. Wang
Chris Paxton
Roozbeh Mottaghi
Dhruv Batra
Jitendra Malik
Stefan Lee
Devendra Singh Chaplot
32
36
0
03 Apr 2023
RegionPLC: Regional Point-Language Contrastive Learning for Open-World
  3D Scene Understanding
RegionPLC: Regional Point-Language Contrastive Learning for Open-World 3D Scene Understanding
Jihan Yang
Runyu Ding
Weipeng Deng
Zhe Wang
Xiaojuan Qi
10
61
0
03 Apr 2023
MaMMUT: A Simple Architecture for Joint Learning for MultiModal Tasks
MaMMUT: A Simple Architecture for Joint Learning for MultiModal Tasks
Weicheng Kuo
A. Piergiovanni
Dahun Kim
Xiyang Luo
Benjamin Caine
...
Luowei Zhou
Andrew M. Dai
Zhifeng Chen
Claire Cui
A. Angelova
MLLM
VLM
23
23
0
29 Mar 2023
Prompt-Guided Transformers for End-to-End Open-Vocabulary Object
  Detection
Prompt-Guided Transformers for End-to-End Open-Vocabulary Object Detection
Hwanjun Song
Jihwan Bang
VLM
ObjD
18
14
0
25 Mar 2023
Learned Two-Plane Perspective Prior based Image Resampling for Efficient
  Object Detection
Learned Two-Plane Perspective Prior based Image Resampling for Efficient Object Detection
Anurag Ghosh
Dinesh Reddy Narapureddy
Christoph Mertz
S. Narasimhan
23
4
0
25 Mar 2023
Three ways to improve feature alignment for open vocabulary detection
Three ways to improve feature alignment for open vocabulary detection
Relja Arandjelović
A. Andonian
A. Mensch
Olivier J. Hénaff
Jean-Baptiste Alayrac
Andrew Zisserman
VLM
ObjD
28
19
0
23 Mar 2023
VEIL: Vetting Extracted Image Labels from In-the-Wild Captions for
  Weakly-Supervised Object Detection
VEIL: Vetting Extracted Image Labels from In-the-Wild Captions for Weakly-Supervised Object Detection
Arushi Rai
Adriana Kovashka
19
0
0
16 Mar 2023
Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set
  Object Detection
Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection
Shilong Liu
Zhaoyang Zeng
Tianhe Ren
Feng Li
Hao Zhang
...
Chun-yue Li
Jianwei Yang
Hang Su
Jun Zhu
Lei Zhang
ObjD
19
1,804
0
09 Mar 2023
OvarNet: Towards Open-vocabulary Object Attribute Recognition
OvarNet: Towards Open-vocabulary Object Attribute Recognition
Keyan Chen
Xiaolong Jiang
Yao Hu
Xu Tang
Yan Gao
Jianqi Chen
Weidi Xie
VLM
ObjD
27
40
0
23 Jan 2023
Betrayed by Captions: Joint Caption Grounding and Generation for Open
  Vocabulary Instance Segmentation
Betrayed by Captions: Joint Caption Grounding and Generation for Open Vocabulary Instance Segmentation
Jianzong Wu
Xiangtai Li
Henghui Ding
Xia Li
Guangliang Cheng
Yu Tong
Chen Change Loy
VLM
80
31
0
02 Jan 2023
Fine-tuned CLIP Models are Efficient Video Learners
Fine-tuned CLIP Models are Efficient Video Learners
H. Rasheed
Muhammad Uzair Khattak
Muhammad Maaz
Salman Khan
F. Khan
CLIP
VLM
17
148
0
06 Dec 2022
Vision Transformer Computation and Resilience for Dynamic Inference
Vision Transformer Computation and Resilience for Dynamic Inference
Kavya Sreedhar
Jason Clemons
Rangharajan Venkatesan
S. Keckler
M. Horowitz
13
2
0
06 Dec 2022
Navigating to Objects in the Real World
Navigating to Objects in the Real World
Théophile Gervet
Soumith Chintala
Dhruv Batra
Jitendra Malik
Devendra Singh Chaplot
27
122
0
02 Dec 2022
In-Hand 3D Object Scanning from an RGB Sequence
In-Hand 3D Object Scanning from an RGB Sequence
Shreyas Hampali
Tomás Hodan
Luan Tran
Lingni Ma
Cem Keskin
Vincent Lepetit
3DH
11
20
0
28 Nov 2022
ReCo: Region-Controlled Text-to-Image Generation
ReCo: Region-Controlled Text-to-Image Generation
Zhengyuan Yang
Jianfeng Wang
Zhe Gan
Linjie Li
Kevin Qinghong Lin
...
Nan Duan
Zicheng Liu
Ce Liu
Michael Zeng
Lijuan Wang
DiffM
26
140
0
23 Nov 2022
Open-vocabulary Attribute Detection
Open-vocabulary Attribute Detection
M. A. Bravo
Sudhanshu Mittal
Simon Ging
Thomas Brox
VLM
ObjD
14
30
0
23 Nov 2022
Efficient Spatially Sparse Inference for Conditional GANs and Diffusion
  Models
Efficient Spatially Sparse Inference for Conditional GANs and Diffusion Models
Muyang Li
Ji Lin
Chenlin Meng
Stefano Ermon
Song Han
Jun-Yan Zhu
DiffM
25
45
0
03 Nov 2022
Fine-grained Visual-Text Prompt-Driven Self-Training for Open-Vocabulary
  Object Detection
Fine-grained Visual-Text Prompt-Driven Self-Training for Open-Vocabulary Object Detection
Yanxin Long
Jianhua Han
Runhu Huang
Xu Hang
Yi Zhu
Chunjing Xu
Xiaodan Liang
VLM
ObjD
19
18
0
02 Nov 2022
MaPLe: Multi-modal Prompt Learning
MaPLe: Multi-modal Prompt Learning
Muhammad Uzair Khattak
H. Rasheed
Muhammad Maaz
Salman Khan
F. Khan
VPVLM
VLM
186
528
0
06 Oct 2022
Generative Category-Level Shape and Pose Estimation with Semantic
  Primitives
Generative Category-Level Shape and Pose Estimation with Semantic Primitives
Guanglin Li
Yifeng Li
Zhichao Ye
Qihang Zhang
Tao Kong
Zhaopeng Cui
Guofeng Zhang
32
24
0
03 Oct 2022
UAV-based Visual Remote Sensing for Automated Building Inspection
UAV-based Visual Remote Sensing for Automated Building Inspection
Kushagra Srivastava
Dhruva G. Patel
Aditya Kumar Jha
Mohhit Kumar Jha
Jaskirat Singh
Ravi Kiran Sarvadevabhatla
P. Ramancharla
Harikumar Kandath
K. M. Krishna
9
1
0
27 Sep 2022
Bridging the Gap between Object and Image-level Representations for
  Open-Vocabulary Detection
Bridging the Gap between Object and Image-level Representations for Open-Vocabulary Detection
H. Rasheed
Muhammad Maaz
Muhammad Uzair Khattak
Salman Khan
F. Khan
ObjD
VLM
23
151
0
07 Jul 2022
Open-Vocabulary 3D Detection via Image-level Class and Debiased
  Cross-modal Contrastive Learning
Open-Vocabulary 3D Detection via Image-level Class and Debiased Cross-modal Contrastive Learning
Yuheng Lu
Chenfeng Xu
Xi Wei
Xiaodong Xie
M. Tomizuka
Kurt Keutzer
Shanghang Zhang
3DPC
13
20
0
05 Jul 2022
Can Language Understand Depth?
Can Language Understand Depth?
Renrui Zhang
Ziyao Zeng
Ziyu Guo
Yafeng Li
VLM
MDE
13
71
0
03 Jul 2022
Open Vocabulary Object Detection with Proposal Mining and Prediction
  Equalization
Open Vocabulary Object Detection with Proposal Mining and Prediction Equalization
Peixian Chen
Kekai Sheng
Mengdan Zhang
Mingbao Lin
Yunhang Shen
Shaohui Lin
Bo Ren
Ke Li
VLM
ObjD
23
27
0
22 Jun 2022
Zero-shot object goal visual navigation
Zero-shot object goal visual navigation
Qianfan Zhao
Lu Zhang
Bin He
Hong Qiao
Zhi-yong Liu
17
37
0
15 Jun 2022
INDIGO: Intrinsic Multimodality for Domain Generalization
INDIGO: Intrinsic Multimodality for Domain Generalization
Puneet Mangla
Shivam Chandhok
Milan Aggarwal
V. Balasubramanian
Balaji Krishnamurthy
VLM
28
2
0
13 Jun 2022
Simple Open-Vocabulary Object Detection with Vision Transformers
Simple Open-Vocabulary Object Detection with Vision Transformers
Matthias Minderer
A. Gritsenko
Austin Stone
Maxim Neumann
Dirk Weissenborn
...
Zhuoran Shen
Xiao Wang
Xiaohua Zhai
Thomas Kipf
N. Houlsby
ObjD
CLIP
VLM
ViT
OCL
8
304
0
12 May 2022
Large-scale Bilingual Language-Image Contrastive Learning
Large-scale Bilingual Language-Image Contrastive Learning
ByungSoo Ko
Geonmo Gu
VLM
17
14
0
28 Mar 2022
Open-Vocabulary DETR with Conditional Matching
Open-Vocabulary DETR with Conditional Matching
Yuhang Zang
Wei Li
Kaiyang Zhou
Chen Huang
Chen Change Loy
ObjD
VLM
9
197
0
22 Mar 2022
Open-vocabulary Object Detection via Vision and Language Knowledge
  Distillation
Open-vocabulary Object Detection via Vision and Language Knowledge Distillation
Xiuye Gu
Tsung-Yi Lin
Weicheng Kuo
Yin Cui
VLM
ObjD
223
897
0
28 Apr 2021
ImageNet-21K Pretraining for the Masses
ImageNet-21K Pretraining for the Masses
T. Ridnik
Emanuel Ben-Baruch
Asaf Noy
Lihi Zelnik-Manor
SSeg
VLM
CLIP
166
684
0
22 Apr 2021
Simple multi-dataset detection
Simple multi-dataset detection
Xingyi Zhou
V. Koltun
Philipp Krahenbuhl
ObjD
228
112
0
25 Feb 2021
Previous
123
Next