1405.0312
Cited By
Microsoft COCO: Common Objects in Context
1 May 2014
Tsung-Yi Lin
Michael Maire
Serge J. Belongie
Lubomir Bourdev
Ross B. Girshick
James Hays
Pietro Perona
Deva Ramanan
C. L. Zitnick
Piotr Dollár
ObjD
Papers citing "Microsoft COCO: Common Objects in Context"
50 / 652 papers shown
Title
GazeHTA: End-to-end Gaze Target Detection with Head-Target Association
Zhi-Yi Lin
Jouh Yeong Chew
Jan van Gemert
Xucong Zhang
93
3
0
16 Apr 2024
RankCLIP: Ranking-Consistent Language-Image Pretraining
Yiming Zhang
Zhuokai Zhao
Zhaorun Chen
Zhili Feng
Zenghui Ding
Yining Sun
SSL
VLM
78
7
0
15 Apr 2024
Flatness Improves Backbone Generalisation in Few-shot Classification
Rui Li
Martin Trapp
Talal Alrawajfeh
Arno Solin
85
0
0
11 Apr 2024
FGAIF: Aligning Large Vision-Language Models with Fine-grained AI Feedback
Liqiang Jing
Xinya Du
109
17
0
07 Apr 2024
Gen3DSR: Generalizable 3D Scene Reconstruction via Divide and Conquer from a Single View
Andreea Dogaru
M. Ozer
Bernhard Egger
3DGS
75
5
0
04 Apr 2024
Semi-Supervised Unconstrained Head Pose Estimation in the Wild
Huayi Zhou
Fei Jiang
Hongtao Lu
Yong Rui
Kui Jia
89
0
0
03 Apr 2024
Faster Diffusion via Temporal Attention Decomposition
Haozhe Liu
Wentian Zhang
Jinheng Xie
Francesco Faccio
Mengmeng Xu
Tao Xiang
Mike Zheng Shou
Juan-Manuel Perez-Rua
Jürgen Schmidhuber
DiffM
102
21
0
03 Apr 2024
Gyro-based Neural Single Image Deblurring
Heemin Yang
Jaesung Rim
Seungyong Lee
Seung-Hwan Baek
Sunghyun Cho
43
1
0
01 Apr 2024
Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want
Weifeng Lin
Xinyu Wei
Ruichuan An
Peng Gao
Bocheng Zou
Yulin Luo
Siyuan Huang
Shanghang Zhang
Hongsheng Li
VLM
119
36
0
29 Mar 2024
Tiny Models are the Computational Saver for Large Models
Qingyuan Wang
B. Cardiff
Antoine Frappé
Benoît Larras
Deepu John
73
2
0
26 Mar 2024
Exploring Dynamic Transformer for Efficient Object Tracking
Jiawen Zhu
Xin Chen
Haiwen Diao
Shuai Li
Jun-Yan He
Chenyang Li
Bin Luo
Dong Wang
Huchuan Lu
88
2
0
26 Mar 2024
Hallucination Detection in Foundation Models for Decision-Making: A Flexible Definition and Review of the State of the Art
Neeloy Chakraborty
Melkior Ornik
Katherine Driggs-Campbell
LRM
115
11
0
25 Mar 2024
FOOL: Addressing the Downlink Bottleneck in Satellite Computing with Neural Feature Compression
Alireza Furutanpey
Qiyang Zhang
Philipp Raith
Tobias Pfandzelter
Shangguang Wang
Schahram Dustdar
113
4
0
25 Mar 2024
Multiple Object Tracking as ID Prediction
Ruopeng Gao
Yijun Zhang
Limin Wang
97
13
0
25 Mar 2024
On the Impact of Black-box Deployment Strategies for Edge AI on Latency and Model Performance
Jaskirat Singh
Emad Fallahzadeh
Bram Adams
Ahmed E. Hassan
MQ
64
3
0
25 Mar 2024
Investigating Use Cases of AI-Powered Scene Description Applications for Blind and Low Vision People
Ricardo E Gonzalez Penuela
Jazmin Collins
Shiri Azenkot
Cynthia L. Bennett
55
25
0
22 Mar 2024
GazeFusion: Saliency-Guided Image Generation
Yunxiang Zhang
Nan Wu
Connor Z. Lin
Gordon Wetzstein
Qi Sun
57
0
0
16 Mar 2024
Denoising Task Difficulty-based Curriculum for Training Diffusion Models
Jin-Young Kim
Hyojun Go
Soonwoo Kwon
Hyun-Gyoon Kim
DiffM
101
6
0
15 Mar 2024
Explore In-Context Segmentation via Latent Diffusion Models
Chaoyang Wang
Xiangtai Li
Henghui Ding
Lu Qi
Jiangning Zhang
Yunhai Tong
Chen Change Loy
Shuicheng Yan
DiffM
95
6
0
14 Mar 2024
A Bayesian Approach to OOD Robustness in Image Classification
Prakhar Kaushik
Adam Kortylewski
Alan Yuille
46
2
0
12 Mar 2024
AS-FIBA: Adaptive Selective Frequency-Injection for Backdoor Attack on Deep Face Restoration
Zhenbo Song
Wenhao Gao
Kaihao Zhang
Wenhan Luo
AAML
65
0
0
11 Mar 2024
LeOCLR: Leveraging Original Images for Contrastive Learning of Visual Representations
Mohammad Alkhalefi
Georgios Leontidis
Mingjun Zhong
145
3
0
11 Mar 2024
Enhancing Vision-Language Pre-training with Rich Supervisions
Yuan Gao
Kunyu Shi
Pengkai Zhu
Edouard Belval
Oren Nuriel
Srikar Appalaraju
Shabnam Ghadar
Vijay Mahadevan
Zhuowen Tu
Stefano Soatto
VLM
CLIP
90
12
0
05 Mar 2024
Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures
Yuchen Duan
Weiyun Wang
Zhe Chen
Xizhou Zhu
Lewei Lu
Tong Lu
Yu Qiao
Hongsheng Li
Jifeng Dai
Wenhai Wang
ViT
54
44
0
04 Mar 2024
SCott: Accelerating Diffusion Models with Stochastic Consistency Distillation
Hongjian Liu
Qingsong Xie
Zhijie Deng
Chen Chen
Shixiang Tang
Fueyang Fu
Zheng-Jun Zha
H. Lu
65
6
0
03 Mar 2024
Downstream Task Guided Masking Learning in Masked Autoencoders Using Multi-Level Optimization
Han Guo
Ramtin Hosseini
Ruiyi Zhang
Sai Ashish Somayajula
Ranak Roy Chowdhury
Rajesh K. Gupta
Pengtao Xie
63
0
0
28 Feb 2024
Diffusion Model-Based Image Editing: A Survey
Yi Huang
Jiancheng Huang
Yifan Liu
Mingfu Yan
Jiaxi Lv
Jianzhuang Liu
Wei Xiong
He Zhang
Liangliang Cao
EGVM
96
90
0
27 Feb 2024
Subobject-level Image Tokenization
Delong Chen
Samuel Cahyawijaya
Jianfeng Liu
Baoyuan Wang
Pascale Fung
VLM
OCL
148
9
0
22 Feb 2024
Visual Concept-driven Image Generation with Text-to-Image Diffusion Model
Tanzila Rahman
Shweta Mahajan
Hsin-Ying Lee
Jian Ren
Sergey Tulyakov
Leonid Sigal
101
4
0
18 Feb 2024
Boosting Semi-Supervised 2D Human Pose Estimation by Revisiting Data Augmentation and Consistency Training
Huayi Zhou
Mukun Luo
Fei Jiang
Yue Ding
Hongtao Lu
Kui Jia
65
0
0
18 Feb 2024
FViT: A Focal Vision Transformer with Gabor Filter
Yulong Shi
Mingwei Sun
Yongshuai Wang
Rui Wang
91
4
0
17 Feb 2024
CIC: A Framework for Culturally-Aware Image Captioning
Youngsik Yun
Jihie Kim
VLM
67
6
0
08 Feb 2024
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models
Chris Liu
Renrui Zhang
Longtian Qiu
Siyuan Huang
Weifeng Lin
...
Hao Shao
Pan Lu
Hongsheng Li
Yu Qiao
Peng Gao
MLLM
151
112
0
08 Feb 2024
CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion
Shoubin Yu
Jaehong Yoon
Mohit Bansal
102
5
0
08 Feb 2024
Convincing Rationales for Visual Question Answering Reasoning
Kun Li
G. Vosselman
Michael Ying Yang
92
1
0
06 Feb 2024
Parameter-Efficient Fine-Tuning for Pre-Trained Vision Models: A Survey
Yi Xin
Jianjiang Yang
Haodi Zhou
Junlong Du
Yue Fan
Qing Li
Yuntao Du
VLM
89
79
0
03 Feb 2024
Segment Any Change
Zhuo Zheng
Yanfei Zhong
Liangpei Zhang
Stefano Ermon
VLM
44
12
0
02 Feb 2024
Neural Slot Interpreters: Grounding Object Semantics in Emergent Slot Representations
Bhishma Dedhia
N. Jha
OCL
82
1
0
02 Feb 2024
Rethinking Patch Dependence for Masked Autoencoders
Letian Fu
Long Lian
Renhao Wang
Baifeng Shi
Xudong Wang
Adam Yala
Trevor Darrell
Alexei A. Efros
Ken Goldberg
60
14
0
25 Jan 2024
Semantic Prompt Learning for Weakly-Supervised Semantic Segmentation
Ci-Siang Lin
Chien-Yi Wang
Yu-Chiang Frank Wang
Min-Hung Chen
VLM
118
0
0
22 Jan 2024
LW-FedSSL: Resource-efficient Layer-wise Federated Self-supervised Learning
Ye Lin Tun
Chu Myaet Thwal
Le Quang Huy
Minh N. H. Nguyen
Choong Seon Hong
FedML
62
2
0
22 Jan 2024
Out-of-Distribution Detection & Applications With Ablated Learned Temperature Energy
Will LeVine
Benjamin Pikus
Jacob Phillips
Berk Norman
Fernando Amat Gil
Sean Hendryx
OODD
111
1
0
22 Jan 2024
OnDev-LCT: On-Device Lightweight Convolutional Transformers towards federated learning
Chu Myaet Thwal
Minh N. H. Nguyen
Ye Lin Tun
Seongjin Kim
My T. Thai
Choong Seon Hong
76
5
0
22 Jan 2024
Idempotence and Perceptual Image Compression
Tongda Xu
Ziran Zhu
Dailan He
Yanghao Li
Lina Guo
...
Zhe Wang
Hongwei Qin
Yan Wang
Jingjing Liu
Ya Zhang
61
16
0
17 Jan 2024
Sports-QA: A Large-Scale Video Question Answering Benchmark for Complex and Professional Sports
Haopeng Li
Andong Deng
Qiuhong Ke
Jun Liu
Hossein Rahmani
Yulan Guo
Mohammed Bennamoun
Chen Chen
78
17
0
03 Jan 2024
3VL: Using Trees to Improve Vision-Language Models' Interpretability
Nir Yellinek
Leonid Karlinsky
Raja Giryes
CoGe
VLM
169
4
0
28 Dec 2023
Learning from Mistakes: Iterative Prompt Relabeling for Text-to-Image Diffusion Model Training
Xinyan Chen
Jiaxin Ge
Tianjun Zhang
Jiaming Liu
Shanghang Zhang
VLM
EGVM
81
0
0
23 Dec 2023
Fast Sampling Through The Reuse Of Attention Maps In Diffusion Models
Rosco Hunter
Łukasz Dudziak
Mohamed S. Abdelfattah
Abhinav Mehrotra
Sourav Bhattacharya
Hongkai Wen
29
1
0
13 Dec 2023
Quilt-LLaVA: Visual Instruction Tuning by Extracting Localized Narratives from Open-Source Histopathology Videos
M. S. Seyfioglu
Wisdom O. Ikezogwo
Fatemeh Ghezloo
Ranjay Krishna
Linda G. Shapiro
87
43
0
07 Dec 2023
Hiding Functions within Functions: Steganography by Implicit Neural Representations
Jia-Wei Liu
Peng Luo
Yan Ke
Dang Qian
Zhang Minqing
Mu Dejun
GAN
72
4
0
07 Dec 2023