Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1405.0312
Cited By
Microsoft COCO: Common Objects in Context
1 May 2014
Nayeon Lee
Michael Maire
Serge J. Belongie
Lubomir Bourdev
Ross B. Girshick
James Hays
Pietro Perona
Deva Ramanan
C. L. Zitnick
Piotr Dollár
ObjD
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Microsoft COCO: Common Objects in Context"
50 / 652 papers shown
Title
GME: Improving Universal Multimodal Retrieval by Multimodal LLMs
Xin Zhang
Yanzhao Zhang
Wen Xie
Mingxin Li
Ziqi Dai
Dingkun Long
Pengjun Xie
Meishan Zhang
Wenjie Li
Hao Fei
144
14
0
22 Dec 2024
Visual Prompting with Iterative Refinement for Design Critique Generation
Peitong Duan
Chin-Yi Cheng
Bjoern Hartmann
Yang Li
120
0
0
22 Dec 2024
Interact with me: Joint Egocentric Forecasting of Intent to Interact, Attitude and Social Actions
Tongfei Bian
Yiming Ma
Mathieu Chollet
Victor Sanchez
T. Guha
EgoV
124
1
0
21 Dec 2024
Enhancing Contrastive Learning Inspired by the Philosophy of "The Blind Men and the Elephant"
Yudong Zhang
Ruobing Xie
Jiansheng Chen
Xingwu Sun
Zhanhui Kang
Yu Wang
137
0
0
21 Dec 2024
HoVLE: Unleashing the Power of Monolithic Vision-Language Models with Holistic Vision-Language Embedding
Chenxin Tao
Shiqian Su
X. Zhu
Chenyu Zhang
Zhe Chen
...
Wenhai Wang
Lewei Lu
Gao Huang
Yu Qiao
Jifeng Dai
MLLM
VLM
156
2
0
20 Dec 2024
Defeasible Visual Entailment: Benchmark, Evaluator, and Reward-Driven Optimization
Yue Zhang
Liqiang Jing
Vibhav Gogate
138
3
0
19 Dec 2024
JoVALE: Detecting Human Actions in Video Using Audiovisual and Language Contexts
Taein Son
Soo Won Seo
Jisong Kim
S. Lee
Jun Won Choi
VGen
102
0
0
18 Dec 2024
Zero-Shot Low Light Image Enhancement with Diffusion Prior
Joshua Cho
Sara Aghajanzadeh
Zhen Zhu
David A. Forsyth
DiffM
174
0
0
18 Dec 2024
Nullu: Mitigating Object Hallucinations in Large Vision-Language Models via HalluSpace Projection
Le Yang
Ziwei Zheng
Boxu Chen
Zhengyu Zhao
Chenhao Lin
Chao Shen
VLM
178
4
0
18 Dec 2024
Activating Distributed Visual Region within LLMs for Efficient and Effective Vision-Language Training and Inference
Siyuan Wang
Dianyi Wang
Chengxing Zhou
Zejun Li
Zhihao Fan
Xuanjing Huang
Zhongyu Wei
VLM
393
0
0
17 Dec 2024
CLIP-SR: Collaborative Linguistic and Image Processing for Super-Resolution
Bingwen Hu
Heng Liu
Zhedong Zheng
Ping Liu
SupR
159
0
0
16 Dec 2024
Chimera: Improving Generalist Model with Domain-Specific Experts
Tianshuo Peng
Mingxing Li
Hongbin Zhou
Renqiu Xia
Renrui Zhang
...
Aojun Zhou
Botian Shi
Tao Chen
Bo Zhang
Xiangyu Yue
124
5
0
08 Dec 2024
DEIM: DETR with Improved Matching for Fast Convergence
Shihua Huang
Zhichao Lu
Xiaodong Cun
Yongjun Yu
Xiao Zhou
Xi Shen
VLM
409
3
0
05 Dec 2024
Frequency-Adaptive Low-Latency Object Detection Using Events and Frames
Haitian Zhang
Xiangyuan Wang
Chang Xu
Xinya Wang
Fang Xu
Huai Yu
Lei Yu
Wen Yang
ObjD
106
0
0
05 Dec 2024
CLIP-PING: Boosting Lightweight Vision-Language Models with Proximus Intrinsic Neighbors Guidance
Chu Myaet Thwal
Ye Lin Tun
Minh N. H. Nguyen
Eui-nam Huh
Choong Seon Hong
VLM
105
0
0
05 Dec 2024
HumanEdit: A High-Quality Human-Rewarded Dataset for Instruction-based Image Editing
Jinbin Bai
Wei Chow
L. Yang
Hefei Ling
Juncheng Billy Li
Hao Zhang
Shuicheng Yan
127
6
0
05 Dec 2024
Black-Box Forgery Attacks on Semantic Watermarks for Diffusion Models
Andreas Müller
Denis Lukovnikov
Jonas Thietke
Asja Fischer
Erwin Quiring
AAML
WIGM
343
5
0
04 Dec 2024
AdvDreamer Unveils: Are Vision-Language Models Truly Ready for Real-World 3D Variations?
Shouwei Ruan
Hanqin Liu
Yao Huang
Xiaoqi Wang
Caixin Kang
Hang Su
Yinpeng Dong
Xingxing Wei
VGen
126
0
0
04 Dec 2024
Beyond [cls]: Exploring the true potential of Masked Image Modeling representations
Marcin Przewiȩźlikowski
Randall Balestriero
Wojciech Jasiński
Marek 'Smieja
Bartosz Zieliñski
125
0
0
04 Dec 2024
IQA-Adapter: Exploring Knowledge Transfer from Image Quality Assessment to Diffusion-based Generative Models
Khaled Abud
Sergey Lavrushkin
Alexey Kirillov
D. Vatolin
130
0
0
02 Dec 2024
COSMOS: Cross-Modality Self-Distillation for Vision Language Pre-training
Sanghwan Kim
Rui Xiao
Mariana-Iuliana Georgescu
Stephan Alaniz
Zeynep Akata
VLM
195
2
0
02 Dec 2024
OmniFlow: Any-to-Any Generation with Multi-Modal Rectified Flows
Shufan Li
Konstantinos Kallidromitis
Akash Gokul
Zichun Liao
Yusuke Kato
Kazuki Kozuka
Aditya Grover
VGen
127
5
0
02 Dec 2024
HandOS: 3D Hand Reconstruction in One Stage
Xingyu Chen
Zhuheng Song
Xiaoke Jiang
Yaoqing Hu
Junzhi Yu
Lei Zhang
3DH
HAI
127
0
0
02 Dec 2024
Machine Learning Analysis of Anomalous Diffusion
Wenjie Cai
Yi Hu
X. Qu
Hui Zhao
Gongyi Wang
Jing Li
Zihan Huang
97
1
0
02 Dec 2024
DiffPatch: Generating Customizable Adversarial Patches using Diffusion Models
Zhixiang Wang
Guangnan Ye
Xinyu Wang
Siheng Chen
Ziyi Wang
Xingjun Ma
Yu-Gang Jiang
AAML
DiffM
132
0
0
02 Dec 2024
GFreeDet: Exploiting Gaussian Splatting and Foundation Models for Model-free Unseen Object Detection in the BOP Challenge 2024
Xingyu Liu
Yingyue Li
Chengxi Li
Gu Wang
Chenyangguang Zhang
Ziqin Huang
Xiangyang Ji
3DGS
109
2
0
02 Dec 2024
Explaining the Impact of Training on Vision Models via Activation Clustering
Ahcène Boubekki
Samuel G. Fadel
Sebastian Mair
169
0
0
29 Nov 2024
Feedback-driven object detection and iterative model improvement
Sönke Tenckhoff
Mario Koddenbrock
Erik Rodner
ObjD
VLM
124
0
0
29 Nov 2024
ROSE: Revolutionizing Open-Set Dense Segmentation with Patch-Wise Perceptual Large Multimodal Model
Kunyang Han
Yibo Hu
Mengxue Qu
Hailin Shi
Yao Zhao
Y. X. Wei
MLLM
VLM
3DV
123
1
0
29 Nov 2024
Sparse Attention Vectors: Generative Multimodal Model Features Are Discriminative Vision-Language Classifiers
Chancharik Mitra
Brandon Huang
Tianning Chai
Zhiqiu Lin
Assaf Arbelle
Rogerio Feris
Leonid Karlinsky
Trevor Darrell
Deva Ramanan
Roei Herzig
VLM
236
4
0
28 Nov 2024
Perception of Visual Content: Differences Between Humans and Foundation Models
Nardiena A. Pratama
Shaoyang Fan
Gianluca Demartini
VLM
119
0
0
28 Nov 2024
Any-Resolution AI-Generated Image Detection by Spectral Learning
Dimitrios Karageorgiou
Symeon Papadopoulos
I. Kompatsiaris
Efstratios Gavves
115
0
0
28 Nov 2024
Improving Accuracy and Generalization for Efficient Visual Tracking
Ram J. Zaveri
Shivang Patel
Yu Gu
Gianfranco Doretto
VLM
107
0
0
28 Nov 2024
ChatRex: Taming Multimodal LLM for Joint Perception and Understanding
Qing Jiang
Gen Luo
Yuqin Yang
Yuda Xiong
Yihao Chen
Zhaoyang Zeng
Tianhe Ren
Lei Zhang
VLM
LRM
134
8
0
27 Nov 2024
From Open Vocabulary to Open World: Teaching Vision Language Models to Detect Novel Objects
Zizhao Li
Zhengkang Xiang
Joseph West
Kourosh Khoshelham
ObjD
VLM
143
1
0
27 Nov 2024
OpenAD: Open-World Autonomous Driving Benchmark for 3D Object Detection
Zhongyu Xia
Jishuo Li
Zhiwei Lin
Xinhao Wang
Yansen Wang
Ming-Hsuan Yang
VLM
106
2
0
26 Nov 2024
A Review of Bayesian Uncertainty Quantification in Deep Probabilistic Image Segmentation
M. Valiuddin
R. V. Sloun
C.G.A. Viviers
Peter H. N. de With
Fons van der Sommen
UQCV
172
1
0
25 Nov 2024
Interpreting Object-level Foundation Models via Visual Precision Search
Ruoyu Chen
Siyuan Liang
Jingzhi Li
Shiming Liu
Maosen Li
Zheng Huang
Qichuan Geng
Xiaochun Cao
FAtt
135
4
0
25 Nov 2024
RoboSpatial: Teaching Spatial Understanding to 2D and 3D Vision-Language Models for Robotics
Chan Hee Song
Valts Blukis
Jonathan Tremblay
Stephen Tyree
Yu-Chuan Su
Stan Birchfield
125
12
0
25 Nov 2024
DiffDesign: Controllable Diffusion with Meta Prior for Efficient Interior Design Generation
Yuxuan Yang
Wenwen Qiang
Jingyao Wang
Jingyao Wang
Changwen Zheng
DiffM
105
0
0
25 Nov 2024
MambaTrack: Exploiting Dual-Enhancement for Night UAV Tracking
Chunhui Zhang
Li Liu
Hao Wen
Xi Zhou
Yijiao Wang
Mamba
125
2
0
24 Nov 2024
VaLiD: Mitigating the Hallucination of Large Vision Language Models by Visual Layer Fusion Contrastive Decoding
Jiaqi Wang
Yifei Gao
Jitao Sang
MLLM
150
2
0
24 Nov 2024
Is 'Right' Right? Enhancing Object Orientation Understanding in Multimodal Large Language Models through Egocentric Instruction Tuning
Ji Hyeok Jung
Eun Tae Kim
S. Kim
Joo Ho Lee
Bumsoo Kim
Buru Chang
VLM
402
1
0
24 Nov 2024
AnyEdit: Mastering Unified High-Quality Image Editing for Any Idea
Qifan Yu
Wei Chow
Zhongqi Yue
Kaihang Pan
Yang Wu
Xiaoyang Wan
Juncheng Billy Li
Siliang Tang
Hao Zhang
Yueting Zhuang
DiffM
134
19
0
24 Nov 2024
Devils in Middle Layers of Large Vision-Language Models: Interpreting, Detecting and Mitigating Object Hallucinations via Attention Lens
Zhangqi Jiang
Junkai Chen
Beier Zhu
Tingjin Luo
Yankun Shen
Xu Yang
118
5
0
23 Nov 2024
MUNBa: Machine Unlearning via Nash Bargaining
Jing Wu
Mehrtash Harandi
MU
96
4
0
23 Nov 2024
EfficientViM: Efficient Vision Mamba with Hidden State Mixer based State Space Duality
Sanghyeok Lee
Joonmyung Choi
Hyunwoo J. Kim
149
3
0
22 Nov 2024
On the Fairness, Diversity and Reliability of Text-to-Image Generative Models
Jordan Vice
Naveed Akhtar
Leonid Sigal
Richard Hartley
Ajmal Mian
EGVM
97
0
0
21 Nov 2024
AI-generated Image Detection: Passive or Watermark?
Moyang Guo
Yuepeng Hu
Zhengyuan Jiang
Zeyu Li
Amir Sadovnik
Arka Daw
Neil Zhenqiang Gong
139
1
0
20 Nov 2024
Teaching VLMs to Localize Specific Objects from In-context Examples
Sivan Doveh
Nimrod Shabtay
Wei Lin
Eli Schwartz
Hilde Kuehne
...
Leonid Karlinsky
James Glass
Assaf Arbelle
S. Ullman
Muhammad Jehanzeb Mirza
VLM
134
1
0
20 Nov 2024
Previous
1
2
3
...
5
6
7
...
12
13
14
Next