1505.04870
Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models
19 May 2015
Bryan A. Plummer
Liwei Wang
Christopher M. Cervantes
Juan C. Caicedo
Anjali Narayan-Chen
Svetlana Lazebnik
Papers citing
"Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models"
50 / 1,325 papers shown
Box-based Refinement for Weakly Supervised and Unsupervised Localization Tasks
IEEE International Conference on Computer Vision (ICCV), 2023
Eyal Gomel
Tal Shaharabany
Lior Wolf
ObjD
351
6
0
07 Sep 2023
DetermiNet: A Large-Scale Diagnostic Dataset for Complex Visually-Grounded Referencing using Determiners
IEEE International Conference on Computer Vision (ICCV), 2023
Clarence Lee
M Ganesh Kumar
Cheston Tan
198
3
0
07 Sep 2023
A Joint Study of Phrase Grounding and Task Performance in Vision and Language Models
Noriyuki Kojima
Hadar Averbuch-Elor
Yoav Artzi
325
2
0
06 Sep 2023
S3C: Semi-Supervised VQA Natural Language Explanation via Self-Critical Learning
Computer Vision and Pattern Recognition (CVPR), 2023
Wei Suo
Mengyang Sun
Weisong Liu
Yi-Meng Gao
Peifeng Wang
Yanning Zhang
Qi Wu
LRM
204
11
0
05 Sep 2023
MultiWay-Adapater: Adapting large-scale multi-modal models for scalable image-text retrieval
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Zijun Long
George Killick
R. McCreadie
Gerardo Aragon Camarasa
247
2
0
04 Sep 2023
Parameter and Computation Efficient Transfer Learning for Vision-Language Pre-trained Models
Neural Information Processing Systems (NeurIPS), 2023
Qiong Wu
Wei Yu
Weihao Ye
Shubin Huang
Xiaoshuai Sun
Rongrong Ji
VLM
261
11
0
04 Sep 2023
Contrastive Feature Masking Open-Vocabulary Vision Transformer
IEEE International Conference on Computer Vision (ICCV), 2023
Dahun Kim
A. Angelova
Weicheng Kuo
ObjD
VLM
339
38
0
02 Sep 2023
ViLTA: Enhancing Vision-Language Pre-training through Textual Augmentation
IEEE International Conference on Computer Vision (ICCV), 2023
Weihan Wang
Zhiyong Yang
Bin Xu
Juanzi Li
Yankui Sun
VLM
289
9
0
31 Aug 2023
Catalog Phrase Grounding (CPG): Grounding of Product Textual Attributes in Product Images for e-commerce Vision-Language Applications
Wenyi Wu
Karim Bouyarmane
Ismail B. Tutar
63
2
0
30 Aug 2023
CoVR: Learning Composed Video Retrieval from Web Video Captions
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Lucas Ventura
Antoine Yang
Cordelia Schmid
Gül Varol
438
21
0
28 Aug 2023
How to Evaluate the Generalization of Detection? A Benchmark for Comprehensive Open-Vocabulary Detection
AAAI Conference on Artificial Intelligence (AAAI), 2023
Yi Yao
Peng Liu
Tiancheng Zhao
Qianqian Zhang
Jiajia Liao
Chunxin Fang
Kyusong Lee
Qing Wang
VLM
ObjD
201
17
0
25 Aug 2023
DLIP: Distilling Language-Image Pre-training
Huafeng Kuang
Jie Wu
Xiawu Zheng
Ming Li
Xuefeng Xiao
Rui Wang
Min Zheng
Rongrong Ji
VLM
150
6
0
24 Aug 2023
SCoRD: Subject-Conditional Relation Detection with Text-Augmented Data
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Ziyan Yang
Kushal Kafle
Zhe Lin
Scott D. Cohen
Zhihong Ding
Vicente Ordonez
258
1
0
24 Aug 2023
Grounded Entity-Landmark Adaptive Pre-training for Vision-and-Language Navigation
IEEE International Conference on Computer Vision (ICCV), 2023
Yibo Cui
Liang Xie
Yakun Zhang
Meishan Zhang
Ye Yan
Erwei Yin
LM&Ro
224
28
0
24 Aug 2023
InstructionGPT-4: A 200-Instruction Paradigm for Fine-Tuning MiniGPT-4
Lai Wei
Zihao Jiang
Weiran Huang
Lichao Sun
VLM
MLLM
325
75
0
23 Aug 2023
CgT-GAN: CLIP-guided Text GAN for Image Captioning
ACM Multimedia (ACM MM), 2023
Jiarui Yu
Haoran Li
Y. Hao
B. Zhu
Tong Xu
Xiangnan He
VLM
CLIP
229
24
0
23 Aug 2023
RefEgo: Referring Expression Comprehension Dataset from First-Person Perception of Ego4D
IEEE International Conference on Computer Vision (ICCV), 2023
Shuhei Kurita
Naoki Katsura
Eri Onami
EgoV
262
23
0
23 Aug 2023
EVE: Efficient Vision-Language Pre-training with Masked Prediction and Modality-Aware MoE
AAAI Conference on Artificial Intelligence (AAAI), 2023
Junyi Chen
Longteng Guo
Jianxiang Sun
Shuai Shao
Zehuan Yuan
Liang Lin
Dongyu Zhang
MLLM
VLM
MoE
205
20
0
23 Aug 2023
GrowCLIP: Data-aware Automatic Model Growing for Large-scale Contrastive Language-Image Pre-training
IEEE International Conference on Computer Vision (ICCV), 2023
Xi Deng
Han Shi
Runhu Huang
Changlin Li
Hang Xu
Jianhua Han
James T. Kwok
Shen Zhao
Wei Zhang
Xiaodan Liang
CLIP
VLM
211
3
0
22 Aug 2023
ConcatPlexer: Additional Dim1 Batching for Faster ViTs
D. Han
Seunghyeon Seo
D. Jeon
Jiho Jang
Chaerin Kong
Nojun Kwak
ViT
MoE
193
0
0
22 Aug 2023
VQA Therapy: Exploring Answer Differences by Visually Grounding Answers
IEEE International Conference on Computer Vision (ICCV), 2023
Chongyan Chen
Samreen Anjum
Danna Gurari
247
16
0
21 Aug 2023
On the Adversarial Robustness of Multi-Modal Foundation Models
Christian Schlarmann
Matthias Hein
AAML
378
139
0
21 Aug 2023
AltDiffusion: A Multilingual Text-to-Image Diffusion Model
AAAI Conference on Artificial Intelligence (AAAI), 2023
Fulong Ye
Guangyi Liu
Xinya Wu
Ledell Yu Wu
VLM
309
47
0
19 Aug 2023
Tackling Vision Language Tasks Through Learning Inner Monologues
AAAI Conference on Artificial Intelligence (AAAI), 2023
Diji Yang
Kezhen Chen
Jinmeng Rao
Xiaoyuan Guo
Yawen Zhang
Jie Yang
Yujiao Shi
MLLM
234
14
0
19 Aug 2023
Artificial-Spiking Hierarchical Networks for Vision-Language Representation Learning
Ye-Ting Chen
Siyu Zhang
Yaoru Sun
Weijian Liang
Haoran Wang
213
3
0
18 Aug 2023
DiffDis: Empowering Generative Diffusion Model with Cross-Modal Discrimination Capability
IEEE International Conference on Computer Vision (ICCV), 2023
Runhu Huang
Jianhua Han
Guansong Lu
Xiaodan Liang
Yihan Zeng
Wei Zhang
Hang Xu
DiffM
171
8
0
18 Aug 2023
Language-Guided Diffusion Model for Visual Grounding
Sijia Chen
Baochun Li
655
6
0
18 Aug 2023
ALIP: Adaptive Language-Image Pre-training with Synthetic Caption
IEEE International Conference on Computer Vision (ICCV), 2023
Kaicheng Yang
Jiankang Deng
Xiang An
Jiawei Li
Ziyong Feng
Jia Guo
Jing Yang
Tongliang Liu
VLM
CLIP
224
82
0
16 Aug 2023
Exploring Transfer Learning in Medical Image Segmentation using Vision-Language Models
International Conference on Medical Imaging with Deep Learning (MIDL), 2023
K. Poudel
Manish Dhakal
Prasiddha Bhandari
Rabin Adhikari
Safal Thapaliya
Bishesh Khanal
VLM
564
32
0
15 Aug 2023
Vision-Language Dataset Distillation
Xindi Wu
Byron Zhang
Zhiwei Deng
Olga Russakovsky
DD
VLM
456
15
0
15 Aug 2023
Taming Self-Training for Open-Vocabulary Object Detection
Computer Vision and Pattern Recognition (CVPR), 2023
Shiyu Zhao
S. Schulter
Long Zhao
Zhixing Zhang
Vijay Kumar B.G
Yumin Suh
Manmohan Chandraker
Dimitris N. Metaxas
VLM
ObjD
375
21
0
11 Aug 2023
Foundation Model is Efficient Multimodal Multitask Model Selector
Neural Information Processing Systems (NeurIPS), 2023
Fanqing Meng
Wenqi Shao
Zhanglin Peng
Chong Jiang
Kaipeng Zhang
Yu Qiao
Ping Luo
175
21
0
11 Aug 2023
RegionBLIP: A Unified Multi-modal Pre-training Framework for Holistic and Regional Comprehension
Qiang-feng Zhou
Chaohui Yu
Shaofeng Zhang
Sitong Wu
Zhibin Wang
Fan Wang
184
32
0
03 Aug 2023
OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models
Anas Awadalla
Irena Gao
Josh Gardner
Jack Hessel
Yusuf Hanafy
...
Simon Kornblith
Pang Wei Koh
Gabriel Ilharco
Mitchell Wortsman
Ludwig Schmidt
MLLM
349
549
0
02 Aug 2023
VL-Grasp: a 6-Dof Interactive Grasp Policy for Language-Oriented Objects in Cluttered Indoor Scenes
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2023
Yuhao Lu
Yixuan Fan
Beixing Deng
Fan Liu
Yali Li
Shengjin Wang
269
59
0
01 Aug 2023
Exploring Annotation-free Image Captioning with Retrieval-augmented Pseudo Sentence Generation
ACM Multimedia Asia (MA), 2023
Zhiyuan Li
Dongnan Liu
Heng Wang
Chaoyi Zhang
Weidong (Tom) Cai
RALM
193
2
0
27 Jul 2023
Set-level Guidance Attack: Boosting Adversarial Transferability of Vision-Language Pre-training Models
IEEE International Conference on Computer Vision (ICCV), 2023
Dong Lu
Zhiqiang Wang
Teng Wang
Weili Guan
Hongchang Gao
Feng Zheng
AAML
273
121
0
26 Jul 2023
3DRP-Net: 3D Relative Position-aware Network for 3D Visual Grounding
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Zehan Wang
Haifeng Huang
Yang Zhao
Lin Li
Xize Cheng
Yichen Zhu
Aoxiong Yin
Zhou Zhao
3DPC
194
29
0
25 Jul 2023
Described Object Detection: Liberating Object Detection with Flexible Expressions
Neural Information Processing Systems (NeurIPS), 2023
Chi Xie
Zhao Zhang
YiXuan Wu
Feng Zhu
Rui Zhao
Shuang Liang
ObjD
243
51
0
24 Jul 2023
Iterative Robust Visual Grounding with Masked Reference based Centerpoint Supervision
Menghao Li
Chunlei Wang
W. Feng
Shuchang Lyu
Guangliang Cheng
Xiangtai Li
Binghao Liu
Qi Zhao
276
7
0
23 Jul 2023
Advancing Visual Grounding with Scene Knowledge: Benchmark and Method
Computer Vision and Pattern Recognition (CVPR), 2023
Zhihong Chen
Ruifei Zhang
Yibing Song
Xiang Wan
Guanbin Li
181
30
0
21 Jul 2023
Embedded Heterogeneous Attention Transformer for Cross-lingual Image Captioning
IEEE Transactions on Multimedia (IEEE TMM), 2023
Zijie Song
Zhenzhen Hu
Yuanen Zhou
Ye Zhao
Richang Hong
Meng Wang
209
19
0
19 Jul 2023
Distilling Coarse-to-Fine Semantic Matching Knowledge for Weakly Supervised 3D Visual Grounding
IEEE International Conference on Computer Vision (ICCV), 2023
Zehan Wang
Haifeng Huang
Yang Zhao
Lin Li
Xize Cheng
Yichen Zhu
Aoxiong Yin
Zhou Zhao
194
29
0
18 Jul 2023
A Survey on Open-Vocabulary Detection and Segmentation: Past, Present, and Future
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Chaoyang Zhu
Long Chen
ObjD
VLM
511
72
0
18 Jul 2023
BUS:Efficient and Effective Vision-language Pre-training with Bottom-Up Patch Summarization
IEEE International Conference on Computer Vision (ICCV), 2023
Chaoya Jiang
Haiyang Xu
Wei Ye
Qinghao Ye
Chenliang Li
Mingshi Yan
Bin Bi
Shikun Zhang
Fei Huang
Songfang Huang
VLM
205
9
0
17 Jul 2023
Bootstrapping Vision-Language Learning with Decoupled Language Pre-training
Neural Information Processing Systems (NeurIPS), 2023
Yiren Jian
Chongyang Gao
Soroush Vosoughi
VLM
MLLM
389
44
0
13 Jul 2023
mBLIP: Efficient Bootstrapping of Multilingual Vision-LLMs
Gregor Geigle
Abhay Jain
Radu Timofte
Goran Glavaš
VLM
MLLM
229
42
0
13 Jul 2023
GVCCI: Lifelong Learning of Visual Grounding for Language-Guided Robotic Manipulation
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2023
Junghyun Kim
Gi-Cheon Kang
Suhyung Choi
Suyeon Shin
Byoung-Tak Zhang
LM&Ro
213
9
0
12 Jul 2023
Open-Vocabulary Object Detection via Scene Graph Discovery
ACM Multimedia (ACM MM), 2023
Hengcan Shi
Munawar Hayat
Jianfei Cai
ObjD
281
16
0
07 Jul 2023
GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
Shilong Zhang
Pei Sun
Shoufa Chen
Min Xiao
Wenqi Shao
Wenwei Zhang
Yu Liu
Kai-xiang Chen
Ping Luo
MLLM
VLM
916
320
0
07 Jul 2023
Page 13 of 27