1505.04870
Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models
19 May 2015
Bryan A. Plummer
Liwei Wang
Christopher M. Cervantes
Juan C. Caicedo
Anjali Narayan-Chen
Svetlana Lazebnik
Papers citing "Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models"
50 / 1,325 papers shown
LAVIS: A Library for Language-Vision Intelligence
Dongxu Li
Junnan Li
Hung Le
Guangsen Wang
Silvio Savarese
Guosheng Lin
VLM
337
63
0
15 Sep 2022
OmniVL: One Foundation Model for Image-Language and Video-Language Tasks
Neural Information Processing Systems (NeurIPS), 2022
Junke Wang
Dongdong Chen
Zuxuan Wu
Chong Luo
Luowei Zhou
Yucheng Zhao
Yujia Xie
Ce Liu
Yu-Gang Jiang
Lu Yuan
MLLM
VLM
294
178
0
15 Sep 2022
OmDet: Large-scale vision-language multi-dataset pre-training with multimodal detection network
IET Computer Vision (ICV), 2022
Tiancheng Zhao
Peng Liu
Kyusong Lee
VLM
MLLM
ObjD
157
16
0
10 Sep 2022
Foundations and Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions
ACM Computing Surveys (ACM CSUR), 2022
Paul Pu Liang
Amir Zadeh
Louis-Philippe Morency
315
169
0
07 Sep 2022
Statistical Foundation Behind Machine Learning and Its Impact on Computer Vision
Lei Zhang
H. Shum
VLM
SSL
144
2
0
06 Sep 2022
Design of the topology for contrastive visual-textual alignment
Zhun Sun
376
2
0
05 Sep 2022
RLIP: Relational Language-Image Pre-training for Human-Object Interaction Detection
Neural Information Processing Systems (NeurIPS), 2022
Hangjie Yuan
Jianwen Jiang
Samuel Albanie
Tao Feng
Ziyuan Huang
Dong Ni
Mingqian Tang
VLM
374
76
0
05 Sep 2022
Efficient Vision-Language Pretraining with Visual Concepts and Hierarchical Alignment
British Machine Vision Conference (BMVC), 2022
Mustafa Shukor
Guillaume Couairon
Matthieu Cord
VLM
CLIP
311
28
0
29 Aug 2022
MuMUR: Multilingual Multimodal Universal Retrieval
Avinash Madasu
Estelle Aflalo
Gabriela Ben-Melech Stan
Shachar Rosenman
Shao-Yen Tseng
Gedas Bertasius
Vasudev Lal
430
6
0
24 Aug 2022
Learning More May Not Be Better: Knowledge Transferability in Vision and Language Tasks
Journal of Imaging (JI), 2022
Tianwei Chen
Noa Garcia
Mayu Otani
Chenhui Chu
Yuta Nakashima
Hajime Nagahara
VLM
139
1
0
23 Aug 2022
Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks
Wenhui Wang
Hangbo Bao
Li Dong
Johan Bjorck
Zhiliang Peng
...
Kriti Aggarwal
O. Mohammed
Saksham Singhal
Subhojit Som
Furu Wei
MLLM
VLM
ViT
640
707
0
22 Aug 2022
CODER: Coupled Diversity-Sensitive Momentum Contrastive Learning for Image-Text Retrieval
European Conference on Computer Vision (ECCV), 2022
Haoran Wang
Dongliang He
Wenhao Wu
Boyang Xia
Min Yang
Fu Li
YunLong Yu
Zhong Ji
Errui Ding
Jingdong Wang
199
27
0
21 Aug 2022
VLMAE: Vision-Language Masked Autoencoder
Su He
Taian Guo
Tao Dai
Ruizhi Qiao
Chen Wu
Xiujun Shu
Bohan Ren
VLM
205
11
0
19 Aug 2022
Multimodal foundation models are better simulators of the human brain
Haoyu Lu
Qiongyi Zhou
Nanyi Fei
Zhiwu Lu
Mingyu Ding
...
Changde Du
Xin Zhao
Haoran Sun
Huiguang He
J. Wen
AI4CE
183
19
0
17 Aug 2022
GRIT-VLP: Grouped Mini-batch Sampling for Efficient Vision and Language Pre-training
European Conference on Computer Vision (ECCV), 2022
Jaeseok Byun
Taebaek Hwang
Jianlong Fu
Taesup Moon
VLM
212
13
0
08 Aug 2022
Fine-Grained Semantically Aligned Vision-Language Pre-Training
Neural Information Processing Systems (NeurIPS), 2022
Juncheng Li
Xin He
Longhui Wei
Long Qian
Linchao Zhu
Lingxi Xie
Yueting Zhuang
Qi Tian
Siliang Tang
VLM
209
100
0
04 Aug 2022
Masked Vision and Language Modeling for Multi-modal Representation Learning
International Conference on Learning Representations (ICLR), 2022
Gukyeong Kwon
Zhaowei Cai
Avinash Ravichandran
Erhan Bas
Rahul Bhotika
Stefano Soatto
257
84
0
03 Aug 2022
Augmenting Vision Language Pretraining by Learning Codebook with Visual Semantics
International Conference on Pattern Recognition (ICPR), 2022
Xiaoyuan Guo
Jiali Duan
C.-C. Jay Kuo
J. Gichoya
Imon Banerjee
VLM
186
1
0
31 Jul 2022
Curriculum Learning for Data-Efficient Vision-Language Alignment
Tejas Srinivasan
Xiang Ren
Jesse Thomason
VLM
156
11
0
29 Jul 2022
Learning Visual Representation from Modality-Shared Contrastive Language-Image Pre-training
European Conference on Computer Vision (ECCV), 2022
Haoxuan You
Luowei Zhou
Bin Xiao
Noel Codella
Yu Cheng
Ruochen Xu
Shih-Fu Chang
Lu Yuan
CLIP
VLM
225
56
0
26 Jul 2022
WinoGAViL: Gamified Association Benchmark to Challenge Vision-and-Language Models
Neural Information Processing Systems (NeurIPS), 2022
Yonatan Bitton
Nitzan Bitton-Guetta
Ron Yosef
Yuval Elovici
Joey Tianyi Zhou
Gabriel Stanovsky
Roy Schwartz
219
19
0
25 Jul 2022
Chunk-aware Alignment and Lexical Constraint for Visual Entailment with Natural Language Explanations
ACM Multimedia (ACM MM), 2022
Qian Yang
Yunxin Li
Baotian Hu
Lin Ma
Yuxin Ding
Min Zhang
240
11
0
23 Jul 2022
Rethinking the Reference-based Distinctive Image Captioning
ACM Multimedia (ACM MM), 2022
Yangjun Mao
Long Chen
Zhihong Jiang
Dong Zhang
Zhimeng Zhang
Jian Shao
Jun Xiao
DiffM
228
23
0
22 Jul 2022
Don't Stop Learning: Towards Continual Learning for the CLIP Model
Yuxuan Ding
Lingqiao Liu
Chunna Tian
Jingyuan Yang
Haoxuan Ding
CLL
VLM
KELM
226
72
0
19 Jul 2022
Entity-enhanced Adaptive Reconstruction Network for Weakly Supervised Referring Expression Grounding
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Xuejing Liu
Liang Li
Shuhui Wang
Zhengjun Zha
Dechao Meng
Qi Tian
Qingming Huang
251
74
0
18 Jul 2022
FashionViL: Fashion-Focused Vision-and-Language Representation Learning
European Conference on Computer Vision (ECCV), 2022
Xiaoping Han
Licheng Yu
Xiatian Zhu
Li Zhang
Yi-Zhe Song
Tao Xiang
AI4TS
192
60
0
17 Jul 2022
LineCap: Line Charts for Data Visualization Captioning Models
Visual .. (VISUAL), 2022
Anita Mahinpei
Zona Kostic
Christy Tanner
VLM
196
25
0
15 Jul 2022
Toward Explainable and Fine-Grained 3D Grounding through Referring Textual Phrases
Zhihao Yuan
Xu Yan
Zhuo Li
Xuhao Li
Yao Guo
Shuguang Cui
Zhen Li
194
18
0
05 Jul 2022
Contrastive Cross-Modal Knowledge Sharing Pre-training for Vision-Language Representation Learning and Retrieval
Keyu Wen
Zhenshan Tan
Qingrong Cheng
Cheng Chen
X. Gu
VLM
198
1
0
02 Jul 2022
Improving Visual Grounding by Encouraging Consistent Gradient-based Explanations
Computer Vision and Pattern Recognition (CVPR), 2022
Ziyan Yang
Kushal Kafle
Franck Dernoncourt
Vicente Ordónez Román
VLM
422
32
0
30 Jun 2022
Towards Adversarial Attack on Vision-Language Pre-training Models
ACM Multimedia (ACM MM), 2022
Jiaming Zhang
Qiaomin Yi
Jitao Sang
VLM
AAML
303
155
0
19 Jun 2022
What is Where by Looking: Weakly-Supervised Open-World Phrase-Grounding without Text Inputs
Neural Information Processing Systems (NeurIPS), 2022
Tal Shaharabany
Yoad Tewel
Lior Wolf
ObjD
255
23
0
19 Jun 2022
VLMixer: Unpaired Vision-Language Pre-training via Cross-Modal CutMix
International Conference on Machine Learning (ICML), 2022
Teng Wang
Wenhao Jiang
Zhichao Lu
Feng Zheng
Ran Cheng
Chengguo Yin
Ping Luo
VLM
209
54
0
17 Jun 2022
MixGen: A New Multi-Modal Data Augmentation
Xiaoshuai Hao
Yi Zhu
Srikar Appalaraju
Aston Zhang
Wanqian Zhang
Boyang Li
Mu Li
VLM
399
122
0
16 Jun 2022
RefCrowd: Grounding the Target in Crowd with Referring Expressions
ACM Multimedia (ACM MM), 2022
Heqian Qiu
Hongliang Li
Taijin Zhao
Lanxiao Wang
Qingbo Wu
Fanman Meng
ObjD
218
9
0
16 Jun 2022
Image Captioning based on Feature Refinement and Reflective Decoding
G. Alabduljabbar
Hafida Benhidour
Said Kerrache
3DV
157
3
0
16 Jun 2022
Multimodal Dialogue State Tracking
North American Chapter of the Association for Computational Linguistics (NAACL), 2022
Hung Le
Nancy F. Chen
Guosheng Lin
160
10
0
16 Jun 2022
Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone
Neural Information Processing Systems (NeurIPS), 2022
Zi-Yi Dou
Aishwarya Kamath
Zhe Gan
Pengchuan Zhang
Jianfeng Wang
...
Ce Liu
Yann LeCun
Nanyun Peng
Jianfeng Gao
Lijuan Wang
VLM
ObjD
296
152
0
15 Jun 2022
TransVG++: End-to-End Visual Grounding with Language Conditioned Vision Transformer
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Jiajun Deng
Zhengyuan Yang
Daqing Liu
Tianlang Chen
Wen-gang Zhou
Yanyong Zhang
Houqiang Li
Wanli Ouyang
ViT
242
90
0
14 Jun 2022
GLIPv2: Unifying Localization and Vision-Language Understanding
Haotian Zhang
Pengchuan Zhang
Xiaowei Hu
Yen-Chun Chen
Liunian Harold Li
Xiyang Dai
Lijuan Wang
Lu Yuan
Lei Li
Jianfeng Gao
ObjD
VLM
296
354
0
12 Jun 2022
A Unified Continuous Learning Framework for Multi-modal Knowledge Discovery and Pre-training
Zhihao Fan
Zhongyu Wei
Jingjing Chen
Siyuan Wang
Zejun Li
Jiarong Xu
Xuanjing Huang
CLL
155
6
0
11 Jun 2022
Uni-Perceiver-MoE: Learning Sparse Generalist Models with Conditional MoEs
Neural Information Processing Systems (NeurIPS), 2022
Jinguo Zhu
Xizhou Zhu
Wenhai Wang
Xiaohua Wang
Jiaming Song
Xiaogang Wang
Jifeng Dai
MoMe
MoE
310
84
0
09 Jun 2022
VL-BEiT: Generative Vision-Language Pretraining
Hangbo Bao
Wenhui Wang
Li Dong
Furu Wei
VLM
180
48
0
02 Jun 2022
VALHALLA: Visual Hallucination for Machine Translation
Computer Vision and Pattern Recognition (CVPR), 2022
Yi Li
Yikang Shen
Yoon Kim
Chun-Fu Chen
Rogerio Feris
David D. Cox
Nuno Vasconcelos
MLLM
458
51
0
31 May 2022
VLUE: A Multi-Task Benchmark for Evaluating Vision-Language Models
Wangchunshu Zhou
Yan Zeng
Shizhe Diao
Xinsong Zhang
CoGe
VLM
308
14
0
30 May 2022
CyCLIP: Cyclic Contrastive Language-Image Pretraining
Neural Information Processing Systems (NeurIPS), 2022
Shashank Goel
Hritik Bansal
S. Bhatia
Ryan Rossi
Vishwa Vinay
Aditya Grover
CLIP
VLM
522
166
0
28 May 2022
HiVLP: Hierarchical Vision-Language Pre-Training for Fast Image-Text Retrieval
Feilong Chen
Xiuyi Chen
Jiaxin Shi
Duzhen Zhang
Jianlong Chang
Qi Tian
VLM
CLIP
227
7
0
24 May 2022
mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Chenliang Li
Haiyang Xu
Junfeng Tian
Wei Wang
Ming Yan
...
Ji Zhang
Songfang Huang
Feiran Huang
Jingren Zhou
Luo Si
VLM
MLLM
281
270
0
24 May 2022
Charon: a FrameNet Annotation Tool for Multimodal Corpora
Linguistic Annotation Workshop (LAW), 2022
Frederico Belcavello
Marcelo Viridiano
E. Matos
Haiyue Song
102
6
0
24 May 2022
PEVL: Position-enhanced Pre-training and Prompt Tuning for Vision-language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Yuan Yao
Qi-An Chen
Ao Zhang
Wei Ji
Zhiyuan Liu
Tat-Seng Chua
Maosong Sun
VLM
MLLM
256
43
0
23 May 2022