1505.04870
Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models
19 May 2015
Bryan A. Plummer
Liwei Wang
Christopher M. Cervantes
Juan C. Caicedo
Anjali Narayan-Chen
Svetlana Lazebnik
Papers citing
"Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models"
50 / 1,325 papers shown
Vision Language Transformers: A Survey
Clayton Fields
C. Kennington
VLM
182
7
0
06 Jul 2023
ProbVLM: Probabilistic Adapter for Frozen Vision-Language Models
Uddeshya Upadhyay
Shyamgopal Karthik
Goran Frehse
Zeynep Akata
MLLM
VLM
471
6
0
01 Jul 2023
Stop Pre-Training: Adapt Visual-Language Models to Unseen Languages
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Yasmine Karoui
R. Lebret
Negar Foroutan
Karl Aberer
MLLM
VLM
119
4
0
29 Jun 2023
Benchmarking Zero-Shot Recognition with Vision-Language Models: Challenges on Granularity and Specificity
Zhenlin Xu
Yi Zhu
Tiffany Deng
Abhay Mittal
Yanbei Chen
Manchen Wang
Paolo Favaro
Joseph Tighe
Davide Modolo
VLM
CoGe
355
14
0
28 Jun 2023
CLIPA-v2: Scaling CLIP Training with 81.1% Zero-shot ImageNet Accuracy within a $10,000 Budget; An Extra $4,000 Unlocks 81.8% Accuracy
Xianhang Li
Zeyu Wang
Cihang Xie
CLIP
VLM
283
25
0
27 Jun 2023
Approximated Prompt Tuning for Vision-Language Pre-trained Models
Qiong Wu
Shubin Huang
Weihao Ye
Pingyang Dai
Annan Shu
Guannan Jiang
Rongrong Ji
VLM
VPVLM
127
2
0
27 Jun 2023
Shikra: Unleashing Multimodal LLM's Referential Dialogue Magic
Ke Chen
Zhao Zhang
Weili Zeng
Richong Zhang
Feng Zhu
Rui Zhao
ObjD
464
817
0
27 Jun 2023
Kosmos-2: Grounding Multimodal Large Language Models to the World
International Conference on Learning Representations (ICLR), 2023
Zhiliang Peng
Wenhui Wang
Li Dong
Y. Hao
Shaohan Huang
Shuming Ma
Furu Wei
MLLM
ObjD
VLM
404
1,039
0
26 Jun 2023
Localized Text-to-Image Generation for Free via Cross Attention Control
Yutong He
Ruslan Salakhutdinov
J. Zico Kolter
DiffM
167
28
0
26 Jun 2023
Improving Reference-based Distinctive Image Captioning with Contrastive Rewards
Yangjun Mao
Jun Xiao
Dong Zhang
Meng Cao
Jian Shao
Yueting Zhuang
Long Chen
EGVM
210
10
0
25 Jun 2023
Switch-BERT: Learning to Model Multimodal Interactions by Switching Attention and Input
European Conference on Computer Vision (ECCV), 2023
Qingpei Guo
Kaisheng Yao
Wei Chu
MLLM
103
6
0
25 Jun 2023
DesCo: Learning Object Recognition with Rich Language Descriptions
Neural Information Processing Systems (NeurIPS), 2023
Liunian Harold Li
Zi-Yi Dou
Nanyun Peng
Kai-Wei Chang
ObjD
VLM
189
29
0
24 Jun 2023
A Survey on Multimodal Large Language Models
National Science Review (NSR), 2023
Xinglong Mao
Chaoyou Fu
Zhengye Zhang
Ke Li
Xing Sun
Tong Xu
Enhong Chen
MLLM
LRM
463
1,022
0
23 Jun 2023
COSA: Concatenated Sample Pretrained Vision-Language Foundation Model
International Conference on Learning Representations (ICLR), 2023
Sihan Chen
Xingjian He
Handong Li
Xiaojie Jin
Jiashi Feng
Qingbin Liu
VLM
CLIP
214
11
0
15 Jun 2023
World-to-Words: Grounded Open Vocabulary Acquisition through Fast Mapping in Vision-Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Ziqiao Ma
Jiayi Pan
J. Chai
ObjD
VLM
215
12
0
14 Jun 2023
Babel-ImageNet: Massively Multilingual Evaluation of Vision-and-Language Representations
Gregor Geigle
Radu Timofte
Goran Glavaš
VLM
MLLM
163
6
0
14 Jun 2023
GeneCIS: A Benchmark for General Conditional Image Similarity
Computer Vision and Pattern Recognition (CVPR), 2023
S. Vaze
Nicolas Carion
Ishan Misra
VLM
DiffM
249
43
0
13 Jun 2023
I See Dead People: Gray-Box Adversarial Attack on Image-To-Text Models
Raz Lapid
Moshe Sipper
AAML
233
24
0
13 Jun 2023
Top-Down Framework for Weakly-supervised Grounded Image Captioning
Chen Cai
Suchen Wang
Kim-Hui Yap
Yi Wang
ObjD
235
5
0
13 Jun 2023
Retrieval-Enhanced Contrastive Vision-Text Models
International Conference on Learning Representations (ICLR), 2023
Ahmet Iscen
Mathilde Caron
Alireza Fathi
Cordelia Schmid
CLIP
VLM
296
39
0
12 Jun 2023
Global and Local Semantic Completion Learning for Vision-Language Pre-training
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Rong-Cheng Tu
Yatai Ji
Jie Jiang
Weijie Kong
Chengfei Cai
Wenzhe Zhao
Hongfa Wang
Yujiu Yang
Wei Liu
VLM
253
8
0
12 Jun 2023
Sticker820K: Empowering Interactive Retrieval with Stickers
Sijie Zhao
Yixiao Ge
Chen Ma
Lin Song
Xiaohan Ding
Zehua Xie
Ying Shan
114
14
0
12 Jun 2023
Read, look and detect: Bounding box annotation from image-caption pairs
E. Sanchez
ObjD
165
2
0
09 Jun 2023
Multimodal Explainable Artificial Intelligence: A Comprehensive Review of Methodological Advances and Future Research Directions
IEEE Access (IEEE Access), 2023
N. Rodis
Christos Sardianos
Panagiotis I. Radoglou-Grammatikis
Panagiotis G. Sarigiannidis
Iraklis Varlamis
Georgios Th. Papadopoulos
337
42
0
09 Jun 2023
Dealing with Semantic Underspecification in Multimodal NLP
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Sandro Pezzelle
169
11
0
08 Jun 2023
ScaleDet: A Scalable Multi-Dataset Object Detector
Computer Vision and Pattern Recognition (CVPR), 2023
Yanbei Chen
Manchen Wang
Abhay Mittal
Zhenlin Xu
Paolo Favaro
Joseph Tighe
Davide Modolo
ObjD
177
27
0
08 Jun 2023
Zambezi Voice: A Multilingual Speech Corpus for Zambian Languages
Interspeech (Interspeech), 2023
Claytone Sikasote
Kalinda Siaminwe
Stanly Mwape
Bangiwe Zulu
Mofya Phiri
Martin Phiri
David Zulu
Mayumbo Nyirenda
Antonios Anastasopoulos
264
10
0
07 Jun 2023
Referring Expression Comprehension Using Language Adaptive Inference
AAAI Conference on Artificial Intelligence (AAAI), 2023
Wei Su
Peihan Miao
Huanzhang Dou
Yongjian Fu
Xi Li
ObjD
254
31
0
06 Jun 2023
GRES: Generalized Referring Expression Segmentation
Computer Vision and Pattern Recognition (CVPR), 2023
Chang Liu
Henghui Ding
Xudong Jiang
337
247
0
01 Jun 2023
Adapting Pre-trained Language Models to Vision-Language Tasks via Dynamic Visual Prompting
IEEE International Joint Conference on Neural Network (IJCNN), 2023
Shubin Huang
Qiong Wu
Weihao Ye
Weijie Chen
Rongsheng Zhang
Xiaoshuai Sun
Rongrong Ji
VLM
VPVLM
LRM
128
2
0
01 Jun 2023
Too Large; Data Reduction for Vision-Language Pre-Training
IEEE International Conference on Computer Vision (ICCV), 2023
Alex Jinpeng Wang
Kevin Qinghong Lin
David Junhao Zhang
Stan Weixian Lei
Mike Zheng Shou
VLM
335
31
0
31 May 2023
Dense and Aligned Captions (DAC) Promote Compositional Reasoning in VL Models
Neural Information Processing Systems (NeurIPS), 2023
Sivan Doveh
Assaf Arbelle
Sivan Harary
Roei Herzig
Donghyun Kim
...
Yikang Shen
Raja Giryes
Rogerio Feris
S. Ullman
Leonid Karlinsky
VLM
CoGe
387
73
0
31 May 2023
DisCLIP: Open-Vocabulary Referring Expression Generation
British Machine Vision Conference (BMVC), 2023
Lior Bracha
E. Shaar
Aviv Shamsian
Ethan Fetaya
Gal Chechik
ObjD
261
9
0
30 May 2023
Learning without Forgetting for Vision-Language Models
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Da-Wei Zhou
Yuanhan Zhang
Jingyi Ning
De-Chuan Zhan
Ziwei Liu
VLM
CLL
387
78
0
30 May 2023
Controllable Text-to-Image Generation with GPT-4
Tianjun Zhang
Yi Zhang
Vibhav Vineet
Neel Joshi
Xin Eric Wang
DiffM
347
61
0
29 May 2023
Contextual Object Detection with Multimodal Large Language Models
International Journal of Computer Vision (IJCV), 2023
Yuhang Zang
Wei Li
Jun Han
Kaiyang Zhou
Chen Change Loy
ObjD
VLM
MLLM
328
142
0
29 May 2023
TaleCrafter: Interactive Story Visualization with Multiple Characters
ACM SIGGRAPH Conference and Exhibition on Computer Graphics and Interactive Techniques in Asia (SIGGRAPH Asia), 2023
Yuan Gong
Youxin Pang
Xiaodong Cun
Menghan Xia
Yingqing He
...
Longyue Wang
Yong Zhang
Xintao Wang
Ying Shan
Yujiu Yang
DiffM
351
65
0
29 May 2023
Improved Probabilistic Image-Text Representations
International Conference on Learning Representations (ICLR), 2023
Sanghyuk Chun
VLM
604
43
0
29 May 2023
VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset
Neural Information Processing Systems (NeurIPS), 2023
Sihan Chen
Handong Li
Qunbo Wang
Zijia Zhao
Ming-Ting Sun
Xinxin Zhu
Qingbin Liu
515
174
0
29 May 2023
Test-Time Adaptation with CLIP Reward for Zero-Shot Generalization in Vision-Language Models
International Conference on Learning Representations (ICLR), 2023
Shuai Zhao
Xiaohan Wang
Linchao Zhu
Yezhou Yang
VLM
322
39
0
29 May 2023
ConaCLIP: Exploring Distillation of Fully-Connected Knowledge Interaction Graph for Lightweight Text-Image Retrieval
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Jiapeng Wang
Chengyu Wang
Xiaodan Wang
Jun Huang
Lianwen Jin
VLM
252
9
0
28 May 2023
Z-GMOT: Zero-shot Generic Multiple Object Tracking
Kim Hoang Tran
Anh Duy Le Dinh
Tien-Phat Nguyen
Thinh Phan
Pha Nguyen
Khoa Luu
Don Adjeroh
Gianfranco Doretto
Ngan Hoang Le
VOT
295
10
0
28 May 2023
PuMer: Pruning and Merging Tokens for Efficient Vision Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Qingqing Cao
Bhargavi Paranjape
Hannaneh Hajishirzi
MLLM
VLM
173
51
0
27 May 2023
BIG-C: a Multimodal Multi-Purpose Dataset for Bemba
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Claytone Sikasote
Eunice Mukonde
Md Mahfuz Ibn Alam
Antonios Anastasopoulos
176
8
0
26 May 2023
Three Towers: Flexible Contrastive Learning with Pretrained Image Models
Neural Information Processing Systems (NeurIPS), 2023
Jannik Kossen
Mark Collier
Basil Mustafa
Tianlin Li
Xiaohua Zhai
Lucas Beyer
Andreas Steiner
Jesse Berent
Rodolphe Jenatton
Efi Kokiopoulou
VLM
216
18
0
26 May 2023
Learning to Imagine: Visually-Augmented Natural Language Generation
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Tianyi Tang
Yushuo Chen
Yifan Du
Junyi Li
Wayne Xin Zhao
Ji-Rong Wen
DiffM
427
10
0
26 May 2023
ChatBridge: Bridging Modalities with Large Language Model as a Language Catalyst
Zijia Zhao
Longteng Guo
Tongtian Yue
Si-Qing Chen
Shuai Shao
Xinxin Zhu
Zehuan Yuan
Jing Liu
MLLM
330
69
0
25 May 2023
Weakly Supervised Vision-and-Language Pre-training with Relative Representations
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Chi Chen
Peng Li
Maosong Sun
Yang Liu
152
2
0
24 May 2023
Visual Programming for Text-to-Image Generation and Evaluation
Jaemin Cho
Abhaysinh Zala
Joey Tianyi Zhou
MLLM
390
55
0
24 May 2023
Pento-DIARef: A Diagnostic Dataset for Learning the Incremental Algorithm for Referring Expression Generation from Examples
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2023
P. Sadler
David Schlangen
132
3
0
24 May 2023