ResearchTrend.AI
  • Communities
  • Connect sessions
  • AI calendar
  • Organizations
  • Join Slack
  • Contact Sales
Papers
Communities
Social Events
Terms and Conditions
Pricing
Contact Sales
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2026 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1505.04870
  4. Cited By
Flickr30k Entities: Collecting Region-to-Phrase Correspondences for
  Richer Image-to-Sentence Models
v1v2v3v4 (latest)

Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models

19 May 2015
Bryan A. Plummer
Liwei Wang
Christopher M. Cervantes
Juan C. Caicedo
Anjali Narayan-Chen
Svetlana Lazebnik
ArXiv (abs)PDFHTML

Papers citing "Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models"

50 / 1,325 papers shown
Vision Language Transformers: A Survey
Vision Language Transformers: A Survey
Clayton Fields
C. Kennington
VLM
182
7
0
06 Jul 2023
ProbVLM: Probabilistic Adapter for Frozen Vision-Language Models
ProbVLM: Probabilistic Adapter for Frozen Vision-Language Models
Uddeshya Upadhyay
Shyamgopal Karthik
Goran Frehse
Zeynep Akata
MLLMVLM
471
6
0
01 Jul 2023
Stop Pre-Training: Adapt Visual-Language Models to Unseen Languages
Stop Pre-Training: Adapt Visual-Language Models to Unseen LanguagesAnnual Meeting of the Association for Computational Linguistics (ACL), 2023
Yasmine Karoui
R. Lebret
Negar Foroutan
Karl Aberer
MLLMVLM
119
4
0
29 Jun 2023
Benchmarking Zero-Shot Recognition with Vision-Language Models:
  Challenges on Granularity and Specificity
Benchmarking Zero-Shot Recognition with Vision-Language Models: Challenges on Granularity and Specificity
Zhenlin Xu
Yi Zhu
Tiffany Deng
Abhay Mittal
Yanbei Chen
Manchen Wang
Paolo Favaro
Joseph Tighe
Davide Modolo
VLMCoGe
355
14
0
28 Jun 2023
CLIPA-v2: Scaling CLIP Training with 81.1% Zero-shot ImageNet Accuracy
  within a \$10,000 Budget; An Extra \$4,000 Unlocks 81.8% Accuracy
CLIPA-v2: Scaling CLIP Training with 81.1% Zero-shot ImageNet Accuracy within a \10,000 Budget; An Extra \4,000 Unlocks 81.8% Accuracy
Xianhang Li
Zeyu Wang
Cihang Xie
CLIPVLM
283
25
0
27 Jun 2023
Approximated Prompt Tuning for Vision-Language Pre-trained Models
Approximated Prompt Tuning for Vision-Language Pre-trained Models
Qiong Wu
Shubin Huang
Weihao Ye
Pingyang Dai
Annan Shu
Guannan Jiang
Rongrong Ji
VLMVPVLM
127
2
0
27 Jun 2023
Shikra: Unleashing Multimodal LLM's Referential Dialogue Magic
Shikra: Unleashing Multimodal LLM's Referential Dialogue Magic
Ke Chen
Zhao Zhang
Weili Zeng
Richong Zhang
Feng Zhu
Rui Zhao
ObjD
464
817
0
27 Jun 2023
Kosmos-2: Grounding Multimodal Large Language Models to the World
Kosmos-2: Grounding Multimodal Large Language Models to the WorldInternational Conference on Learning Representations (ICLR), 2023
Zhiliang Peng
Wenhui Wang
Li Dong
Y. Hao
Shaohan Huang
Shuming Ma
Furu Wei
MLLMObjDVLM
404
1,039
0
26 Jun 2023
Localized Text-to-Image Generation for Free via Cross Attention Control
Localized Text-to-Image Generation for Free via Cross Attention Control
Yutong He
Ruslan Salakhutdinov
J. Zico Kolter
DiffM
167
28
0
26 Jun 2023
Improving Reference-based Distinctive Image Captioning with Contrastive
  Rewards
Improving Reference-based Distinctive Image Captioning with Contrastive Rewards
Yangjun Mao
Jun Xiao
Dong Zhang
Meng Cao
Jian Shao
Yueting Zhuang
Long Chen
EGVM
210
10
0
25 Jun 2023
Switch-BERT: Learning to Model Multimodal Interactions by Switching
  Attention and Input
Switch-BERT: Learning to Model Multimodal Interactions by Switching Attention and InputEuropean Conference on Computer Vision (ECCV), 2023
Qingpei Guo
Kaisheng Yao
Wei Chu
MLLM
103
6
0
25 Jun 2023
DesCo: Learning Object Recognition with Rich Language Descriptions
DesCo: Learning Object Recognition with Rich Language DescriptionsNeural Information Processing Systems (NeurIPS), 2023
Liunian Harold Li
Zi-Yi Dou
Nanyun Peng
Kai-Wei Chang
ObjDVLM
189
29
0
24 Jun 2023
A Survey on Multimodal Large Language Models
A Survey on Multimodal Large Language ModelsNational Science Review (NSR), 2023
Xinglong Mao
Chaoyou Fu
Zhengye Zhang
Ke Li
Xing Sun
Tong Xu
Enhong Chen
MLLMLRM
463
1,022
0
23 Jun 2023
COSA: Concatenated Sample Pretrained Vision-Language Foundation Model
COSA: Concatenated Sample Pretrained Vision-Language Foundation ModelInternational Conference on Learning Representations (ICLR), 2023
Sihan Chen
Xingjian He
Handong Li
Xiaojie Jin
Jiashi Feng
Qingbin Liu
VLMCLIP
214
11
0
15 Jun 2023
World-to-Words: Grounded Open Vocabulary Acquisition through Fast
  Mapping in Vision-Language Models
World-to-Words: Grounded Open Vocabulary Acquisition through Fast Mapping in Vision-Language ModelsAnnual Meeting of the Association for Computational Linguistics (ACL), 2023
Ziqiao Ma
Jiayi Pan
J. Chai
ObjDVLM
215
12
0
14 Jun 2023
Babel-ImageNet: Massively Multilingual Evaluation of Vision-and-Language
  Representations
Babel-ImageNet: Massively Multilingual Evaluation of Vision-and-Language Representations
Gregor Geigle
Radu Timofte
Goran Glavaš
VLMMLLM
163
6
0
14 Jun 2023
GeneCIS: A Benchmark for General Conditional Image Similarity
GeneCIS: A Benchmark for General Conditional Image SimilarityComputer Vision and Pattern Recognition (CVPR), 2023
S. Vaze
Nicolas Carion
Ishan Misra
VLMDiffM
249
43
0
13 Jun 2023
I See Dead People: Gray-Box Adversarial Attack on Image-To-Text Models
I See Dead People: Gray-Box Adversarial Attack on Image-To-Text Models
Raz Lapid
Moshe Sipper
AAML
233
24
0
13 Jun 2023
Top-Down Framework for Weakly-supervised Grounded Image Captioning
Top-Down Framework for Weakly-supervised Grounded Image Captioning
Chen Cai
Suchen Wang
Kim-Hui Yap
Yi Wang
ObjD
235
5
0
13 Jun 2023
Retrieval-Enhanced Contrastive Vision-Text Models
Retrieval-Enhanced Contrastive Vision-Text ModelsInternational Conference on Learning Representations (ICLR), 2023
Ahmet Iscen
Mathilde Caron
Alireza Fathi
Cordelia Schmid
CLIPVLM
296
39
0
12 Jun 2023
Global and Local Semantic Completion Learning for Vision-Language
  Pre-training
Global and Local Semantic Completion Learning for Vision-Language Pre-trainingIEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Rong-Cheng Tu
Yatai Ji
Jie Jiang
Weijie Kong
Chengfei Cai
Wenzhe Zhao
Hongfa Wang
Yujiu Yang
Wei Liu
VLM
253
8
0
12 Jun 2023
Sticker820K: Empowering Interactive Retrieval with Stickers
Sticker820K: Empowering Interactive Retrieval with Stickers
Sijie Zhao
Yixiao Ge
Chen Ma
Lin Song
Xiaohan Ding
Zehua Xie
Ying Shan
114
14
0
12 Jun 2023
Read, look and detect: Bounding box annotation from image-caption pairs
Read, look and detect: Bounding box annotation from image-caption pairs
E. Sanchez
ObjD
165
2
0
09 Jun 2023
Multimodal Explainable Artificial Intelligence: A Comprehensive Review
  of Methodological Advances and Future Research Directions
Multimodal Explainable Artificial Intelligence: A Comprehensive Review of Methodological Advances and Future Research DirectionsIEEE Access (IEEE Access), 2023
N. Rodis
Christos Sardianos
Panagiotis I. Radoglou-Grammatikis
Panagiotis G. Sarigiannidis
Iraklis Varlamis
Georgios Th. Papadopoulos
337
42
0
09 Jun 2023
Dealing with Semantic Underspecification in Multimodal NLP
Dealing with Semantic Underspecification in Multimodal NLPAnnual Meeting of the Association for Computational Linguistics (ACL), 2023
Sandro Pezzelle
169
11
0
08 Jun 2023
ScaleDet: A Scalable Multi-Dataset Object Detector
ScaleDet: A Scalable Multi-Dataset Object DetectorComputer Vision and Pattern Recognition (CVPR), 2023
Yanbei Chen
Manchen Wang
Abhay Mittal
Zhenlin Xu
Paolo Favaro
Joseph Tighe
Davide Modolo
ObjD
177
27
0
08 Jun 2023
Zambezi Voice: A Multilingual Speech Corpus for Zambian Languages
Zambezi Voice: A Multilingual Speech Corpus for Zambian LanguagesInterspeech (Interspeech), 2023
Claytone Sikasote
Kalinda Siaminwe
Stanly Mwape
Bangiwe Zulu
Mofya Phiri
Martin Phiri
David Zulu
Mayumbo Nyirenda
Antonios Anastasopoulos
264
10
0
07 Jun 2023
Referring Expression Comprehension Using Language Adaptive Inference
Referring Expression Comprehension Using Language Adaptive InferenceAAAI Conference on Artificial Intelligence (AAAI), 2023
Wei Su
Peihan Miao
Huanzhang Dou
Yongjian Fu
Xi Li
ObjD
254
31
0
06 Jun 2023
GRES: Generalized Referring Expression Segmentation
GRES: Generalized Referring Expression SegmentationComputer Vision and Pattern Recognition (CVPR), 2023
Chang Liu
Henghui Ding
Xudong Jiang
337
247
0
01 Jun 2023
Adapting Pre-trained Language Models to Vision-Language Tasks via
  Dynamic Visual Prompting
Adapting Pre-trained Language Models to Vision-Language Tasks via Dynamic Visual PromptingIEEE International Joint Conference on Neural Network (IJCNN), 2023
Shubin Huang
Qiong Wu
Weihao Ye
Weijie Chen
Rongsheng Zhang
Xiaoshuai Sun
Rongrong Ji
VLMVPVLMLRM
128
2
0
01 Jun 2023
Too Large; Data Reduction for Vision-Language Pre-Training
Too Large; Data Reduction for Vision-Language Pre-TrainingIEEE International Conference on Computer Vision (ICCV), 2023
Alex Jinpeng Wang
Kevin Qinghong Lin
David Junhao Zhang
Stan Weixian Lei
Mike Zheng Shou
VLM
335
31
0
31 May 2023
Dense and Aligned Captions (DAC) Promote Compositional Reasoning in VL
  Models
Dense and Aligned Captions (DAC) Promote Compositional Reasoning in VL ModelsNeural Information Processing Systems (NeurIPS), 2023
Sivan Doveh
Assaf Arbelle
Sivan Harary
Roei Herzig
Donghyun Kim
...
Yikang Shen
Raja Giryes
Rogerio Feris
S. Ullman
Leonid Karlinsky
VLMCoGe
387
73
0
31 May 2023
DisCLIP: Open-Vocabulary Referring Expression Generation
DisCLIP: Open-Vocabulary Referring Expression GenerationBritish Machine Vision Conference (BMVC), 2023
Lior Bracha
E. Shaar
Aviv Shamsian
Ethan Fetaya
Gal Chechik
ObjD
261
9
0
30 May 2023
Learning without Forgetting for Vision-Language Models
Learning without Forgetting for Vision-Language ModelsIEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Da-Wei Zhou
Yuanhan Zhang
Jingyi Ning
Jingyi Ning
De-Chuan Zhan
De-Chuan Zhan
Ziwei Liu
VLMCLL
387
78
0
30 May 2023
Controllable Text-to-Image Generation with GPT-4
Controllable Text-to-Image Generation with GPT-4
Tianjun Zhang
Yi Zhang
Vibhav Vineet
Neel Joshi
Xin Eric Wang
DiffM
347
61
0
29 May 2023
Contextual Object Detection with Multimodal Large Language Models
Contextual Object Detection with Multimodal Large Language ModelsInternational Journal of Computer Vision (IJCV), 2023
Yuhang Zang
Wei Li
Jun Han
Kaiyang Zhou
Chen Change Loy
ObjDVLMMLLM
328
142
0
29 May 2023
TaleCrafter: Interactive Story Visualization with Multiple Characters
TaleCrafter: Interactive Story Visualization with Multiple CharactersACM SIGGRAPH Conference and Exhibition on Computer Graphics and Interactive Techniques in Asia (SIGGRAPH Asia), 2023
Yuan Gong
Youxin Pang
Xiaodong Cun
Menghan Xia
Yingqing He
...
Longyue Wang
Yong Zhang
Xintao Wang
Ying Shan
Yujiu Yang
DiffM
351
65
0
29 May 2023
Improved Probabilistic Image-Text Representations
Improved Probabilistic Image-Text RepresentationsInternational Conference on Learning Representations (ICLR), 2023
Sanghyuk Chun
VLM
604
43
0
29 May 2023
VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and
  Dataset
VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and DatasetNeural Information Processing Systems (NeurIPS), 2023
Sihan Chen
Handong Li
Qunbo Wang
Zijia Zhao
Ming-Ting Sun
Xinxin Zhu
Qingbin Liu
515
174
0
29 May 2023
Test-Time Adaptation with CLIP Reward for Zero-Shot Generalization in
  Vision-Language Models
Test-Time Adaptation with CLIP Reward for Zero-Shot Generalization in Vision-Language ModelsInternational Conference on Learning Representations (ICLR), 2023
Shuai Zhao
Xiaohan Wang
Linchao Zhu
Yezhou Yang
VLM
322
39
0
29 May 2023
ConaCLIP: Exploring Distillation of Fully-Connected Knowledge
  Interaction Graph for Lightweight Text-Image Retrieval
ConaCLIP: Exploring Distillation of Fully-Connected Knowledge Interaction Graph for Lightweight Text-Image RetrievalAnnual Meeting of the Association for Computational Linguistics (ACL), 2023
Jiapeng Wang
Chengyu Wang
Xiaodan Wang
Jun Huang
Lianwen Jin
VLM
252
9
0
28 May 2023
Z-GMOT: Zero-shot Generic Multiple Object Tracking
Z-GMOT: Zero-shot Generic Multiple Object Tracking
Kim Hoang Tran
Anh Duy Le Dinh
Tien-Phat Nguyen
Thinh Phan
Pha Nguyen
Khoa Luu
Don Adjeroh
Gianfranco Doretto
Ngan Hoang Le
VOT
295
10
0
28 May 2023
PuMer: Pruning and Merging Tokens for Efficient Vision Language Models
PuMer: Pruning and Merging Tokens for Efficient Vision Language ModelsAnnual Meeting of the Association for Computational Linguistics (ACL), 2023
Qingqing Cao
Bhargavi Paranjape
Hannaneh Hajishirzi
MLLMVLM
173
51
0
27 May 2023
BIG-C: a Multimodal Multi-Purpose Dataset for Bemba
BIG-C: a Multimodal Multi-Purpose Dataset for BembaAnnual Meeting of the Association for Computational Linguistics (ACL), 2023
Claytone Sikasote
Eunice Mukonde
Md Mahfuz Ibn Alam
Antonios Anastasopoulos
176
8
0
26 May 2023
Three Towers: Flexible Contrastive Learning with Pretrained Image Models
Three Towers: Flexible Contrastive Learning with Pretrained Image ModelsNeural Information Processing Systems (NeurIPS), 2023
Jannik Kossen
Mark Collier
Basil Mustafa
Tianlin Li
Xiaohua Zhai
Lucas Beyer
Andreas Steiner
Jesse Berent
Rodolphe Jenatton
Efi Kokiopoulou
VLM
216
18
0
26 May 2023
Learning to Imagine: Visually-Augmented Natural Language Generation
Learning to Imagine: Visually-Augmented Natural Language GenerationAnnual Meeting of the Association for Computational Linguistics (ACL), 2023
Tianyi Tang
Yushuo Chen
Yifan Du
Junyi Li
Wayne Xin Zhao
Ji-Rong Wen
DiffM
427
10
0
26 May 2023
ChatBridge: Bridging Modalities with Large Language Model as a Language
  Catalyst
ChatBridge: Bridging Modalities with Large Language Model as a Language Catalyst
Zijia Zhao
Longteng Guo
Tongtian Yue
Si-Qing Chen
Shuai Shao
Xinxin Zhu
Zehuan Yuan
Jing Liu
MLLM
330
69
0
25 May 2023
Weakly Supervised Vision-and-Language Pre-training with Relative
  Representations
Weakly Supervised Vision-and-Language Pre-training with Relative RepresentationsAnnual Meeting of the Association for Computational Linguistics (ACL), 2023
Chi Chen
Peng Li
Maosong Sun
Yang Liu
152
2
0
24 May 2023
Visual Programming for Text-to-Image Generation and Evaluation
Visual Programming for Text-to-Image Generation and Evaluation
Jaemin Cho
Abhaysinh Zala
Joey Tianyi Zhou
MLLM
390
55
0
24 May 2023
Pento-DIARef: A Diagnostic Dataset for Learning the Incremental
  Algorithm for Referring Expression Generation from Examples
Pento-DIARef: A Diagnostic Dataset for Learning the Incremental Algorithm for Referring Expression Generation from ExamplesConference of the European Chapter of the Association for Computational Linguistics (EACL), 2023
P. Sadler
David Schlangen
132
3
0
24 May 2023
Previous
123...131415...252627
Next
Page 14 of 27
Pageof 27