Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models

19 May 2015
Bryan A. Plummer
Liwei Wang
Christopher M. Cervantes
Juan C. Caicedo
Anjali Narayan-Chen
Svetlana Lazebnik

Papers citing "Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models"

50 / 1,325 papers shown
Contrastive Learning of Sentence Embeddings from Scratch
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Junlei Zhang
Zhenzhong Lan
Junxian He
SSL
376
35
0
24 May 2023
SmartTrim: Adaptive Tokens and Attention Pruning for Efficient Vision-Language Models
International Conference on Language Resources and Evaluation (LREC), 2023
Zekun Wang
Jingchang Chen
Wangchunshu Zhou
Haichao Zhu
Jiafeng Liang
Liping Shan
Ming Liu
Dongliang Xu
Qing Yang
Bing Qin
VLM
236
8
0
24 May 2023
Meta-learning For Vision-and-language Cross-lingual Transfer
Hanxu Hu
Frank Keller
VLM
156
4
0
24 May 2023
GRILL: Grounded Vision-language Pre-training via Aligning Text and Image Regions
Woojeong Jin
Subhabrata Mukherjee
Yu Cheng
Yelong Shen
Weizhu Chen
Ahmed Hassan Awadallah
Damien Jose
Xiang Ren
ObjD, VLM
237
9
0
24 May 2023
Mitigating Test-Time Bias for Fair Image Retrieval
Neural Information Processing Systems (NeurIPS), 2023
Fanjie Kong
Shuai Yuan
Weituo Hao
Ricardo Henao
199
24
0
23 May 2023
ReadMe++: Benchmarking Multilingual Language Models for Multi-Domain Readability Assessment
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Tarek Naous
Michael Joseph Ryan
Anton Lavrouk
Mohit Chandra
Wei Xu
312
18
0
23 May 2023
VisorGPT: Learning Visual Prior via Generative Pre-Training
Neural Information Processing Systems (NeurIPS), 2023
Jinheng Xie
Kai Ye
Yudong Li
Yuexiang Li
Kevin Qinghong Lin
Yefeng Zheng
Linlin Shen
Mike Zheng Shou
ViT
782
9
0
23 May 2023
UNIMO-3: Multi-granularity Interaction for Vision-Language Representation Learning
Hao Yang
Can Gao
Hao Liu
Xinyan Xiao
Yanyan Zhao
Bing Qin
114
3
0
23 May 2023
RaSa: Relation and Sensitivity Aware Representation Learning for Text-based Person Search
International Joint Conference on Artificial Intelligence (IJCAI), 2023
Yang Bai
Ming-Ming Cao
Daming Gao
Ziqiang Cao
Cheng Chen
Zhenfeng Fan
Liqiang Nie
Min Zhang
AI4TS
262
105
0
23 May 2023
EDIS: Entity-Driven Image Search over Multimodal Web Content
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Siqi Liu
Weixi Feng
Tsu-Jui Fu
Wenhu Chen
Wenjie Wang
VLM
337
21
0
23 May 2023
Type-to-Track: Retrieve Any Object via Prompt-based Tracking
Neural Information Processing Systems (NeurIPS), 2023
Pha Nguyen
Kha Gia Quach
Kris Kitani
Khoa Luu
285
32
0
22 May 2023
DiffCap: Exploring Continuous Diffusion on Image Captioning
Yufeng He
Zefan Cai
Xu Gan
Baobao Chang
DiffM
205
11
0
20 May 2023
Not All Semantics are Created Equal: Contrastive Self-supervised Learning with Automatic Temperature Individualization
International Conference on Machine Learning (ICML), 2023
Zimeng Qiu
Quanqi Hu
Zhuoning Yuan
Denny Zhou
Lijun Zhang
Tianbao Yang
257
27
0
19 May 2023
Going Denser with Open-Vocabulary Part Segmentation
IEEE International Conference on Computer Vision (ICCV), 2023
Pei Sun
Shoufa Chen
Chenchen Zhu
Fanyi Xiao
Ping Luo
Saining Xie
Zhicheng Yan
ObjD, VLM
235
72
0
18 May 2023
Weakly-Supervised Visual-Textual Grounding with Semantic Prior Refinement
British Machine Vision Conference (BMVC), 2023
Davide Rigoni
Luca Parolari
Luciano Serafini
A. Sperduti
Lamberto Ballan
192
1
0
18 May 2023
Iterative Adversarial Attack on Image-guided Story Ending Generation
IEEE Transactions on Multimedia (IEEE TMM), 2023
Youze Wang
Wenbo Hu
Richang Hong
248
8
0
16 May 2023
CLIP-VG: Self-paced Curriculum Adapting of CLIP for Visual Grounding
IEEE Transactions on Multimedia (IEEE TMM), 2023
Linhui Xiao
Xiaoshan Yang
Fang Peng
Ming Yan
Yaowei Wang
Changsheng Xu
ObjD, VLM
472
62
0
15 May 2023
Parameter-efficient Tuning of Large-scale Multimodal Foundation Model
Neural Information Processing Systems (NeurIPS), 2023
Haixin Wang
Xinlong Yang
Jianlong Chang
Di Jin
Jinan Sun
Shikun Zhang
Xiao Luo
Qi Tian
301
38
0
15 May 2023
A Comprehensive Survey on Segment Anything Model for Vision and Beyond
Chunhui Zhang
Li Liu
Yawen Cui
Guanjie Huang
Weilin Lin
Yiqian Yang
Yuehong Hu
VLM
421
131
0
14 May 2023
Measuring Progress in Fine-grained Vision-and-Language Understanding
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Emanuele Bugliarello
Laurent Sartran
Aishwarya Agrawal
Lisa Anne Hendricks
Aida Nematzadeh
VLM
232
31
0
12 May 2023
Region-Aware Pretraining for Open-Vocabulary Object Detection with Vision Transformers
Computer Vision and Pattern Recognition (CVPR), 2023
Dahun Kim
A. Angelova
Weicheng Kuo
ObjD, ViT, VLM
425
112
0
11 May 2023
A Multi-Modal Context Reasoning Approach for Conditional Inference on Joint Textual and Visual Clues
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Yunxin Li
Baotian Hu
Xinyu Chen
Yuxin Ding
Lin Ma
Min Zhang
LRM
170
17
0
08 May 2023
TRIPS: Efficient Vision-and-Language Pre-training with Text-Relevant Image Patch Selection
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Chaoya Jiang
Wei Ye
Haiyang Xu
Ming Yan
Shikun Zhang
Jie Zhang
Bin Bi
Songfang Huang
VLM
361
16
0
08 May 2023
UIT-OpenViIC: A Novel Benchmark for Evaluating Image Captioning in Vietnamese
Doanh C. Bui
Nghia Hieu Nguyen
Khang Phuoc-Quy Nguyen
VLM
242
4
0
07 May 2023
X-LLM: Bootstrapping Advanced Large Language Models by Treating Multi-Modalities as Foreign Languages
Feilong Chen
Minglun Han
Haozhi Zhao
Qingyang Zhang
Jing Shi
Shuang Xu
Bo Xu
MLLM
343
152
0
07 May 2023
LMEye: An Interactive Perception Network for Large Language Models
IEEE Transactions on Multimedia (IEEE TMM), 2023
Yunxin Li
Baotian Hu
Xinyu Chen
Lin Ma
Yong-mei Xu
Hao Fei
MLLM, VLM
290
41
0
05 May 2023
ArK: Augmented Reality with Knowledge Interactive Emergent Ability
Qiuyuan Huang
Jinho Park
Abhinav Gupta
Paul N. Bennett
Ran Gong
...
Baolin Peng
O. Mohammed
C. Pal
Yejin Choi
Jianfeng Gao
197
8
0
01 May 2023
An Empirical Study of Multimodal Model Merging
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Yi-Lin Sung
Linjie Li
Kevin Qinghong Lin
Zhe Gan
Joey Tianyi Zhou
Lijuan Wang
MoMe
337
52
0
28 Apr 2023
Energy-based Models are Zero-Shot Planners for Compositional Scene Rearrangement
N. Gkanatsios
Ayush Jain
Zhou Xian
Yunchu Zhang
C. Atkeson
Katerina Fragkiadaki
LM&Ro
425
43
0
27 Apr 2023
From Association to Generation: Text-only Captioning by Unsupervised Cross-modal Mapping
International Joint Conference on Artificial Intelligence (IJCAI), 2023
Junyan Wang
Ming Yan
Yi Zhang
Jitao Sang
CLIP, VLM
301
17
0
26 Apr 2023
OmniLabel: A Challenging Benchmark for Language-Based Object Detection
IEEE International Conference on Computer Vision (ICCV), 2023
S. Schulter
G. VijayKumarB.
Yumin Suh
Konstantinos M. Dafnis
Zhixing Zhang
Shiyu Zhao
Dimitris N. Metaxas
ObjD
187
17
0
22 Apr 2023
RoCOCO: Robustness Benchmark of MS-COCO to Stress-test Image-Text Matching Models
Seulki Park
Daeho Um
Hajung Yoon
Sanghyuk Chun
Sangdoo Yun
Hawook Jeong
402
5
0
21 Apr 2023
Chain of Thought Prompt Tuning in Vision Language Models
Jiaxin Ge
Hongyin Luo
Siyuan Qian
Yulu Gan
Jie Fu
Shanghang Zhang
VLM, LRM, MLLM
290
33
0
16 Apr 2023
RECLIP: Resource-efficient CLIP by Training with Small Images
Runze Li
Dahun Kim
B. Bhanu
Weicheng Kuo
VLM, CLIP
281
17
0
12 Apr 2023
MoMo: A shared encoder Model for text, image and multi-Modal representations
Rakesh Chada
Zhao-Heng Zheng
P. Natarajan
ViT
126
5
0
11 Apr 2023
Embodied Concept Learner: Self-supervised Learning of Concepts and Mapping through Instruction Following
Conference on Robot Learning (CoRL), 2023
Mingyu Ding
Yan Xu
Zhenfang Chen
David D. Cox
Ping Luo
J. Tenenbaum
Chuang Gan
LM&Ro
207
27
0
07 Apr 2023
DATE: Domain Adaptive Product Seeker for E-commerce
Computer Vision and Pattern Recognition (CVPR), 2023
Haoyuan Li
Haojie Jiang
Tao Jin
Meng-Juan Li
Yan Chen
Zhijie Lin
Yang Zhao
Zhou Zhao
316
6
0
07 Apr 2023
Training-Free Layout Control with Cross-Attention Guidance
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Minghao Chen
Iro Laina
Andrea Vedaldi
DiffM
449
318
0
06 Apr 2023
Uncurated Image-Text Datasets: Shedding Light on Demographic Bias
Computer Vision and Pattern Recognition (CVPR), 2023
Noa Garcia
Yusuke Hirota
Yankun Wu
Yuta Nakashima
EGVM
202
71
0
06 Apr 2023
Multi-Modal Representation Learning with Text-Driven Soft Masks
Computer Vision and Pattern Recognition (CVPR), 2023
Jaeyoo Park
Bohyung Han
SSL
185
9
0
03 Apr 2023
Self-Supervised Multimodal Learning: A Survey
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Yongshuo Zong
Oisin Mac Aodha
Timothy M. Hospedales
SSL
346
94
0
31 Mar 2023
MaMMUT: A Simple Architecture for Joint Learning for MultiModal Tasks
Weicheng Kuo
A. Piergiovanni
Dahun Kim
Xiyang Luo
Benjamin Caine
...
Luowei Zhou
Andrew M. Dai
Zhifeng Chen
Claire Cui
A. Angelova
MLLM, VLM
385
31
0
29 Mar 2023
Unmasked Teacher: Towards Training-Efficient Video Foundation Models
IEEE International Conference on Computer Vision (ICCV), 2023
Kunchang Li
Yali Wang
Yizhuo Li
Yi Wang
Yinan He
Limin Wang
Yu Qiao
VGen
536
238
0
28 Mar 2023
Equivariant Similarity for Vision-Language Foundation Models
IEEE International Conference on Computer Vision (ICCV), 2023
Tan Wang
Kevin Qinghong Lin
Linjie Li
Chung-Ching Lin
Zhengyuan Yang
Hanwang Zhang
Zicheng Liu
Lijuan Wang
CoGe
282
63
0
25 Mar 2023
CoBIT: A Contrastive Bi-directional Image-Text Generation Model
International Conference on Learning Representations (ICLR), 2023
Haoxuan You
Mandy Guo
Zhecan Wang
Kai-Wei Chang
Jason Baldridge
Jiahui Yu
DiffM
213
14
0
23 Mar 2023
ScanERU: Interactive 3D Visual Grounding based on Embodied Reference Understanding
AAAI Conference on Artificial Intelligence (AAAI), 2023
Ziyang Lu
Yunqiang Pei
Guoqing Wang
Yang Yang
Zheng Wang
Heng Tao Shen
174
12
0
23 Mar 2023
Top-Down Visual Attention from Analysis by Synthesis
Computer Vision and Pattern Recognition (CVPR), 2023
Baifeng Shi
Trevor Darrell
Xin Eric Wang
227
40
0
23 Mar 2023
VMCML: Video and Music Matching via Cross-Modality Lifting
Yi-Shan Lee
Wei-Cheng Tseng
Fu-En Wang
Min Sun
164
0
0
22 Mar 2023
MAGVLT: Masked Generative Vision-and-Language Transformer
Computer Vision and Pattern Recognition (CVPR), 2023
Sungwoong Kim
DaeJin Jo
Donghoon Lee
Jongmin Kim
VLM
140
16
0
21 Mar 2023
Contrastive Alignment of Vision to Language Through Parameter-Efficient Transfer Learning
International Conference on Learning Representations (ICLR), 2023
Zaid Khan
Yun Fu
VLM
182
20
0
21 Mar 2023