Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models

19 May 2015
Bryan A. Plummer
Liwei Wang
Christopher M. Cervantes
Juan C. Caicedo
Anjali Narayan-Chen
Svetlana Lazebnik
arXiv:1505.04870

Papers citing "Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models"

50 / 1,325 papers shown
V-HUB: A Visual-Centric Humor Understanding Benchmark for Video LLMs
Zhengpeng Shi
Hengli Li
Yanpeng Zhao
Jianqun Zhou
Yuxuan Wang
Qinrong Cui
Wei Bi
Songchun Zhu
Bo Zhao
Zilong Zheng
VLM
122
0
0
30 Sep 2025
MuSLR: Multimodal Symbolic Logical Reasoning
Jundong Xu
Hao Fei
Yuhui Zhang
Liangming Pan
Qijun Huang
...
Preslav Nakov
Min-Yen Kan
William Y. Wang
Mong-Li Lee
Wynne Hsu
ReLM, LRM
130
0
0
30 Sep 2025
Point-It-Out: Benchmarking Embodied Reasoning for Vision Language Models in Multi-Stage Visual Grounding
Haotian Xue
Yunhao Ge
Y. Zeng
Zhaoshuo Li
Ming-Yu Liu
Yongxin Chen
JiaoJiao Fan
141
1
0
30 Sep 2025
OIG-Bench: A Multi-Agent Annotated Benchmark for Multimodal One-Image Guides Understanding
Jiancong Xie
Wenjin Wang
Zhuomeng Zhang
Zihan Liu
Qi Liu
Ke Feng
Zixun Sun
Yuedong Yang
VLM
90
0
0
29 Sep 2025
ColLab: A Collaborative Spatial Progressive Data Engine for Referring Expression Comprehension and Generation
Shilan Zhang
J. Huang
Ruilin Yao
Cong Wang
Yaxiong Chen
Peng Xu
Shengwu Xiong
125
0
0
28 Sep 2025
Geo-R1: Improving Few-Shot Geospatial Referring Expression Understanding with Reinforcement Fine-Tuning
Zilun Zhang
Zian Guan
T. Zhao
H. Shen
Tianyu Li
Yuxiang Cai
Zhonggen Su
Zhaojun Liu
Jianwei Yin
Xiang Li
ObjD, LRM
243
4
0
26 Sep 2025
Deepfakes: we need to re-think the concept of "real" images
J. Keuper
Margret Keuper
139
0
0
26 Sep 2025
Long Story Short: Disentangling Compositionality and Long-Caption Understanding in VLMs
Israfel Salazar
Desmond Elliott
Yova Kementchedjhieva
CoGe, VLM
230
0
0
23 Sep 2025
OmniBridge: Unified Multimodal Understanding, Generation, and Retrieval via Latent Space Alignment
Teng Xiao
Zuchao Li
Lefei Zhang
187
1
0
23 Sep 2025
Speech-to-See: End-to-End Speech-Driven Open-Set Object Detection
Wenhuan Lu
Xinyue Song
Wenjun Ke
Zhizhi Yu
Wenhao Yang
Jianguo Wei
ObjD
96
0
0
20 Sep 2025
RACap: Relation-Aware Prompting for Lightweight Retrieval-Augmented Image Captioning
Xiaosheng Long
Hanyu Wang
Zhentao Song
Kun Luo
Hongde Liu
136
0
0
19 Sep 2025
MaskAttn-SDXL: Controllable Region-Level Text-To-Image Generation
Yu Chang
Jiahao Chen
Anzhe Cheng
Paul Bogdan
DiffM
128
0
0
18 Sep 2025
Efficient Multimodal Dataset Distillation via Generative Models
Zhenghao Zhao
Haoxuan Wang
Junyi Wu
Yuzhang Shang
Gaowen Liu
Yan Yan
DD
287
2
0
18 Sep 2025
MARS2 2025 Challenge on Multimodal Reasoning: Datasets, Methods, Results, Discussion, and Outlook
Peng Xu
Shengwu Xiong
Jiajun Zhang
Yaxiong Chen
Bowen Zhou
...
Yang Yang
Yanglin Deng
Yashu Kang
Ye Yuan
Y. Wen
LRM
127
1
0
17 Sep 2025
Evaluating Robustness of Vision-Language Models Under Noisy Conditions
Purushoth
Alireza
AAML
97
0
0
15 Sep 2025
Towards Understanding Visual Grounding in Visual Language Models
Georgios Pantazopoulos
Eda B. Özyiğit
ObjD
324
3
0
12 Sep 2025
Recurrence Meets Transformers for Universal Multimodal Retrieval
Davide Caffagni
Sara Sarto
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
188
1
0
10 Sep 2025
Prototype-Aware Multimodal Alignment for Open-Vocabulary Visual Grounding
Jiangnan Xie
Xiaolong Zheng
Liang Zheng
ObjD
174
0
0
08 Sep 2025
Integrating Spatial and Semantic Embeddings for Stereo Sound Event Localization in Videos
Davide Berghi
Philip J. B. Jackson
111
1
0
08 Sep 2025
Effectively obtaining acoustic, visual and textual data from videos
Jorge E. León
Miguel Carrasco
VGen
139
1
0
06 Sep 2025
Semantic-guided LoRA Parameters Generation
Miaoge Li
Yang Chen
Zhijie Rao
Can Jiang
Jingcai Guo
OffRL
116
0
0
05 Sep 2025
Human Preference-Aligned Concept Customization Benchmark via Decomposed Evaluation
Reina Ishikawa
Ryo Fujii
Hideo Saito
Ryo Hachiuma
146
0
0
03 Sep 2025
EVENT-Retriever: Event-Aware Multimodal Image Retrieval for Realistic Captions
Dinh-Khoi Vo
Van-Loc Nguyen
M. Tran
T. Le
3DV, VGen
66
0
0
31 Aug 2025
VoCap: Video Object Captioning and Segmentation from Any Prompt
J. Uijlings
Xingyi Zhou
Xiuye Gu
Arsha Nagrani
Anurag Arnab
Alireza Fathi
David A. Ross
Cordelia Schmid
VOS, VLM
261
1
0
29 Aug 2025
Sparse and Dense Retrievers Learn Better Together: Joint Sparse-Dense Optimization for Text-Image Retrieval
Jonghyun Song
Youngjune Lee
Gyu-Hwung Cho
Ilhyeon Song
Saehun Kim
Yohan Jo
VLM
88
0
0
22 Aug 2025
RAGSR: Regional Attention Guided Diffusion for Image Super-Resolution
Haodong He
Y. Bai
Rui Lan
Xu Duan
Lei Sun
Xiangxiang Chu
Gui-Song Xia
DiffM
126
1
0
22 Aug 2025
Towards Open World Detection: A Survey
Andrei-Stefan Bulzan
Cosmin Cernazanu-Glavan
ObjD, VLM
220
0
0
22 Aug 2025
Ouroboros: Single-step Diffusion Models for Cycle-consistent Forward and Inverse Rendering
Shanlin Sun
Yifan Wang
Hanwen Zhang
Yifeng Xiong
Qin Ren
Ruogu Fang
Xiaohui Xie
Chenyu You
174
4
0
20 Aug 2025
Understanding Data Influence with Differential Approximation
Haoru Tan
Sitong Wu
Xiuzhe Wu
Wang Wang
Bo Zhao
Zeke Xie
Gui-Song Xia
Xiaojuan Qi
TDI
283
1
0
20 Aug 2025
7Bench: a Comprehensive Benchmark for Layout-guided Text-to-image Models
Elena Izzo
Luca Parolari
Davide Vezzaro
Lamberto Ballan
111
0
0
18 Aug 2025
Region-Level Context-Aware Multimodal Understanding
Hongliang Wei
Xianqi Zhang
Xingtao Wang
Xiaopeng Fan
Debin Zhao
VLM
165
0
0
17 Aug 2025
Logic Unseen: Revealing the Logical Blindspots of Vision-Language Models
Yuchen Zhou
Jiayu Tang
Shuo Yang
Xiaoyan Xiao
Yuqin Dai
Wenhao Yang
Chao Gou
Xiaobo Xia
Tat-Seng Chua
VLM, CoGe, LRM
145
2
0
15 Aug 2025
JRDB-Reasoning: A Difficulty-Graded Benchmark for Visual Reasoning in Robotics
Simindokht Jahangard
Mehrzad Mohammadi
Yi Shen
Zhixi Cai
Hamid Rezatofighi
294
2
0
14 Aug 2025
Bridging Modality Gaps in e-Commerce Products via Vision-Language Alignment
Yipeng Zhang
Hongju Yu
Aritra Mandal
Canran Xu
Qunzhi Zhou
Zhe Wu
192
0
0
13 Aug 2025
DocThinker: Explainable Multimodal Large Language Models with Rule-based Reinforcement Learning for Document Understanding
Wenwen Yu
Zhibo Yang
Yuliang Liu
Xiang Bai
MLLM, OffRL, LRM
95
4
0
12 Aug 2025
ExpVG: Investigating the Design Space of Visual Grounding in Multimodal Large Language Model
Weitai Kang
Weiming Zhuang
Zhizhong Li
Yan Yan
Lingjuan Lyu
126
1
0
11 Aug 2025
MCITlib: Multimodal Continual Instruction Tuning Library and Benchmark
Haiyang Guo
Fei Zhu
Hongbo Zhao
Fanhu Zeng
Wenzhuo Liu
Shijie Ma
Da-Han Wang
Xu-Yao Zhang
CLL
214
2
0
10 Aug 2025
SIFThinker: Spatially-Aware Image Focus for Visual Reasoning
Zhangquan Chen
Ruihui Zhao
Chuwei Luo
Mingze Sun
Xinlei Yu
Yangyang Kang
Ruqi Huang
LRM
287
4
0
08 Aug 2025
Adapting Vision-Language Models Without Labels: A Comprehensive Survey
Hao Dong
Lijun Sheng
Jian Liang
Ran He
Eleni Chatzi
Olga Fink
OffRL, VLM
219
4
0
07 Aug 2025
Dual Prompt Learning for Adapting Vision-Language Models to Downstream Image-Text Retrieval
Y. Wang
Tao Wang
Chenwei Tang
Caiyang Yu
Zhengqing Zang
Mengmi Zhang
Shudong Huang
Jiancheng Lv
VLM
115
0
0
06 Aug 2025
ChartCap: Mitigating Hallucination of Dense Chart Captioning
Junyoung Lim
Jaewoo Ahn
Gunhee Kim
128
2
0
05 Aug 2025
VITRIX-CLIPIN: Enhancing Fine-Grained Visual Understanding in CLIP via Instruction Editing Data and Long Captions
Ziteng Wang
Siqi Yang
Limeng Qiao
Lin Ma
VLM
397
0
0
04 Aug 2025
Context-Adaptive Multi-Prompt Embedding with Large Language Models for Vision-Language Alignment
Dahun Kim
A. Angelova
VLM
232
1
0
03 Aug 2025
Eigen Neural Network: Unlocking Generalizable Vision with Eigenbasis
Anzhe Cheng
Chenzhong Yin
Mingxi Cheng
Shukai Duan
Shahin Nazarian
Paul Bogdan
225
0
0
02 Aug 2025
Instruction-Grounded Visual Projectors for Continual Learning of Generative Vision-Language Models
Hyundong Jin
Hyung Jin Chang
Eunwoo Kim
VLM
142
0
0
01 Aug 2025
SPRINT: Scalable and Predictive Intent Refinement for LLM-Enhanced Session-based Recommendation
G. G. Lee
Y. Liu
Yifan Liu
Susik Yoon
Dong Wang
SeongKu Kang
191
2
0
01 Aug 2025
Improving Multimodal Contrastive Learning of Sentence Embeddings with Object-Phrase Alignment
Kaiyan Zhao
Zhongtao Miao
Yoshimasa Tsuruoka
102
1
0
01 Aug 2025
Multimodal Referring Segmentation: A Survey
Henghui Ding
Song Tang
Shuting He
Chang-rui Liu
Zuxuan Wu
Yu-Gang Jiang
394
11
0
01 Aug 2025
Bidirectional Likelihood Estimation with Multi-Modal Large Language Models for Text-Video Retrieval
Dohwan Ko
Ji Soo Lee
M. Choi
Zihang Meng
Hyunwoo J. Kim
384
1
0
31 Jul 2025
On the Reliability of Vision-Language Models Under Adversarial Frequency-Domain Perturbations
Jordan Vice
Naveed Akhtar
Yansong Gao
Richard Hartley
Ajmal Mian
AAML
210
2
0
30 Jul 2025
Page 2 of 27