Flickr30k Entities: Collecting Region-to-Phrase Correspondences for
  Richer Image-to-Sentence Models
19 May 2015
Bryan A. Plummer
Liwei Wang
Christopher M. Cervantes
Juan C. Caicedo
Anjali Narayan-Chen
Svetlana Lazebnik

Papers citing "Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models"

50 / 1,318 papers shown
Deepfakes: we need to re-think the concept of "real" images
J. Keuper
Margret Keuper
90
0
0
26 Sep 2025
OmniBridge: Unified Multimodal Understanding, Generation, and Retrieval via Latent Space Alignment
Teng Xiao
Zuchao Li
Lefei Zhang
121
0
0
23 Sep 2025
Long Story Short: Disentangling Compositionality and Long-Caption Understanding in VLMs
Israfel Salazar
Desmond Elliott
Yova Kementchedjhieva
CoGe, VLM
151
0
0
23 Sep 2025
Speech-to-See: End-to-End Speech-Driven Open-Set Object Detection
Wenhuan Lu
Xinyue Song
Wenjun Ke
Zhizhi Yu
Wenhao Yang
Jianguo Wei
ObjD
76
0
0
20 Sep 2025
RACap: Relation-Aware Prompting for Lightweight Retrieval-Augmented Image Captioning
Xiaosheng Long
Hanyu Wang
Zhentao Song
Kun Luo
Hongde Liu
84
0
0
19 Sep 2025
MaskAttn-SDXL: Controllable Region-Level Text-To-Image Generation
Yu Chang
Jiahao Chen
Anzhe Cheng
Paul Bogdan
DiffM
61
0
0
18 Sep 2025
Efficient Multimodal Dataset Distillation via Generative Models
Zhenghao Zhao
Haoxuan Wang
Junyi Wu
Yuzhang Shang
Gaowen Liu
Yan Yan
DD
211
0
0
18 Sep 2025
MARS2 2025 Challenge on Multimodal Reasoning: Datasets, Methods, Results, Discussion, and Outlook
Peng Xu
Shengwu Xiong
Jiajun Zhang
Yaxiong Chen
Bowen Zhou
...
Yang Yang
Yanglin Deng
Yashu Kang
Ye Yuan
Y. Wen
LRM
91
1
0
17 Sep 2025
Evaluating Robustness of Vision-Language Models Under Noisy Conditions
Purushoth
Alireza
AAML
76
0
0
15 Sep 2025
Towards Understanding Visual Grounding in Visual Language Models
Georgios Pantazopoulos
Eda B. Özyiğit
ObjD
236
1
0
12 Sep 2025
Recurrence Meets Transformers for Universal Multimodal Retrieval
Davide Caffagni
Sara Sarto
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
120
1
0
10 Sep 2025
Prototype-Aware Multimodal Alignment for Open-Vocabulary Visual Grounding
Jiangnan Xie
Xiaolong Zheng
Liang Zheng
ObjD
129
0
0
08 Sep 2025
Integrating Spatial and Semantic Embeddings for Stereo Sound Event Localization in Videos
Davide Berghi
Philip J. B. Jackson
68
0
0
08 Sep 2025
Effectively obtaining acoustic, visual and textual data from videos
Jorge E. León
Miguel Carrasco
VGen
111
1
0
06 Sep 2025
Semantic-guided LoRA Parameters Generation
Miaoge Li
Yang Chen
Zhijie Rao
Can Jiang
Jingcai Guo
OffRL
80
0
0
05 Sep 2025
Human Preference-Aligned Concept Customization Benchmark via Decomposed Evaluation
Reina Ishikawa
Ryo Fujii
Hideo Saito
Ryo Hachiuma
108
0
0
03 Sep 2025
EVENT-Retriever: Event-Aware Multimodal Image Retrieval for Realistic Captions
Dinh-Khoi Vo
Van-Loc Nguyen
M. Tran
T. Le
3DV, VGen
44
0
0
31 Aug 2025
VoCap: Video Object Captioning and Segmentation from Any Prompt
J. Uijlings
Xingyi Zhou
Xiuye Gu
Arsha Nagrani
Anurag Arnab
Alireza Fathi
David A. Ross
Cordelia Schmid
VOS, VLM
188
1
0
29 Aug 2025
Sparse and Dense Retrievers Learn Better Together: Joint Sparse-Dense Optimization for Text-Image Retrieval
Jonghyun Song
Youngjune Lee
Gyu-Hwung Cho
Ilhyeon Song
Saehun Kim
Yohan Jo
VLM
52
0
0
22 Aug 2025
RAGSR: Regional Attention Guided Diffusion for Image Super-Resolution
Haodong He
Y. Bai
Rui Lan
Xu Duan
Lei Sun
Xiangxiang Chu
Gui-Song Xia
DiffM
70
1
0
22 Aug 2025
Towards Open World Detection: A Survey
Andrei-Stefan Bulzan
Cosmin Cernazanu-Glavan
ObjD, VLM
159
0
0
22 Aug 2025
Ouroboros: Single-step Diffusion Models for Cycle-consistent Forward and Inverse Rendering
Shanlin Sun
Yifan Wang
Hanwen Zhang
Yifeng Xiong
Qin Ren
Ruogu Fang
Xiaohui Xie
Chenyu You
126
2
0
20 Aug 2025
Understanding Data Influence with Differential Approximation
Haoru Tan
Sitong Wu
Xiuzhe Wu
Wang Wang
Bo Zhao
Zeke Xie
Gui-Song Xia
Xiaojuan Qi
TDI
206
1
0
20 Aug 2025
7Bench: a Comprehensive Benchmark for Layout-guided Text-to-image Models
Elena Izzo
Luca Parolari
Davide Vezzaro
Lamberto Ballan
52
0
0
18 Aug 2025
Region-Level Context-Aware Multimodal Understanding
Hongliang Wei
Xianqi Zhang
Xingtao Wang
Xiaopeng Fan
Debin Zhao
VLM
125
0
0
17 Aug 2025
Logic Unseen: Revealing the Logical Blindspots of Vision-Language Models
Yuchen Zhou
Jiayu Tang
Shuo Yang
Xiaoyan Xiao
Yuqin Dai
Wenhao Yang
Chao Gou
Xiaobo Xia
Tat-Seng Chua
VLM, CoGe, LRM
109
1
0
15 Aug 2025
JRDB-Reasoning: A Difficulty-Graded Benchmark for Visual Reasoning in Robotics
Simindokht Jahangard
Mehrzad Mohammadi
Yi Shen
Zhixi Cai
Hamid Rezatofighi
225
1
0
14 Aug 2025
Bridging Modality Gaps in e-Commerce Products via Vision-Language Alignment
Yipeng Zhang
Hongju Yu
Aritra Mandal
Canran Xu
Qunzhi Zhou
Zhe Wu
132
0
0
13 Aug 2025
DocThinker: Explainable Multimodal Large Language Models with Rule-based Reinforcement Learning for Document Understanding
Wenwen Yu
Zhibo Yang
Yuliang Liu
Xiang Bai
MLLM, OffRL, LRM
64
3
0
12 Aug 2025
ExpVG: Investigating the Design Space of Visual Grounding in Multimodal Large Language Model
Weitai Kang
Weiming Zhuang
Zhizhong Li
Yan Yan
Lingjuan Lyu
82
0
0
11 Aug 2025
MCITlib: Multimodal Continual Instruction Tuning Library and Benchmark
Haiyang Guo
Fei Zhu
Hongbo Zhao
Fanhu Zeng
Wenzhuo Liu
Shijie Ma
Da-Han Wang
Xu-Yao Zhang
CLL
166
2
0
10 Aug 2025
SIFThinker: Spatially-Aware Image Focus for Visual Reasoning
Zhangquan Chen
Ruihui Zhao
Chuwei Luo
Mingze Sun
Xinlei Yu
Yangyang Kang
Ruqi Huang
LRM
177
4
0
08 Aug 2025
Adapting Vision-Language Models Without Labels: A Comprehensive Survey
Hao Dong
Lijun Sheng
Jian Liang
Ran He
Eleni Chatzi
Olga Fink
OffRL, VLM
164
3
0
07 Aug 2025
Dual Prompt Learning for Adapting Vision-Language Models to Downstream Image-Text Retrieval
Y. Wang
Tao Wang
Chenwei Tang
Caiyang Yu
Zhengqing Zang
Mengmi Zhang
Shudong Huang
Jiancheng Lv
VLM
87
0
0
06 Aug 2025
ChartCap: Mitigating Hallucination of Dense Chart Captioning
Junyoung Lim
Jaewoo Ahn
Gunhee Kim
88
1
0
05 Aug 2025
VITRIX-CLIPIN: Enhancing Fine-Grained Visual Understanding in CLIP via Instruction Editing Data and Long Captions
Ziteng Wang
Siqi Yang
Limeng Qiao
Lin Ma
VLM
221
0
0
04 Aug 2025
Context-Adaptive Multi-Prompt Embedding with Large Language Models for Vision-Language Alignment
Dahun Kim
A. Angelova
VLM
163
0
0
03 Aug 2025
Eigen Neural Network: Unlocking Generalizable Vision with Eigenbasis
Anzhe Cheng
Chenzhong Yin
Mingxi Cheng
Shukai Duan
Shahin Nazarian
Paul Bogdan
176
0
0
02 Aug 2025
Session-Based Recommendation with Validated and Enriched LLM Intents
G. G. Lee
Y. Liu
Yifan Liu
Susik Yoon
Dong Wang
SeongKu Kang
155
2
0
01 Aug 2025
Multimodal Referring Segmentation: A Survey
Henghui Ding
Song Tang
Shuting He
Chang-rui Liu
Zuxuan Wu
Yu-Gang Jiang
306
10
0
01 Aug 2025
Improving Multimodal Contrastive Learning of Sentence Embeddings with Object-Phrase Alignment
Kaiyan Zhao
Zhongtao Miao
Yoshimasa Tsuruoka
74
1
0
01 Aug 2025
Instruction-Grounded Visual Projectors for Continual Learning of Generative Vision-Language Models
Hyundong Jin
Hyung Jin Chang
Eunwoo Kim
VLM
85
0
0
01 Aug 2025
Bidirectional Likelihood Estimation with Multi-Modal Large Language Models for Text-Video Retrieval
Dohwan Ko
Ji Soo Lee
M. Choi
Zihang Meng
Hyunwoo J. Kim
236
1
0
31 Jul 2025
On the Reliability of Vision-Language Models Under Adversarial Frequency-Domain Perturbations
Jordan Vice
Naveed Akhtar
Yansong Gao
Richard Hartley
Ajmal Mian
AAML
155
1
0
30 Jul 2025
Trade-offs in Image Generation: How Do Different Dimensions Interact?
Sicheng Zhang
Binzhu Xie
Zhonghao Yan
Yuli Zhang
Donghao Zhou
Xiaofei Chen
Shi Qiu
Jiaqi Liu
Guoyang Xie
Zhichao Lu
115
2
0
29 Jul 2025
MMAT-1M: A Large Reasoning Dataset for Multimodal Agent Tuning
Tianhong Gao
Yannian Fu
Weiqun Wu
Haixiao Yue
Shanshan Liu
Gang Zhang
MLLM, LRM
169
1
0
29 Jul 2025
On The Role of Pretrained Language Models in General-Purpose Text Embeddings: A Survey
Meishan Zhang
Xin Zhang
X. Zhao
Shouzheng Huang
Baotian Hu
Min Zhang
169
3
0
28 Jul 2025
ZSE-Cap: A Zero-Shot Ensemble for Image Retrieval and Prompt-Guided Captioning
Duc-Tai Dinh
Duc Anh Khoa Dinh
VLM
56
0
0
28 Jul 2025
Causality-aligned Prompt Learning via Diffusion-based Counterfactual Generation
Xinshu Li
Ruoyu Wang
Erdun Gao
Mingming Gong
Lina Yao
DiffM
127
0
0
26 Jul 2025
Dynamic-DINO: Fine-Grained Mixture of Experts Tuning for Real-time Open-Vocabulary Object Detection
Yehao Lu
Minghe Weng
Zekang Xiao
Rui Jiang
Wei Su
Guangcong Zheng
Ping Lu
Xi Li
MoE, ObjD
116
0
0
23 Jul 2025