Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models

19 May 2015
Bryan A. Plummer
Liwei Wang
Christopher M. Cervantes
Juan C. Caicedo
Anjali Narayan-Chen
Svetlana Lazebnik
arXiv:1505.04870

Papers citing "Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models"

Showing 50 of 1,325 citing papers (page 3 of 27)
Trade-offs in Image Generation: How Do Different Dimensions Interact?
Sicheng Zhang
Binzhu Xie
Zhonghao Yan
Yuli Zhang
Donghao Zhou
Xiaofei Chen
Shi Qiu
Jiaqi Liu
Guoyang Xie
Zhichao Lu
163
2
0
29 Jul 2025
MMAT-1M: A Large Reasoning Dataset for Multimodal Agent Tuning
Tianhong Gao
Yannian Fu
Weiqun Wu
Haixiao Yue
Shanshan Liu
Gang Zhang
MLLM, LRM
275
1
0
29 Jul 2025
ZSE-Cap: A Zero-Shot Ensemble for Image Retrieval and Prompt-Guided Captioning
Duc-Tai Dinh
Duc Anh Khoa Dinh
VLM
87
0
0
28 Jul 2025
On The Role of Pretrained Language Models in General-Purpose Text Embeddings: A Survey
Meishan Zhang
Xin Zhang
X. Zhao
Shouzheng Huang
Baotian Hu
Min Zhang
267
3
0
28 Jul 2025
Causality-aligned Prompt Learning via Diffusion-based Counterfactual Generation
Xinshu Li
Ruoyu Wang
Erdun Gao
Mingming Gong
Lina Yao
DiffM
189
0
0
26 Jul 2025
Dynamic-DINO: Fine-Grained Mixture of Experts Tuning for Real-time Open-Vocabulary Object Detection
Yehao Lu
Minghe Weng
Zekang Xiao
Rui Jiang
Wei Su
Guangcong Zheng
Ping Lu
Xi Li
MoE, ObjD
167
2
0
23 Jul 2025
ReMeREC: Relation-aware and Multi-entity Referring Expression Comprehension
Yizhi Hu
Zezhao Tian
Xingqun Qi
Chen Su
Bingkun Yang
Junhui Yin
Muyi Sun
Man Zhang
Zhenan Sun
ObjD
149
0
0
22 Jul 2025
U-MARVEL: Unveiling Key Factors for Universal Multimodal Retrieval via Embedding Learning with MLLMs
Xiaojie Li
Chu Li
Shi-Zhe Chen
Xi Chen
OffRL
251
3
0
20 Jul 2025
FIX-CLIP: Dual-Branch Hierarchical Contrastive Learning via Synthetic Captions for Better Understanding of Long Text
Bingchao Wang
Zhiwei Ning
Jianyu Ding
Xuanang Gao
Yin Li
Dongsheng Jiang
J. Yang
Wei Liu
CLIP, VLM
234
7
0
14 Jul 2025
PUMA: Layer-Pruned Language Model for Efficient Unified Multimodal Retrieval with Modality-Adaptive Learning
Yibo Lyu
Rui Shao
Gongwei Chen
Yijie Zhu
Weili Guan
Liqiang Nie
236
9
0
10 Jul 2025
With Limited Data for Multimodal Alignment, Let the STRUCTURE Guide You
Fabian Gröger
Shuo Wen
Huyen Le
Maria Brbic
262
1
0
20 Jun 2025
Control and Realism: Best of Both Worlds in Layout-to-Image without Training
Bonan Li
Yinhan Hu
Songhua Liu
Xinchao Wang
DiffM
240
2
0
18 Jun 2025
GreedyPrune: Retenting Critical Visual Token Set for Large Vision Language Models
Ruiguang Pei
W. Sun
Zhihui Fu
Jun Wang
VLM
112
0
0
16 Jun 2025
CliniDial: A Naturally Occurring Multimodal Dialogue Dataset for Team Reflection in Action During Clinical Operation
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Naihao Deng
Kapotaksha Das
Amélie Reymond
Vitaliy Popov
M. Abouelenien
230
0
0
15 Jun 2025
On the Effectiveness of Integration Methods for Multimodal Dialogue Response Retrieval
Seongbo Jang
Seonghyeon Lee
Dongha Lee
Hwanjo Yu
195
0
0
13 Jun 2025
Complexity of normalized stochastic first-order methods with momentum under heavy-tailed noise
Chuan He
Zhaosong Lu
Defeng Sun
Zhanwang Deng
172
10
0
12 Jun 2025
An Open-Source Software Toolkit & Benchmark Suite for the Evaluation and Adaptation of Multimodal Action Models
Pranav Guruprasad
Yangyue Wang
Sudipta Chowdhury
Jaewoo Song
Harshvardhan Sikka
243
0
0
10 Jun 2025
Hybrid Reasoning for Perception, Explanation, and Autonomous Action in Manufacturing
Christos Margadji
Sebastian W. Pattinson
AI4CE
115
1
0
10 Jun 2025
Uncertainty-o: One Model-agnostic Framework for Unveiling Uncertainty in Large Multimodal Models
Ruiyang Zhang
Hu Zhang
Hao Fei
Zhedong Zheng
UQCV
285
0
0
09 Jun 2025
Synthetic Visual Genome
Computer Vision and Pattern Recognition (CVPR), 2025
J. S. Park
Zixian Ma
Linjie Li
Chenhao Zheng
Cheng-Yu Hsieh
...
Quan Kong
Norimasa Kobori
Ali Farhadi
Yejin Choi
Ranjay Krishna
224
0
0
09 Jun 2025
CoMemo: LVLMs Need Image Context with Image Memory
Shi-Qi Liu
Weijie Su
Xizhou Zhu
Wenhai Wang
Jifeng Dai
VLM
219
0
0
06 Jun 2025
DFBench: Benchmarking Deepfake Image Detection Capability of Large Multimodal Models
Jiarui Wang
Huiyu Duan
Juntong Wang
Ziheng Jia
Woo Yi Yang
...
Yu Zhao
Jiaying Qian
Yuke Xing
Guangtao Zhai
Xiongkuo Min
253
3
0
03 Jun 2025
R2SM: Referring and Reasoning for Selective Masks
Yu-Lin Shih
Wei-En Tai
Cheng Sun
Y. Wang
Hwann-Tzong Chen
ISeg
352
0
0
02 Jun 2025
Data Pruning by Information Maximization
International Conference on Learning Representations (ICLR), 2025
Haoru Tan
Sitong Wu
Wei Huang
Shizhen Zhao
Xiaojuan Qi
331
8
0
02 Jun 2025
Light as Deception: GPT-driven Natural Relighting Against Vision-Language Pre-training Models
Ying Yang
Jie Zhang
Xiao Lv
Di Lin
Tao Xiang
Qing Guo
AAML, VLM
167
0
0
30 May 2025
Advancing Compositional Awareness in CLIP with Efficient Fine-Tuning
Amit Peleg
Naman D. Singh
Matthias Hein
CoGe, VLM
367
2
0
30 May 2025
Open CaptchaWorld: A Comprehensive Web-based Platform for Testing and Benchmarking Multimodal LLM Agents
Yaxin Luo
Zhaoyi Li
Jiacheng Liu
Jiacheng Cui
Xiaohan Zhao
Zhiqiang Shen
LLMAG, LRM, VLM
283
7
0
30 May 2025
Benchmarking Foundation Models for Zero-Shot Biometric Tasks
Redwan Sony
Parisa Farmanifard
Hamzeh Alzwairy
Nitish Shukla
Arun Ross
CVBM, VLM
267
5
0
30 May 2025
FinMME: Benchmark Dataset for Financial Multi-Modal Reasoning Evaluation
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Junyu Luo
Zhizhuo Kou
Liming Yang
Xiao Luo
Jinsheng Huang
...
Jiaming Ji
Xuanzhe Liu
Sirui Han
Ming Zhang
Wenhan Luo
199
14
0
30 May 2025
Argus: Vision-Centric Reasoning with Grounded Chain-of-Thought
Computer Vision and Pattern Recognition (CVPR), 2025
Yunze Man
De-An Huang
Guilin Liu
Shiwei Sheng
Shilong Liu
Liang-Yan Gui
Jan Kautz
Yu Wang
Zhiding Yu
MLLM, LRM
335
19
0
29 May 2025
Towards Minimizing Feature Drift in Model Merging: Layer-wise Task Vector Fusion for Adaptive Knowledge Integration
Wenju Sun
Qingyong Li
Wen Wang
Yang Liu
Yangli-ao Geng
Boyang Li
MoMe
312
2
0
29 May 2025
Understand, Think, and Answer: Advancing Visual Reasoning with Large Multimodal Models
Yufei Zhan
Hongyin Zhao
Yousong Zhu
Shurong Zheng
Fan Yang
Ming Tang
Jinqiao Wang
VLM, LRM
279
1
0
27 May 2025
Multimodal Federated Learning: A Survey through the Lens of Different FL Paradigms
Yuanzhe Peng
Jieming Bian
Lei Wang
Yin Huang
Jie Xu
211
1
0
27 May 2025
MMPerspective: Do MLLMs Understand Perspective? A Comprehensive Benchmark for Perspective Perception, Reasoning, and Robustness
Yunlong Tang
Pinxin Liu
Mingqian Feng
Rui Mao
...
Hang Hua
Ali Vosoughi
Luchuan Song
Zeliang Zhang
Chenliang Xu
LRM
473
4
0
26 May 2025
ChartSketcher: Reasoning with Multimodal Feedback and Reflection for Chart Understanding
Muye Huang
Lingling Zhang
Jie Ma
Han Lai
Fangzhi Xu
Yifei Li
Wenjun Wu
Yaqiang Wu
Jun Liu
LRM
281
5
0
25 May 2025
TNG-CLIP: Training-Time Negation Data Generation for Negation Awareness of CLIP
Yuliang Cai
Jesse Thomason
Mohammad Rostami
VLM
239
0
0
24 May 2025
So-Fake: Benchmarking and Explaining Social Media Image Forgery Detection
Zhenglin Huang
Tianxiao Li
Xiangtai Li
Haiquan Wen
Yiwei He
...
Hao Fei
Xi Yang
Xiaowei Huang
Bei Peng
Guangliang Cheng
711
6
0
24 May 2025
EvdCLIP: Improving Vision-Language Retrieval with Entity Visual Descriptions from Large Language Models
AAAI Conference on Artificial Intelligence (AAAI), 2025
G. Meng
Sunan He
Jinpeng Wang
Tao Dai
Letian Zhang
Jieming Zhu
Qing Li
Gang Wang
Rui Zhang
Yong Jiang
VLM
471
5
0
24 May 2025
Reasoning Segmentation for Images and Videos: A Survey
Yiqing Shen
Chenjia Li
Fei Xiong
Jeong-O Jeong
Tianpeng Wang
Michael Latman
Mathias Unberath
VOS
430
9
0
24 May 2025
Segment Anyword: Mask Prompt Inversion for Open-Set Grounded Segmentation
Zhihua Liu
Amrutha Saseendran
Lei Tong
Xilin He
Fariba Yousefi
...
Dino Oglic
Tom Diethe
Philip Teare
Huiyu Zhou
Chen Jin
VLM
608
3
0
23 May 2025
Learning Shared Representations from Unpaired Data
Amitai Yacobi
Nir Ben-Ari
Ronen Talmon
Uri Shaham
SSL
295
0
0
23 May 2025
Instructify: Demystifying Metadata to Visual Instruction Tuning Data Conversion
Jacob A. Hansen
Wei Lin
Junmo Kang
M. Jehanzeb Mirza
Hongyin Luo
Rogerio Feris
Alan Ritter
James R. Glass
Leonid Karlinsky
VLM
447
1
0
23 May 2025
SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning
Yang Liu
Ming Ma
Xiaomin Yu
Pengxiang Ding
Han Zhao
Mingyang Sun
Siteng Huang
Xuetao Zhang
LRM
531
18
0
18 May 2025
UniMoCo: Unified Modality Completion for Robust Multi-Modal Embeddings
Jiajun Qin
Yuan Pu
Zhuolun He
Seunggeun Kim
David Z. Pan
Bei Yu
363
3
0
17 May 2025
GeoMM: On Geodesic Perspective for Multi-modal Learning
Computer Vision and Pattern Recognition (CVPR), 2025
Shibin Mei
Hang Wang
Bingbing Ni
317
0
0
16 May 2025
Disambiguating Reference in Visually Grounded Dialogues through Joint Modeling of Textual and Multimodal Semantic Structures
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Shun Inadumi
Nobuhiro Ueda
Koichiro Yoshino
ObjD
356
0
0
16 May 2025
Breaking the Batch Barrier (B3) of Contrastive Learning via Smart Batch Mining
Raghuveer Thirukovalluru
Rui Meng
Wenshu Fan
Karthikeyan K
Mingyi Su
Ping Nie
Semih Yavuz
Yingbo Zhou
Lei Ma
Bhuwan Dhingra
356
12
0
16 May 2025
Boosting Text-to-Chart Retrieval through Training with Synthesized Semantic Insights
Yifan Wu
Lutao Yan
Yizhang Zhu
Yinan Mei
Jiannan Wang
Nan Tang
Yuyu Luo
539
4
0
15 May 2025
CAT Merging: A Training-Free Approach for Resolving Conflicts in Model Merging
Wenju Sun
Qingyong Li
Yangli-ao Geng
Boyang Li
MoMe
329
6
0
11 May 2025
TopicVD: A Topic-Based Dataset of Video-Guided Multimodal Machine Translation for Documentaries
International Conference on Applications of Natural Language to Data Bases (NLDB), 2025
Jinze Lv
Jian Chen
Zi Long
Xianghua Fu
Yin Chen
VGen
328
0
0
09 May 2025