arXiv:1505.04870
Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models
19 May 2015
Bryan A. Plummer
Liwei Wang
Christopher M. Cervantes
Juan C. Caicedo
Anjali Narayan-Chen
Svetlana Lazebnik
Papers citing "Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models"
50 / 1,325 papers shown
OS-W2S: An Automatic Labeling Engine for Language-Guided Open-Set Aerial Object Detection
Guoting Wei
Yu Liu
Xia Yuan
Xizhe Xue
Linlin Guo
Yifan Yang
Chunxia Zhao
Zongwen Bai
Haokui Zhang
Rong Xiao
ObjD
336
2
0
06 May 2025
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
Wei Wei
Jintao Guo
Shanshan Zhao
Minghao Fu
Lunhao Duan
...
Guo-Hua Wang
Qing-Guo Chen
Zhao Xu
Weihua Luo
Kaifu Zhang
DiffM
1.1K
30
0
05 May 2025
Compositional Image-Text Matching and Retrieval by Grounding Entities
Madhukar Reddy Vongala
Saurabh Srivastava
Jana Kosecka
CLIP
CoGe
VLM
217
0
0
04 May 2025
Diff-Prompt: Diffusion-Driven Prompt Generator with Mask Supervision
International Conference on Learning Representations (ICLR), 2025
Weicai Yan
Wang Lin
Zirun Guo
Ye Wang
Fangming Feng
Xiaoda Yang
Liang Luo
Tao Jin
DiffM
660
6
0
30 Apr 2025
AGATE: Stealthy Black-box Watermarking for Multimodal Model Copyright Protection
Jianbo Gao
Keke Gai
Jing Yu
Liehuang Zhu
Qi Wu
AAML
263
1
0
28 Apr 2025
What's Pulling the Strings? Evaluating Integrity and Attribution in AI Training and Inference through Concept Shift
Jiamin Chang
Haoyang Li
Hammond Pearce
Ruoxi Sun
Yue Liu
Minhui Xue
319
0
0
28 Apr 2025
Unsupervised Visual Chain-of-Thought Reasoning via Preference Optimization
Kesen Zhao
B. Zhu
Qianru Sun
Hanwang Zhang
MLLM
LRM
423
14
0
25 Apr 2025
Decoupled Global-Local Alignment for Improving Compositional Understanding
Xiaoxing Hu
Kaicheng Yang
Chao Guo
Haoran Xu
Ziyong Feng
Longji Xu
VLM
701
7
0
23 Apr 2025
Progressive Language-guided Visual Learning for Multi-Task Visual Grounding
Jingchao Wang
Hong Wang
Wenlong Zhang
Kunhua Ji
Dingjiang Huang
Yefeng Zheng
ObjD
367
3
0
22 Apr 2025
Generative Multimodal Pretraining with Discrete Diffusion Timestep Tokens
Computer Vision and Pattern Recognition (CVPR), 2025
Kaihang Pan
Wang Lin
Zhongqi Yue
Tenglong Ao
Liyu Jia
Wei Zhao
Juncheng Billy Li
Siliang Tang
Hanwang Zhang
315
18
0
20 Apr 2025
POET: Supporting Prompting Creativity and Personalization with Automated Expansion of Text-to-Image Generation
ACM Symposium on User Interface Software and Technology (UIST), 2025
Evans Xu Han
Alice Qian Zhang
Haiyi Zhu
Paul Pu Liang
Jane Hsieh
408
3
0
18 Apr 2025
Perception Encoder: The best visual embeddings are not at the output of the network
Daniel Bolya
Po-Yao (Bernie) Huang
Peize Sun
Jang Hyun Cho
Andrea Madotto
...
Shiyu Dong
Nikhila Ravi
Daniel Li
Piotr Dollár
Christoph Feichtenhofer
ObjD
VOS
664
107
0
17 Apr 2025
PATFinger: Prompt-Adapted Transferable Fingerprinting against Unauthorized Multimodal Dataset Usage
Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2025
Weinan Zhang
Ju Jia
Yang Liu
Yihao Huang
Xuzhao Li
Cong Wu
Lina Wang
AAML
277
1
0
15 Apr 2025
UP-Person: Unified Parameter-Efficient Transfer Learning for Text-based Person Retrieval
Yating Liu
Yaowei Li
Xiangyuan Lan
Wenming Yang
Zimo Liu
Q. Liao
275
4
0
14 Apr 2025
COUNTS: Benchmarking Object Detectors and Multimodal Large Language Models under Distribution Shifts
Computer Vision and Pattern Recognition (CVPR), 2025
Jiansheng Li
Xingxuan Zhang
Hao Zou
Yige Guo
Renzhe Xu
Yilong Liu
Chuzhao Zhu
Yue He
Peng Cui
VLM
293
1
0
14 Apr 2025
ColorBench: Can VLMs See and Understand the Colorful World? A Comprehensive Benchmark for Color Perception, Reasoning, and Robustness
Yijun Liang
Ming Li
Chenrui Fan
Ziyue Li
Dang Nguyen
Kwesi Cobbina
Shweta Bhardwaj
Jiuhai Chen
Fuxiao Liu
Tianyi Zhou
VLM
CoGe
374
12
0
10 Apr 2025
Towards Visual Text Grounding of Multimodal Large Language Model
Ming Li
Ruiyi Zhang
Jian Chen
Jiuxiang Gu
Franck Dernoncourt
Wanrong Zhu
Tianyi Zhou
Tong Sun
435
12
0
07 Apr 2025
URECA: Unique Region Caption Anything
Sangbeom Lim
J. Kim
Heeji Yoon
Jaewoo Jung
Seungryong Kim
284
1
0
07 Apr 2025
AdPO: Enhancing the Adversarial Robustness of Large Vision-Language Models with Preference Optimization
Chaohu Liu
Tianyi Gui
Yu Liu
Linli Xu
VLM
AAML
339
3
0
02 Apr 2025
ORAL: Prompting Your Large-Scale LoRAs via Conditional Recurrent Diffusion
Rana Muhammad Shahroz Khan
Dongwen Tang
Pingzhi Li
Xiaojiang Peng
Tianlong Chen
AI4CE
1.1K
1
0
31 Mar 2025
Unicorn: Text-Only Data Synthesis for Vision Language Model Training
Xiaomin Yu
Pengxiang Ding
Donglin Wang
Siteng Huang
Songyang Gao
Chengwei Qin
Kejian Wu
Zhaoxin Fan
Ziyue Qiao
Donglin Wang
MLLM
SyDa
240
2
0
28 Mar 2025
MAVERIX: Multimodal Audio-Visual Evaluation and Recognition IndeX
Liuyue Xie
George Z. Wei
Avik Kuthiala
Ce Zheng
Ananya Bal
...
Rohan Choudhury
Morteza Ziyadi
Xu Zhang
Hao Yang
László A. Jeni
313
1
0
27 Mar 2025
Faster Parameter-Efficient Tuning with Token Redundancy Reduction
Computer Vision and Pattern Recognition (CVPR), 2025
Kwonyoung Kim
Jungin Park
Jin-Hwa Kim
Hyeongjun Kwon
Kwanghoon Sohn
470
4
0
26 Mar 2025
Unified Multimodal Discrete Diffusion
Alexander Swerdlow
Mihir Prabhudesai
Siddharth Gandhi
Deepak Pathak
Katerina Fragkiadaki
DiffM
331
23
0
26 Mar 2025
VisualQuest: A Benchmark for Abstract Visual Reasoning in MLLMs
Kelaiti Xiao
Liang Yang
Paerhati Tulajiang
Hongfei Lin
MLLM
367
0
0
25 Mar 2025
Aligning Information Capacity Between Vision and Language via Dense-to-Sparse Feature Distillation for Image-Text Matching
Yang Liu
Wentao Feng
Zhuoyao Liu
Shudong Huang
Jiancheng Lv
DiffM
VLM
389
1
0
19 Mar 2025
TULIP: Towards Unified Language-Image Pretraining
Zineng Tang
Long Lian
Seun Eisape
Xudong Wang
Roei Herzig
Adam Yala
Alane Suhr
Trevor Darrell
David M. Chan
VLM
CLIP
MLLM
435
9
0
19 Mar 2025
Text-Guided Image Invariant Feature Learning for Robust Image Watermarking
Muhammad Ahtesham
Xin Zhong
236
1
0
18 Mar 2025
Survey of Adversarial Robustness in Multimodal Large Language Models
Chengze Jiang
Zhuangzhuang Wang
Minjing Dong
Jie Gui
AAML
331
9
0
18 Mar 2025
Federated Continual Instruction Tuning
Haiyang Guo
Fanhu Zeng
Fei Zhu
Wenzhuo Liu
Da-Han Wang
Jian Xu
Xu-Yao Zhang
Cheng-Lin Liu
CLL
FedML
519
6
0
17 Mar 2025
Lifting the Veil on Visual Information Flow in MLLMs: Unlocking Pathways to Faster Inference
Computer Vision and Pattern Recognition (CVPR), 2025
Hao Yin
Guangzong Si
Zilei Wang
225
6
0
17 Mar 2025
Aligning Vision to Language: Annotation-Free Multimodal Knowledge Graph Construction for Enhanced LLMs Reasoning
Junming Liu
Siyuan Meng
Yanting Gao
Song Mao
Pinlong Cai
Guohang Yan
Yirong Chen
Zilin Bian
Ding Wang
Botian Shi
365
12
0
17 Mar 2025
Grounded Chain-of-Thought for Multimodal Large Language Models
Qiong Wu
Xiangcong Yang
Weihao Ye
Chenxin Fang
Baiyang Song
Xiaoshuai Sun
Rongrong Ji
LRM
455
23
0
17 Mar 2025
HiDe-LLaVA: Hierarchical Decoupling for Continual Instruction Tuning of Multimodal Large Language Model
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Haiyang Guo
Fanhu Zeng
Ziwei Xiang
Fei Zhu
Da-Han Wang
Xu-Yao Zhang
Cheng-Lin Liu
386
10
0
17 Mar 2025
Web Artifact Attacks Disrupt Vision Language Models
Maan Qraitem
Piotr Teterwak
Kate Saenko
Bryan A. Plummer
AAML
292
1
0
17 Mar 2025
Seeing and Reasoning with Confidence: Supercharging Multimodal LLMs with an Uncertainty-Aware Agentic Framework
Zhuo Zhi
Chen Feng
Adam Daneshmend
Mine Orlu
Andreas Demosthenous
L. Yin
Da Li
Ziquan Liu
Miguel R. D. Rodrigues
LRM
263
8
0
11 Mar 2025
Referring to Any Person
Qing Jiang
Lin Wu
Zhaoyang Zeng
Tianhe Ren
Yuda Xiong
Yihao Chen
Qin Liu
Lei Zhang
932
12
0
11 Mar 2025
Multi-Cue Adaptive Visual Token Pruning for Large Vision-Language Models
Bozhi Luan
Wengang Zhou
Hao Feng
Zhe Wang
Xiaosong Li
Haoyang Li
VLM
314
1
0
11 Mar 2025
Asymmetric Visual Semantic Embedding Framework for Efficient Vision-Language Alignment
AAAI Conference on Artificial Intelligence (AAAI), 2025
Yang Liu
M. Liu
Shudong Huang
Jiancheng Lv
251
6
0
10 Mar 2025
VisRL: Intention-Driven Visual Perception via Reinforced Reasoning
Zhangquan Chen
Xufang Luo
Dongsheng Li
OffRL
LRM
442
23
0
10 Mar 2025
REF-VLM: Triplet-Based Referring Paradigm for Unified Visual Decoding
Yan Tai
Luhao Zhu
Zhiqiang Chen
Ynan Ding
Yiying Dong
Xiaohong Liu
Guodong Guo
MLLM
ObjD
207
0
0
10 Mar 2025
YOLOE: Real-Time Seeing Anything
Ao Wang
Lihao Liu
Hui Chen
Zijia Lin
Jiawei Han
Guiguang Ding
VLM
ObjD
542
33
0
10 Mar 2025
Evaluation of Safety Cognition Capability in Vision-Language Models for Autonomous Driving
Enming Zhang
Peizhe Gong
Xingyuan Dai
Yisheng Lv
Qinghai Miao
MLLM
ELM
329
4
0
09 Mar 2025
Semi-Supervised Audio-Visual Video Action Recognition with Audio Source Localization Guided Mixup
Seokun Kang
Taehwan Kim
271
0
0
04 Mar 2025
DivPrune: Diversity-based Visual Token Pruning for Large Multimodal Models
Computer Vision and Pattern Recognition (CVPR), 2025
Saeed Ranjbar Alvar
Gursimran Singh
Mohammad Akbari
Yong Zhang
VLM
546
42
0
04 Mar 2025
Are Large Vision Language Models Good Game Players?
International Conference on Learning Representations (ICLR), 2025
Xinyu Wang
Bohan Zhuang
Qi Wu
MLLM
ELM
LRM
245
13
0
04 Mar 2025
Qilin: A Multimodal Information Retrieval Dataset with APP-level User Sessions
Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2025
Jia Chen
Qian Dong
Haitao Li
Xiaohui He
Yan Gao
...
Ping Yang
Chen Xu
Yao Hu
Jiaxin Mao
Yixiao Liu
200
4
0
01 Mar 2025
ABC: Achieving Better Control of Multimodal Embeddings using VLMs
Benjamin Schneider
Florian Kerschbaum
Wenhu Chen
962
0
0
01 Mar 2025
RTGen: Real-Time Generative Detection Transformer
Chi Ruan
Jiying Zhao
Wenhu Chen
ObjD
VLM
415
0
0
28 Feb 2025
Can Large Language Models Unveil the Mysteries? An Exploration of Their Ability to Unlock Information in Complex Scenarios
Chao Wang
Luning Zhang
Ziyi Wang
Yang Zhou
ELM
VLM
LRM
412
2
0
27 Feb 2025