1908.03557
Cited By
VisualBERT: A Simple and Performant Baseline for Vision and Language
9 August 2019
Liunian Harold Li
Mark Yatskar
Da Yin
Cho-Jui Hsieh
Kai-Wei Chang
VLM
Papers citing
"VisualBERT: A Simple and Performant Baseline for Vision and Language"
50 / 1,256 papers shown
Privacy-Shielded Image Compression: Defending Against Exploitation from Vision-Language Pretrained Models
Xuelin Shen
Jiayin Xu
Kangsheng Yin
Wenhan Yang
AAML
193
0
0
18 Jun 2025
Segmenting Visuals With Querying Words: Language Anchors For Semi-Supervised Image Segmentation
Numair Nadeem
Saeed Anwar
Muhammad Asad
Abdul Bais
VLM
188
0
0
16 Jun 2025
Manager: Aggregating Insights from Unimodal Experts in Two-Tower VLMs and MLLMs
Xiao Xu
L. Qin
Wanxiang Che
Min-Yen Kan
MoE
VLM
240
0
0
13 Jun 2025
RollingQ: Reviving the Cooperation Dynamics in Multimodal Transformer
Haotian Ni
Yake Wei
Hang Liu
Gong Chen
Chong Peng
Hao Lin
Di Hu
OffRL
238
1
0
13 Jun 2025
Vision Generalist Model: A Survey
International Journal of Computer Vision (IJCV), 2025
Ziyi Wang
Yongming Rao
Shuofeng Sun
Xinrun Liu
Yi Wei
...
Zuyan Liu
Yanbo Wang
Hongmin Liu
Jie Zhou
Jiwen Lu
257
0
0
11 Jun 2025
Multimodal Representation Alignment for Cross-modal Information Retrieval
Fan Xu
Luis A. Leiva
171
1
0
10 Jun 2025
OpenFace 3.0: A Lightweight Multitask System for Comprehensive Facial Behavior Analysis
IEEE International Conference on Automatic Face & Gesture Recognition (FG), 2025
Jiewen Hu
Leena Mathur
Paul Pu Liang
Louis-Philippe Morency
CVBM
149
1
0
03 Jun 2025
MINT: Multimodal Instruction Tuning with Multimodal Interaction Grouping
Xiaojun Shan
Qi Cao
Xing Han
Haofei Yu
Paul Liang
228
1
0
02 Jun 2025
What's Missing in Vision-Language Models? Probing Their Struggles with Causal Order Reasoning
Zhaotian Weng
Haoxuan Li
Kuan-Hao Huang
Jieyu Zhao
LRM
CoGe
142
0
0
01 Jun 2025
FinMME: Benchmark Dataset for Financial Multi-Modal Reasoning Evaluation
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Junyu Luo
Zhizhuo Kou
Liming Yang
Xiao Luo
Jinsheng Huang
...
Jiaming Ji
Xuanzhe Liu
Sirui Han
Ming Zhang
Wenhan Luo
130
13
0
30 May 2025
Multi-MLLM Knowledge Distillation for Out-of-Context News Detection
Yimeng Gu
Zhao Tong
Ignacio Castro
Shu Wu
Gareth Tyson
120
1
0
28 May 2025
LifeIR at the NTCIR-18 Lifelog-6 Task
NTCIR Conference on Evaluation of Information Access Technologies (NTCIR), 2025
Jiahan Chen
Da Li
Keping Bi
135
1
0
27 May 2025
Embodied AI with Foundation Models for Mobile Service Robots: A Systematic Review
Matthew Lisondra
B. Benhabib
G. Nejat
LM&Ro
187
1
0
26 May 2025
Multi-modal brain encoding models for multi-modal stimuli
International Conference on Learning Representations (ICLR), 2025
R. Mamidi
Khushbu Pahwa
Mounika Marreddy
Maneesh Singh
Subba Reddy Oota
Bapi S. Raju
124
8
0
26 May 2025
Co-AttenDWG: Co-Attentive Dimension-Wise Gating and Expert Fusion for Multi-Modal Offensive Content Detection
IEEE Transactions on Artificial Intelligence (IEEE TAI), 2025
Md. Mithun Hossain
Md. Shakil Hossain
Sudipto Chaki
M. F. Mridha
392
0
0
25 May 2025
Visual Question Answering on Multiple Remote Sensing Image Modalities
Hichem Boussaid
Lucrezia Tosato
F. Weissgerber
Camille Kurtz
Laurent Wendling
Sylvain Lobry
145
3
0
21 May 2025
Domain Adaptation of VLM for Soccer Video Understanding
Tiancheng Jiang
Henry Wang
Md Sirajus Salekin
Parmida Atighehchian
Shinan Zhang
VLM
334
3
0
20 May 2025
TS-VLM: Text-Guided SoftSort Pooling for Vision-Language Models in Multi-View Driving Reasoning
Lihong Chen
Hossein Hassani
Soodeh Nikan
VLM
268
3
0
19 May 2025
Hyperspectral Image Land Cover Captioning Dataset for Vision Language Models
Aryan Das
Tanishq Rachamalla
Pravendra Singh
Koushik Biswas
Vinay Kumar Verma
Swalpa Kumar Roy
VLM
221
0
0
18 May 2025
Multi-modal contrastive learning adapts to intrinsic dimensions of shared latent variables
Yu Gui
Cong Ma
Zongming Ma
SSL
260
1
0
18 May 2025
GeoMM: On Geodesic Perspective for Multi-modal Learning
Computer Vision and Pattern Recognition (CVPR), 2025
Shibin Mei
Hang Wang
Bingbing Ni
269
0
0
16 May 2025
Exploring Implicit Visual Misunderstandings in Multimodal Large Language Models through Attention Analysis
Pengfei Wang
Guohai Xu
Weinong Wang
Junjie Yang
Jie Lou
Yunhua Xue
294
1
0
15 May 2025
Prioritizing Image-Related Tokens Enhances Vision-Language Pre-Training
Yiran Chen
Yuan Yao
Tong Zhang
Heng Ji
VLM
275
1
0
13 May 2025
Probabilistic Embeddings for Frozen Vision-Language Models: Uncertainty Quantification with Gaussian Process Latent Variable Models
Conference on Uncertainty in Artificial Intelligence (UAI), 2025
Aishwarya Venkataramanan
P. Bodesheim
Joachim Denzler
BDL
VLM
359
2
0
08 May 2025
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
Wei Wei
Jintao Guo
Shanshan Zhao
Minghao Fu
Lunhao Duan
...
Guo-Hua Wang
Qing-Guo Chen
Zhao Xu
Weihua Luo
Kaifu Zhang
DiffM
999
26
0
05 May 2025
Detecting and Mitigating Hateful Content in Multimodal Memes with Vision-Language Models
Minh-Hao Van
Xintao Wu
VLM
311
0
0
30 Apr 2025
Multimodal Large Language Models for Medicine: A Comprehensive Survey
Jiarui Ye
Hao Tang
LM&MA
425
11
0
29 Apr 2025
A Survey of Task-Oriented Knowledge Graph Reasoning: Status, Applications, and Prospects
Guanglin Niu
Bo Li
Yangguang Lin
LRM
220
1
0
27 Apr 2025
Multimodal graph representation learning for website generation based on visual sketch
Tung D. Vu
Chung Hoang
Truong-Son Hy
3DV
245
0
0
25 Apr 2025
ShapeSpeak: Body Shape-Aware Textual Alignment for Visible-Infrared Person Re-Identification
Shuanglin Yan
Neng Dong
Shuang Li
Rui Yan
Hao Tang
Jing Qin
897
0
0
25 Apr 2025
A Genealogy of Foundation Models in Remote Sensing
Kevin Lane
Morteza Karimzadeh
270
1
0
24 Apr 2025
Detecting and Understanding Hateful Contents in Memes Through Captioning and Visual Question-Answering
International Conference on Conceptual Structures (ICCS), 2025
Ali Anaissi
Junaid Akram
Kunal Chaturvedi
Ali Braytee
209
1
0
23 Apr 2025
FrogDogNet: Fourier frequency Retained visual prompt Output Guidance for Domain Generalization of CLIP in Remote Sensing
Hariseetharam Gunduboina
Muhammad Haris Khan
Biplab Banerjee
VLM
235
0
0
23 Apr 2025
OmniV-Med: Scaling Medical Vision-Language Model for Universal Visual Understanding
Songtao Jiang
Yuan Wang
Sibo Song
Yanzhe Zhang
Zijie Meng
Bohan Lei
Jian Wu
Jimeng Sun
Zuozhu Liu
MedIm
VLM
199
10
0
20 Apr 2025
TSAL: Few-shot Text Segmentation Based on Attribute Learning
Chenming Li
Chengxu Liu
Yuanting Fan
Xiao Jin
Xingsong Hou
Xueming Qian
VLM
259
0
0
15 Apr 2025
HAVT-IVD: Heterogeneity-Aware Cross-Modal Network for Audio-Visual Surveillance: Idling Vehicles Detection With Multichannel Audio and Multiscale Visual Cues
Xiwen Li
Ross T. Whitaker
Tolga Tasdizen
190
0
0
15 Apr 2025
DiffusionCom: Structure-Aware Multimodal Diffusion Model for Multimodal Knowledge Graph Completion
Wei Huang
M. Liang
Peining Li
Xu Hou
Yawen Li
Junping Du
Zhe Xue
Zeli Guan
DiffM
203
0
0
09 Apr 2025
Zeus: Zero-shot LLM Instruction for Union Segmentation in Multimodal Medical Imaging
International Journal of Machine Learning and Cybernetics (IJMLC), 2025
Siyuan Dai
Kai Ye
Guodong Liu
Haoteng Tang
Chen Tang
MedIm
171
3
0
09 Apr 2025
A Lightweight Large Vision-language Model for Multimodal Medical Images
Belal Alsinglawi
Chris McCarthy
Sara Webb
Christopher Fluke
Navid Toosy Saidy
LM&MA
217
0
0
08 Apr 2025
SynWorld: Virtual Scenario Synthesis for Agentic Action Knowledge Refinement
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Runnan Fang
Xiaobin Wang
Yuan Liang
Shuofei Qiao
Jialong Wu
...
Ningyu Zhang
Yong Jiang
Pengjun Xie
Fei Huang
Zeyang Zhang
LLMAG
403
6
0
04 Apr 2025
ANNEXE: Unified Analyzing, Answering, and Pixel Grounding for Egocentric Interaction
Computer Vision and Pattern Recognition (CVPR), 2025
Yuejiao Su
Yi Wang
Qiongyang Hu
Chuang Yang
Lap-Pui Chau
192
4
0
02 Apr 2025
FedMM-X: A Trustworthy and Interpretable Framework for Federated Multi-Modal Learning in Dynamic Environments
Sree Bhargavi Balija
FedML
162
4
0
25 Mar 2025
Exploring Hallucination of Large Multimodal Models in Video Understanding: Benchmark, Analysis and Mitigation
Hongcheng Gao
Jiashu Qu
Jingyi Tang
Baolong Bi
Yi Liu
Hongyu Chen
Li Liang
Li Su
Qingming Huang
MLLM
VLM
LRM
351
13
0
25 Mar 2025
Unseen from Seen: Rewriting Observation-Instruction Using Foundation Models for Augmenting Vision-Language Navigation
Ziming Wei
Bingqian Lin
Yunshuang Nie
Jiaqi Chen
Shikui Ma
Hang Xu
Xiaodan Liang
420
3
0
23 Mar 2025
Seeing What Matters: Empowering CLIP with Patch Generation-to-Selection
Computer Vision and Pattern Recognition (CVPR), 2025
Gensheng Pei
Tao Chen
Yujia Wang
Xinhao Cai
Xiangbo Shu
Tianfei Zhou
Yazhou Yao
VLM
237
4
0
21 Mar 2025
A Survey on fMRI-based Brain Decoding for Reconstructing Multimodal Stimuli
Pengyu Liu
Guohua Dong
D. Guo
Kun Li
Fengling Li
Xun Yang
Meng Wang
Xiaomin Ying
AI4CE
222
5
0
20 Mar 2025
FusDreamer: Label-efficient Remote Sensing World Model for Multimodal Data Classification
IEEE Transactions on Geoscience and Remote Sensing (IEEE TGRS), 2025
Jiadong Wang
Weiwei Song
Hao Chen
Jie Ren
Huimin Zhao
292
3
0
18 Mar 2025
FlowTok: Flowing Seamlessly Across Text and Image Tokens
Ju He
Qihang Yu
Qihao Liu
Liang-Chieh Chen
366
8
0
13 Mar 2025
Anatomy-Aware Conditional Image-Text Retrieval
Meng Zheng
Jiajin Zhang
Benjamin Planche
Zhongpai Gao
Terrence Chen
Ziyan Wu
MedIm
199
0
0
10 Mar 2025
Exploring Multimodal Perception in Large Language Models Through Perceptual Strength Ratings
IEEE Access (IEEE Access), 2025
Jonghyun Lee
Dojun Park
Jiwoo Lee
Hoekeon Choi
Sung-Eun Lee
236
4
0
10 Mar 2025