1908.02265
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
Neural Information Processing Systems (NeurIPS), 2019
6 August 2019
Jiasen Lu
Dhruv Batra
Devi Parikh
Stefan Lee
SSL
VLM
Papers citing
"ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks"
50 / 2,232 papers shown
Title
FocalLens: Instruction Tuning Enables Zero-Shot Conditional Image Representations
Cheng-Yu Hsieh
Pavan Kumar Anasosalu Vasu
Fartash Faghri
Raviteja Vemulapalli
Chun-Liang Li
Ranjay Krishna
Oncel Tuzel
Hadi Pouransari
VLM
941
0
0
11 Apr 2025
TokenFocus-VQA: Enhancing Text-to-Image Alignment with Position-Aware Focus and Multi-Perspective Aggregations on LVLMs
Zijian Zhang
Xuhui Zheng
X. Wu
Chong Peng
Xuezhi Cao
186
3
0
10 Apr 2025
Zeus: Zero-shot LLM Instruction for Union Segmentation in Multimodal Medical Imaging
International Journal of Machine Learning and Cybernetics (IJMLC), 2025
Siyuan Dai
Kai Ye
Guodong Liu
Haoteng Tang
Chen Tang
MedIm
195
4
0
09 Apr 2025
Locations of Characters in Narratives: Andersen and Persuasion Datasets
Batuhan Ozyurt
Roya Arkhmammadova
Deniz Yuret
150
4
0
04 Apr 2025
Neutralizing the Narrative: AI-Powered Debiasing of Online News Articles
Chen Wei Kuo
Kevin Chu
Nouar Aldahoul
Hazem Ibrahim
Talal Rahwan
Yasir Zaki
SyDa
377
0
0
04 Apr 2025
Multimodal Fusion and Vision-Language Models: A Survey for Robot Vision
Information Fusion (Inf. Fusion), 2025
Xiaofeng Han
Shunpeng Chen
Zenghuang Fu
Zhe Feng
Lue Fan
...
Li Guo
Weiliang Meng
Xiaopeng Zhang
Rongtao Xu
Shibiao Xu
395
37
0
03 Apr 2025
Group-based Distinctive Image Captioning with Memory Difference Encoding and Attention
International Journal of Computer Vision (IJCV), 2024
Jiuniu Wang
Wenjia Xu
Qingzhong Wang
Antoni B. Chan
351
1
0
03 Apr 2025
ANNEXE: Unified Analyzing, Answering, and Pixel Grounding for Egocentric Interaction
Computer Vision and Pattern Recognition (CVPR), 2025
Yuejiao Su
Yi Wang
Qiongyang Hu
Chuang Yang
Lap-Pui Chau
224
4
0
02 Apr 2025
RefChartQA: Grounding Visual Answer on Chart Images through Instruction Tuning
IEEE International Conference on Document Analysis and Recognition (ICDAR), 2025
Alexander Vogel
Omar Moured
Yufan Chen
Kailai Li
Rainer Stiefelhagen
353
4
0
29 Mar 2025
CTRL-O: Language-Controllable Object-Centric Visual Representation Learning
Computer Vision and Pattern Recognition (CVPR), 2025
Aniket Didolkar
Antonios Tragoudaras
Rabiul Awal
Maximilian Seitzer
E. Gavves
Aishwarya Agrawal
OCL
VLM
405
4
0
27 Mar 2025
VGAT: A Cancer Survival Analysis Framework Transitioning from Generative Visual Question Answering to Genomic Reconstruction
Zizhi Chen
Minghao Han
Xukun Zhang
Shuwei Ma
Tao Liu
Xing Wei
Li Zhang
418
0
0
25 Mar 2025
VisualQuest: A Benchmark for Abstract Visual Reasoning in MLLMs
Kelaiti Xiao
Liang Yang
Paerhati Tulajiang
Hongfei Lin
MLLM
324
0
0
25 Mar 2025
Unseen from Seen: Rewriting Observation-Instruction Using Foundation Models for Augmenting Vision-Language Navigation
Ziming Wei
Bingqian Lin
Yunshuang Nie
Jiaqi Chen
Shikui Ma
Hang Xu
Xiaodan Liang
460
3
0
23 Mar 2025
A Language Anchor-Guided Method for Robust Noisy Domain Generalization
Zilin Dai
Lehong Wang
Fangzhou Lin
Yidong Wang
Zhigang Li
Kazunori D Yamada
Ziming Zhang
Wang Lu
839
2
0
21 Mar 2025
A Survey on fMRI-based Brain Decoding for Reconstructing Multimodal Stimuli
Pengyu Liu
Guohua Dong
D. Guo
Kun Li
Fengling Li
Xun Yang
Meng Wang
Xiaomin Ying
AI4CE
234
5
0
20 Mar 2025
Image Captioning Evaluation in the Age of Multimodal LLMs: Challenges and Future Perspectives
International Joint Conference on Artificial Intelligence (IJCAI), 2024
Sara Sarto
Marcella Cornia
Rita Cucchiara
351
6
0
18 Mar 2025
HA-VLN 2.0: An Open Benchmark and Leaderboard for Human-Aware Navigation in Discrete and Continuous Environments with Dynamic Multi-Human Interactions
Yifei Dong
Fengyi Wu
Qi He
Heng Li
...
Yuxuan Zhou
Jingdong Sun
Zhi-Qi Cheng
Alexander G. Hauptmann
LM&Ro
290
1
0
18 Mar 2025
DPC: Dual-Prompt Collaboration for Tuning Vision-Language Models
Computer Vision and Pattern Recognition (CVPR), 2025
Haoyang Li
Liang Wang
Chunbai Zhang
Jing Jiang
Yan Peng
Guodong Long
VLM
325
8
0
17 Mar 2025
Quantum EigenGame for excited state calculation
David Quiroga
Jason Han
Anastasios Kyrillidis
276
4
0
17 Mar 2025
Learning Privacy from Visual Entities
Proceedings on Privacy Enhancing Technologies (PoPETs), 2025
Alessio Xompero
Andrea Cavallaro
SSL
GNN
256
2
0
16 Mar 2025
DynRsl-VLM: Enhancing Autonomous Driving Perception with Dynamic Resolution Vision-Language Models
Xirui Zhou
Lianlei Shan
Xiaolin Gui
200
16
0
14 Mar 2025
FlowTok: Flowing Seamlessly Across Text and Image Tokens
Ju He
Qihang Yu
Qihao Liu
Liang-Chieh Chen
496
11
0
13 Mar 2025
Can LLMs Understand Time Series Anomalies?
International Conference on Learning Representations (ICLR), 2024
Zihao Zhou
Rose Yu
AI4TS
373
30
0
13 Mar 2025
Towards Understanding Graphical Perception in Large Multimodal Models
Kai Zhang
Jianwei Yang
J. Inala
Chandan Singh
Jianfeng Gao
Eric Fosler-Lussier
Chenglong Wang
295
2
0
13 Mar 2025
Seeing and Reasoning with Confidence: Supercharging Multimodal LLMs with an Uncertainty-Aware Agentic Framework
Zhuo Zhi
Chen Feng
Adam Daneshmend
Mine Orlu
Andreas Demosthenous
L. Yin
Da Li
Ziquan Liu
Miguel R. D. Rodrigues
LRM
248
8
0
11 Mar 2025
Federated Multimodal Learning with Dual Adapters and Selective Pruning for Communication and Computational Efficiency
IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGrid), 2025
Duy Phuong Nguyen
J. P. Muñoz
Tanya Roosta
Ali Jannesari
FedML
260
2
0
10 Mar 2025
Anatomy-Aware Conditional Image-Text Retrieval
Meng Zheng
Jiajin Zhang
Benjamin Planche
Zhongpai Gao
Terrence Chen
Ziyan Wu
MedIm
243
0
0
10 Mar 2025
TI-JEPA: An Innovative Energy-based Joint Embedding Strategy for Text-Image Multimodal Systems
Khang H. N. Vo
D. Q. Nguyen
T. Nguyen
Tho Quan
246
5
0
09 Mar 2025
Narrating the Video: Boosting Text-Video Retrieval via Comprehensive Utilization of Frame-Level Captions
Computer Vision and Pattern Recognition (CVPR), 2025
Chan Hur
Jeong-hun Hong
Dong-hun Lee
Dabin Kang
Semin Myeong
Sang-hyo Park
Hyeyoung Park
591
5
0
07 Mar 2025
RCRank: Multimodal Ranking of Root Causes of Slow Queries in Cloud Database Systems
Proceedings of the VLDB Endowment (PVLDB), 2024
Biao Ouyang
Yingying Zhang
Hanyin Cheng
Yang Shu
Chenjuan Guo
Bin Yang
Qingsong Wen
L. Fan
Christian S. Jensen
193
6
0
06 Mar 2025
Enhancing Collective Intelligence in Large Language Models Through Emotional Integration
Likith Kadiyala
Ramteja Sajja
Y. Sermet
Ibrahim Demir
843
3
0
05 Mar 2025
Composed Multi-modal Retrieval: A Survey of Approaches and Applications
Kun Zhang
Jingyu Li
Zhiyu Li
Jingjing Zhang
F. Li
...
Nan Chen
Lei Zhang
Yongdong Zhang
Zhendong Mao
S. Kevin Zhou
389
1
0
03 Mar 2025
Perceptual Visual Quality Assessment: Principles, Methods, and Future Directions
Wei Zhou
Hadi Amirpour
Christian Timmerer
Guoquan Zheng
P. Callet
Alan C. Bovik
245
6
0
01 Mar 2025
Solar Multimodal Transformer: Intraday Solar Irradiance Predictor using Public Cameras and Time Series
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2025
Yanan Niu
Roy Sarkis
D. Psaltis
Mario Paolone
Christophe Moser
Luisa Lambertini
239
3
0
28 Feb 2025
RTGen: Real-Time Generative Detection Transformer
Chi Ruan
Jiying Zhao
Wenhu Chen
ObjD
VLM
372
0
0
28 Feb 2025
Multimodal Learning for Just-In-Time Software Defect Prediction in Autonomous Driving Systems
International Conference on Big Data and Smart Computing (BigComp), 2025
Faisal Mohammad
Duksan Ryu
216
0
0
28 Feb 2025
Deciphering the complaint aspects: Towards an aspect-based complaint identification model with video complaint dataset in finance
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2025
Sarmistha Das
Basha Mujavarsheik
R E Zera Lyngkhoi
Sriparna Saha
Alka Maurya
133
0
0
26 Feb 2025
FilterRAG: Zero-Shot Informed Retrieval-Augmented Generation to Mitigate Hallucinations in VQA
S M Sarwar
443
2
0
25 Feb 2025
Vision Language Models in Medicine
Beria Chingnabe Kalpelbe
Angel Gabriel Adaambiik
Wei Peng
VLM
LM&MA
367
5
0
24 Feb 2025
Are Large Language Models Good Data Preprocessors?
The Web Conference (WWW), 2025
Elyas Meguellati
Nardiena A. Pratama
S. Sadiq
Gianluca Demartini
269
2
0
24 Feb 2025
Beyond Pattern Recognition: Probing Mental Representations of LMs
Moritz Miller
Kumar Shridhar
ReLM
LRM
244
0
0
23 Feb 2025
Modular Prompt Learning Improves Vision-Language Models
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Zhenhan Huang
Tejaswini Pedapati
Pin-Yu Chen
Jianxi Gao
VLM
113
2
0
21 Feb 2025
Enhancing Adversarial Robustness of Vision-Language Models through Low-Rank Adaptation
International Conference on Multimedia Retrieval (ICMR), 2024
Yuheng Ji
Yue Liu
Zhicheng Zhang
Zhao Zhang
Yuting Zhao
Gang Zhou
Xingwei Zhang
Xinwang Liu
Xiaolong Zheng
VLM
390
4
0
21 Feb 2025
Demystifying Hateful Content: Leveraging Large Multimodal Models for Hateful Meme Detection with Explainable Decisions
International Conference on Web and Social Media (ICWSM), 2025
Ming Shan Hee
Roy Ka-wei Lee
VLM
238
10
0
16 Feb 2025
Handwritten Text Recognition: A Survey
Carlos Garrido-Munoz
Antonio Ríos-Vila
Jorge Calvo-Zaragoza
295
6
0
12 Feb 2025
Vision-Language Models for Edge Networks: A Comprehensive Survey
IEEE Internet of Things Journal (IEEE IoT J.), 2025
Ahmed Sharshar
Latif U. Khan
Waseem Ullah
Mohsen Guizani
VLM
361
20
0
11 Feb 2025
Foundation Models for Anomaly Detection: Vision and Challenges
Jing Ren
Tao Tang
Hong Jia
Haytham Fayek
Xiaodong Li
Suyu Ma
Xiwei Xu
Feng Xia
453
2
0
10 Feb 2025
A Multimodal PDE Foundation Model for Prediction and Scientific Text Descriptions
Elisa Negrini
Yuxuan Liu
Liu Yang
Stanley Osher
Hayden Schaeffer
AI4CE
315
2
0
09 Feb 2025
Performance Analysis of Traditional VQA Models Under Limited Computational Resources
Jihao Gu
278
1
0
09 Feb 2025
Multi-Branch Collaborative Learning Network for Video Quality Assessment in Industrial Video Search
Knowledge Discovery and Data Mining (KDD), 2025
Hengzhu Tang
Zefeng Zhang
Zhiping Li
Zhenyu Zhang
Xing Wu
Li Gao
Suqi Cheng
D. Yin
276
2
0
09 Feb 2025