ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks

Neural Information Processing Systems (NeurIPS), 2019

6 August 2019

Devi Parikh

Papers citing "ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks"

50 / 2,232 papers shown

FocalLens: Instruction Tuning Enables Zero-Shot Conditional Image Representations

Cheng-Yu Hsieh

Pavan Kumar Anasosalu Vasu

962

11 Apr 2025

TokenFocus-VQA: Enhancing Text-to-Image Alignment with Position-Aware Focus and Multi-Perspective Aggregations on LVLMs

206

10 Apr 2025

Zeus: Zero-shot LLM Instruction for Union Segmentation in Multimodal Medical Imaging

International Journal of Machine Learning and Cybernetics (IJMLC), 2025

223

09 Apr 2025

Locations of Characters in Narratives: Andersen and Persuasion Datasets

Batuhan Ozyurt

Roya Arkhmammadova

Deniz Yuret

163

04 Apr 2025

Neutralizing the Narrative: AI-Powered Debiasing of Online News Articles

384

04 Apr 2025

Multimodal Fusion and Vision-Language Models: A Survey for Robot Vision

Information Fusion (Inf. Fusion), 2025

...

445

03 Apr 2025

Group-based Distinctive Image Captioning with Memory Difference Encoding and Attention

International Journal of Computer Vision (IJCV), 2024

370

03 Apr 2025

ANNEXE: Unified Analyzing, Answering, and Pixel Grounding for Egocentric Interaction

Computer Vision and Pattern Recognition (CVPR), 2025

268

02 Apr 2025

RefChartQA: Grounding Visual Answer on Chart Images through Instruction Tuning

IEEE International Conference on Document Analysis and Recognition (ICDAR), 2025

376

29 Mar 2025

CTRL-O: Language-Controllable Object-Centric Visual Representation Learning

Computer Vision and Pattern Recognition (CVPR), 2025

427

27 Mar 2025

VGAT: A Cancer Survival Analysis Framework Transitioning from Generative Visual Question Answering to Genomic Reconstruction

427

25 Mar 2025

VisualQuest: A Benchmark for Abstract Visual Reasoning in MLLMs

383

25 Mar 2025

Unseen from Seen: Rewriting Observation-Instruction Using Foundation Models for Augmenting Vision-Language Navigation

489

23 Mar 2025

A Language Anchor-Guided Method for Robust Noisy Domain Generalization

855

21 Mar 2025

A Survey on fMRI-based Brain Decoding for Reconstructing Multimodal Stimuli

268

20 Mar 2025

Image Captioning Evaluation in the Age of Multimodal LLMs: Challenges and Future Perspectives

International Joint Conference on Artificial Intelligence (IJCAI), 2024

Sara Sarto

Marcella Cornia

Rita Cucchiara

368

18 Mar 2025

HA-VLN 2.0: An Open Benchmark and Leaderboard for Human-Aware Navigation in Discrete and Continuous Environments with Dynamic Multi-Human Interactions

...

Alexander G. Hauptmann

LM&Ro

299

18 Mar 2025

DPC: Dual-Prompt Collaboration for Tuning Vision-Language Models

Computer Vision and Pattern Recognition (CVPR), 2025

360

17 Mar 2025

Quantum EigenGame for excited state calculation

David Quiroga

Jason Han

Anastasios Kyrillidis

280

17 Mar 2025

Learning Privacy from Visual Entities

Proceedings on Privacy Enhancing Technologies (PoPETs), 2025

Alessio Xompero

Andrea Cavallaro

SSL GNN

264

16 Mar 2025

DynRsl-VLM: Enhancing Autonomous Driving Perception with Dynamic Resolution Vision-Language Models

Xirui Zhou

Lianlei Shan

Xiaolin Gui

220

14 Mar 2025

FlowTok: Flowing Seamlessly Across Text and Image Tokens

530

13 Mar 2025

Can LLMs Understand Time Series Anomalies?

International Conference on Learning Representations (ICLR), 2024

Zihao Zhou

Rose Yu

AI4TS

403

13 Mar 2025

Towards Understanding Graphical Perception in Large Multimodal Models

316

13 Mar 2025

Seeing and Reasoning with Confidence: Supercharging Multimodal LLMs with an Uncertainty-Aware Agentic Framework

Miguel R. D. Rodrigues

LRM

266

11 Mar 2025

Federated Multimodal Learning with Dual Adapters and Selective Pruning for Communication and Computational Efficiency

IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGrid), 2025

288

10 Mar 2025

Anatomy-Aware Conditional Image-Text Retrieval

258

10 Mar 2025

TI-JEPA: An Innovative Energy-based Joint Embedding Strategy for Text-Image Multimodal Systems

271

09 Mar 2025

Narrating the Video: Boosting Text-Video Retrieval via Comprehensive Utilization of Frame-Level Captions

Computer Vision and Pattern Recognition (CVPR), 2025

632

07 Mar 2025

RCRank: Multimodal Ranking of Root Causes of Slow Queries in Cloud Database Systems

Proceedings of the VLDB Endowment (PVLDB), 2024

203

06 Mar 2025

Enhancing Collective Intelligence in Large Language Models Through Emotional Integration

867

05 Mar 2025

Composed Multi-modal Retrieval: A Survey of Approaches and Applications

...

405

03 Mar 2025

Perceptual Visual Quality Assessment: Principles, Methods, and Future Directions

270

01 Mar 2025

Solar Multimodal Transformer: Intraday Solar Irradiance Predictor using Public Cameras and Time Series

IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2025

254

28 Feb 2025

RTGen: Real-Time Generative Detection Transformer

418

28 Feb 2025

Multimodal Learning for Just-In-Time Software Defect Prediction in Autonomous Driving Systems

International Conference on Big Data and Smart Computing (BigComp), 2025

Faisal Mohammad

Duksan Ryu

227

28 Feb 2025

Deciphering the complaint aspects: Towards an aspect-based complaint identification model with video complaint dataset in finance

IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2025

140

26 Feb 2025

FilterRAG: Zero-Shot Informed Retrieval-Augmented Generation to Mitigate Hallucinations in VQA

S M Sarwar

463

25 Feb 2025

Vision Language Models in Medicine

Beria Chingnabe Kalpelbe

Angel Gabriel Adaambiik

Wei Peng

VLM LM&MA

385

24 Feb 2025

Are Large Language Models Good Data Preprocessors?

The Web Conference (WWW), 2025

270

24 Feb 2025

Beyond Pattern Recognition: Probing Mental Representations of LMs

Moritz Miller

Kumar Shridhar

ReLM LRM

252

23 Feb 2025

Modular Prompt Learning Improves Vision-Language Models

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025

119

21 Feb 2025

Enhancing Adversarial Robustness of Vision-Language Models through Low-Rank Adaptation

International Conference on Multimedia Retrieval (ICMR), 2024

402

21 Feb 2025

Demystifying Hateful Content: Leveraging Large Multimodal Models for Hateful Meme Detection with Explainable Decisions

International Conference on Web and Social Media (ICWSM), 2025

Ming Shan Hee

Roy Ka-wei Lee

VLM

247

16 Feb 2025

Handwritten Text Recognition: A Survey

Carlos Garrido-Munoz

Antonio Ríos-Vila

Jorge Calvo-Zaragoza

315

12 Feb 2025

Vision-Language Models for Edge Networks: A Comprehensive Survey

IEEE Internet of Things Journal (IEEE IoT J.), 2025

379

11 Feb 2025

Foundation Models for Anomaly Detection: Vision and Challenges

466

10 Feb 2025

A Multimodal PDE Foundation Model for Prediction and Scientific Text Descriptions

344

09 Feb 2025

Performance Analysis of Traditional VQA Models Under Limited Computational Resources

Jihao Gu

286

09 Feb 2025

Multi-Branch Collaborative Learning Network for Video Quality Assessment in Industrial Video Search

Knowledge Discovery and Data Mining (KDD), 2025

284

09 Feb 2025