2405.17247
Cited By
An Introduction to Vision-Language Modeling
27 May 2024
Florian Bordes
Richard Yuanzhe Pang
Anurag Ajay
Alexander C. Li
Adrien Bardes
Suzanne Petryk
Oscar Manas
Zhiqiu Lin
Anas Mahmoud
Bargav Jayaraman
Mark Ibrahim
Melissa Hall
Yunyang Xiong
Jonathan Lebensold
Candace Ross
Srihari Jayakumar
Chuan Guo
Diane Bouchacourt
Haider Al-Tahan
Karthik Padthe
Vasu Sharma
Huijuan Xu
Xiaoqing Ellen Tan
Megan Richards
Samuel Lavoie
Pietro Astolfi
Reyhane Askari Hemmat
Jun Chen
Kushal Tirumala
Rim Assouel
Mazda Moayeri
Arjang Talattof
Kamalika Chaudhuri
Zechun Liu
Xilun Chen
Q. Garrido
Karen Ullrich
Aishwarya Agrawal
Kate Saenko
Asli Celikyilmaz
Vikas Chandra
VLM
Papers citing "An Introduction to Vision-Language Modeling"
50 / 56 papers shown
Title
GoalLadder: Incremental Goal Discovery with Vision-Language Models
Alexey Zakharov
Shimon Whiteson
12
0
0
19 Jun 2025
Using Vision Language Models to Detect Students' Academic Emotion through Facial Expressions
Deliang Wang
Chao Yang
Gaowei Chen
VLM
117
0
0
12 Jun 2025
VIBE: Can a VLM Read the Room?
Tania Chakraborty
Eylon Caplan
Dan Goldwasser
VLM
18
0
0
11 Jun 2025
IntPhys 2: Benchmarking Intuitive Physics Understanding In Complex Synthetic Environments
Florian Bordes
Q. Garrido
Justine T Kao
Adina Williams
Michael G. Rabbat
Emmanuel Dupoux
PINN
83
0
0
11 Jun 2025
Beyond Invisibility: Learning Robust Visible Watermarks for Stronger Copyright Protection
Tianci Liu
Tong Yang
Quan Zhang
Qi Lei
WIGM
AAML
44
0
0
03 Jun 2025
Can Vision Transformers with ResNet's Global Features Fairly Authenticate Demographic Faces?
Abu Sufian
Marco Leo
Cosimo Distante
Anirudha Ghosh
Debaditya Barman
ViT
20
0
0
03 Jun 2025
Learning Sparsity for Effective and Efficient Music Performance Question Answering
Xingjian Diao
Tianzhen Yang
Chunhui Zhang
Weiyi Wu
Ming Cheng
Jiang Gui
60
1
0
02 Jun 2025
Circuit Stability Characterizes Language Model Generalization
Alan Sun
LRM
17
0
0
30 May 2025
Decomposing Complex Visual Comprehension into Atomic Visual Skills for Vision Language Models
Hyunsik Chae
Seungwoo Yoon
J. Park
Chloe Yewon Chun
Yongin Cho
Mu Cai
Yong Jae Lee
Ernest K. Ryu
CoGe
VLM
44
3
0
26 May 2025
Domain Adaptation of VLM for Soccer Video Understanding
Tiancheng Jiang
Henry Wang
Md Sirajus Salekin
Parmida Atighehchian
Shinan Zhang
VLM
85
0
0
20 May 2025
Unsupervised Multiview Contrastive Language-Image Joint Learning with Pseudo-Labeled Prompts Via Vision-Language Model for 3D/4D Facial Expression Recognition
Muzammil Behzad
VLM
76
0
0
14 May 2025
VLM-KG: Multimodal Radiology Knowledge Graph Generation
Abdullah Abdullah
Seong Tae Kim
87
0
0
13 May 2025
Vision-Language-Action Models: Concepts, Progress, Applications and Challenges
Ranjan Sapkota
Yang Cao
Konstantinos I. Roumeliotis
Manoj Karkee
LM&Ro
402
2
0
07 May 2025
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
Wei Wei
Jintao Guo
Shanshan Zhao
Minghao Fu
Lunhao Duan
...
Guo-Hua Wang
Qing-Guo Chen
Zhao Xu
Weihua Luo
Kaifu Zhang
DiffM
301
1
0
05 May 2025
OmniV-Med: Scaling Medical Vision-Language Model for Universal Visual Understanding
Songtao Jiang
Yuan Wang
Sibo Song
Yanzhe Zhang
Zijie Meng
Bohan Lei
Jian Wu
Jimeng Sun
Zuozhu Liu
MedIm
VLM
95
3
0
20 Apr 2025
Reimagining Urban Science: Scaling Causal Inference with Large Language Models
Yutong Xia
Ao Qu
Yunhan Zheng
Yihong Tang
Dingyi Zhuang
...
Cathy Wu
Lijun Sun
Roger Zimmermann
Jinhua Zhao
AI4CE
389
2
0
15 Apr 2025
TerraMind: Large-Scale Generative Multimodality for Earth Observation
Johannes Jakubik
Felix Yang
Benedikt Blumenstiel
Erik Scheurer
Rocco Sedona
...
P. Fraccaro
Thomas Brunschwiler
Gabriele Cavallaro
Juan Bernabé-Moreno
Nicolas Longépé
MLLM
VLM
115
6
0
15 Apr 2025
Using Vision Language Models for Safety Hazard Identification in Construction
Muhammad Adil
Gaang Lee
Vicente A. Gonzalez
Qipei Mei
100
1
0
12 Apr 2025
Are Vision-Language Models Ready for Dietary Assessment? Exploring the Next Frontier in AI-Powered Food Image Recognition
Sergio Romero-Tapiador
Ruben Tolosana
Blanca Lacruz-Pleguezuelos
L. Marcos-Zambrano
Guadalupe X. Bazán
Isabel Espinosa-Salinas
Julian Fierrez
Javier Ortega-Garcia
Enrique Carrillo-de Santa Pau
Aythami Morales
CoGe
72
0
0
09 Apr 2025
Multimodal Fusion and Vision-Language Models: A Survey for Robot Vision
Xiaofeng Han
Shunpeng Chen
Zenghuang Fu
Zhe Feng
Lue Fan
...
Li Guo
Weiliang Meng
Xiaopeng Zhang
Rongtao Xu
Shibiao Xu
122
4
0
03 Apr 2025
One Pic is All it Takes: Poisoning Visual Document Retrieval Augmented Generation with a Single Image
Ezzeldin Shereen
Dan Ristea
Burak Hasircioglu
Shae McFadden
V. Mavroudis
Chris Hicks
183
0
0
02 Apr 2025
AutoRad-Lung: A Radiomic-Guided Prompting Autoregressive Vision-Language Model for Lung Nodule Malignancy Prediction
Sadaf Khademi
Mehran Shabanpour
Reza Taleei
A. Oikonomou
Arash Mohammadi
MedIm
98
0
0
26 Mar 2025
TopV: Compatible Token Pruning with Inference Time Optimization for Fast and Low-Memory Multimodal Vision Language Model
Cheng Yang
Yang Sui
Jinqi Xiao
Lingyi Huang
Yu Gong
...
Jinghua Yan
Y. Bai
P. Sadayappan
Helen Zhou
Bo Yuan
VLM
155
2
0
24 Mar 2025
Long-VMNet: Accelerating Long-Form Video Understanding via Fixed Memory
Saket Gurukar
Asim Kadav
VLM
144
0
0
17 Mar 2025
Using the Tools of Cognitive Science to Understand Large Language Models at Different Levels of Analysis
Alexander Ku
Declan Campbell
Xuechunzi Bai
Jiayi Geng
Ryan Liu
...
Ilia Sucholutsky
Veniamin Veselovsky
Liyi Zhang
Jian-Qiao Zhu
Thomas L. Griffiths
ELM
149
4
0
17 Mar 2025
Removing Geometric Bias in One-Class Anomaly Detection with Adaptive Feature Perturbation
Romain Hermary
Vincent Gaudillière
Abd El Rahman Shabayek
Djamila Aouada
AAML
141
1
0
07 Mar 2025
A kinetic-based regularization method for data science applications
Abhisek Ganguly
Alessandro Gabbana
Vybhav Rao
Sauro Succi
Santosh Ansumali
134
0
0
06 Mar 2025
FilterRAG: Zero-Shot Informed Retrieval-Augmented Generation to Mitigate Hallucinations in VQA
S M Sarwar
130
1
0
25 Feb 2025
Vision Language Models in Medicine
Beria Chingnabe Kalpelbe
Angel Gabriel Adaambiik
Wei Peng
VLM
LM&MA
121
2
0
24 Feb 2025
Object-centric Binding in Contrastive Language-Image Pretraining
Rim Assouel
Pietro Astolfi
Florian Bordes
M. Drozdzal
Adriana Romero Soriano
OCL
VLM
CoGe
161
3
0
19 Feb 2025
Vision-Driven Prompt Optimization for Large Language Models in Multimodal Generative Tasks
Leo Franklin
Apiradee Boonmee
Kritsada Wongsuwan
MLLM
VLM
104
0
0
05 Jan 2025
Geo-LLaVA: A Large Multi-Modal Model for Solving Geometry Math Problems with Meta In-Context Learning
Shihao Xu
Yiyang Luo
Wei Shi
LRM
ReLM
123
3
0
12 Dec 2024
Multimodal Fact-Checking with Vision Language Models: A Probing Classifier based Solution with Embedding Strategies
R. Çekinel
Pinar Karagoz
Cagri Coltekin
85
4
0
06 Dec 2024
DesignMinds: Enhancing Video-Based Design Ideation with Vision-Language Model and Context-Injected Large Language Model
Tianhao He
Andrija Stankovic
E. Niforatos
Gerd Kortuem
MLLM
VGen
VLM
78
0
0
06 Nov 2024
Fine-Tuning Vision-Language Model for Automated Engineering Drawing Information Extraction
Muhammad Tayyab Khan
Lequn Chen
Ye Han Ng
Wenhe Feng
Nicholas Yew Jin Tan
Seung Ki Moon
60
2
0
06 Nov 2024
INQUIRE: A Natural World Text-to-Image Retrieval Benchmark
Edward Vendrow
Omiros Pantazis
Alexander Shepard
Gabriel J. Brostow
Kate E. Jones
Oisin Mac Aodha
Sara Beery
Grant Van Horn
VLM
105
7
0
04 Nov 2024
Bongard in Wonderland: Visual Puzzles that Still Make AI Go Mad?
Antonia Wüst
Tim Nelson Tobiasch
Lukas Helff
Inga Ibs
Wolfgang Stammer
Devendra Singh Dhami
Constantin Rothkopf
Kristian Kersting
CoGe
ReLM
VLM
LRM
170
3
0
25 Oct 2024
Addressing Blind Guessing: Calibration of Selection Bias in Multiple-Choice Question Answering by Video Language Models
Olga Loginova
Oleksandr Bezrukov
Ravi Shekhar
Alexey Kravets
54
2
0
18 Oct 2024
OmnixR: Evaluating Omni-modality Language Models on Reasoning across Modalities
Lawrence Yunliang Chen
Hexiang Hu
Ruotong Wang
Yiran Chen
Zifeng Wang
...
Pranav Shyam
Tianyi Zhou
Heng-Chiao Huang
Ming-Hsuan Yang
Boqing Gong
38
3
0
16 Oct 2024
Enabling Data-Driven and Empathetic Interactions: A Context-Aware 3D Virtual Agent in Mixed Reality for Enhanced Financial Customer Experience
Cindy Xu
Mengyu Chen
Pranav Deshpande
Elvir Azanli
Runqing Yang
Joseph Ligman
34
1
0
15 Oct 2024
DPL: Cross-quality DeepFake Detection via Dual Progressive Learning
Dongliang Zhang
Yunfei Li
Jiaran Zhou
Yuezun Li
78
1
0
10 Oct 2024
Enabling Novel Mission Operations and Interactions with ROSA: The Robot Operating System Agent
Rob Royce
Marcel Kaufmann
Jonathan Becktor
Sangwoo Moon
Kalind Carpenter
Kai Pak
Amanda Towler
Rohan Thakker
Shehryar Khattak
LM&Ro
100
3
0
09 Oct 2024
AI-Driven Early Mental Health Screening: Analyzing Selfies of Pregnant Women
Gustavo A. Basílio
Thiago B. Pereira
Alessandro Lameiras Koerich
Ludmila Dias
Maria das Graças da S. Teixeira
...
Amanda S. Mota
Anilton S. Garcia
Marco Aurélio K. Galletta
Hermano Tavares
T. M. Paixão
55
0
0
07 Oct 2024
TuneVLSeg: Prompt Tuning Benchmark for Vision-Language Segmentation Models
Rabin Adhikari
Safal Thapaliya
Manish Dhakal
Bishesh Khanal
MLLM
VLM
77
0
0
07 Oct 2024
LLaVA Needs More Knowledge: Retrieval Augmented Natural Language Generation with Knowledge Graph for Explaining Thoracic Pathologies
Ameer Hamza
Abdullah Abdullah
Yong Hyun Ahn
Sungyoung Lee
Seong Tae Kim
78
4
0
07 Oct 2024
Have Large Vision-Language Models Mastered Art History?
Ombretta Strafforello
Derya Soydaner
Michiel Willems
Anne-Sofie Maerten
Stefanie De Winter
CoGe
VLM
MLLM
58
1
0
05 Sep 2024
Seeing Through Their Eyes: Evaluating Visual Perspective Taking in Vision Language Models
Gracjan Góral
Alicja Ziarko
Michal Nauman
Maciej Wołczyk
LRM
85
2
0
02 Sep 2024
An End-to-End Model for Photo-Sharing Multi-modal Dialogue Generation
Peiming Guo
Sinuo Liu
Yanzhao Zhang
Dingkun Long
Pengjun Xie
Meishan Zhang
Hao Fei
DiffM
151
1
0
16 Aug 2024
UniBench: Visual Reasoning Requires Rethinking Vision-Language Beyond Scaling
Haider Al-Tahan
Q. Garrido
Randall Balestriero
Diane Bouchacourt
C. Hazirbas
Mark Ibrahim
VLM
135
10
0
09 Aug 2024
Can ChatGPT assist visually impaired people with micro-navigation?
Junxian He
Shrinivas J. Pundlik
Gang Luo
47
0
0
31 Jul 2024