2310.16033
Towards Perceiving Small Visual Details in Zero-shot Visual Question Answering with Multimodal LLMs
24 October 2023
Jiarui Zhang
Mahyar Khayatkhoei
P. Chhikara
Filip Ilievski
Papers citing "Towards Perceiving Small Visual Details in Zero-shot Visual Question Answering with Multimodal LLMs"
5 / 5 papers shown
Title
Scaling 3D Reasoning with LMMs to Large Robot Mission Environments Using Datagraphs
W. J. Meijer
A. C. Kemmeren
E.H.J. Riemens
J. E. Fransman
M. V. Bekkum
G. J. Burghouts
J. D. V. Mil
31
0
0
15 Jul 2024
V*: Guided Visual Search as a Core Mechanism in Multimodal LLMs
Penghao Wu
Saining Xie
LRM
49
122
0
21 Dec 2023
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
S. Hoi
MLLM
BDL
VLM
CLIP
388
4,110
0
28 Jan 2022
Unifying Vision-and-Language Tasks via Text Generation
Jaemin Cho
Jie Lei
Hao Tan
Mohit Bansal
MLLM
249
525
0
04 Feb 2021
RSVQA: Visual Question Answering for Remote Sensing Data
Sylvain Lobry
Diego Marcos
J. Murray
D. Tuia
62
203
0
16 Mar 2020