Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2406.14496
Cited By
African or European Swallow? Benchmarking Large Vision-Language Models for Fine-Grained Object Classification
20 June 2024
Gregor Geigle
Radu Timofte
Goran Glavas
Re-assign community
ArXiv
PDF
HTML
Papers citing
"African or European Swallow? Benchmarking Large Vision-Language Models for Fine-Grained Object Classification"
8 / 8 papers shown
Title
Enhancing Multimodal In-Context Learning for Image Classification through Coreset Optimization
Huiyi Chen
Jiawei Peng
Kaihua Tang
Xin Geng
Xu Yang
22
0
0
19 Apr 2025
TinyEmo: Scaling down Emotional Reasoning via Metric Projection
Cristian Gutierrez
LRM
60
0
0
17 Feb 2025
Bongard in Wonderland: Visual Puzzles that Still Make AI Go Mad?
Antonia Wüst
Tim Nelson Tobiasch
Lukas Helff
Inga Ibs
Wolfgang Stammer
D. Dhami
Constantin Rothkopf
Kristian Kersting
CoGe
ReLM
VLM
LRM
56
1
0
25 Oct 2024
GLOV: Guided Large Language Models as Implicit Optimizers for Vision Language Models
Muhammad Jehanzeb Mirza
Mengjie Zhao
Zhuoyuan Mao
Sivan Doveh
Wei Lin
...
Yuki Mitsufuji
Horst Possegger
Rogerio Feris
Leonid Karlinsky
James Glass
VLM
74
1
0
08 Oct 2024
Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models
Siddharth Karamcheti
Suraj Nair
Ashwin Balakrishna
Percy Liang
Thomas Kollar
Dorsa Sadigh
MLLM
VLM
57
95
0
12 Feb 2024
InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model
Xiao-wen Dong
Pan Zhang
Yuhang Zang
Yuhang Cao
Bin Wang
...
Conghui He
Xingcheng Zhang
Yu Qiao
Dahua Lin
Jiaqi Wang
VLM
MLLM
76
242
0
29 Jan 2024
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
S. Hoi
MLLM
BDL
VLM
CLIP
385
4,010
0
28 Jan 2022
Visually Grounded Reasoning across Languages and Cultures
Fangyu Liu
Emanuele Bugliarello
E. Ponti
Siva Reddy
Nigel Collier
Desmond Elliott
VLM
LRM
92
167
0
28 Sep 2021
1