Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2204.14198
Cited By
Flamingo: a Visual Language Model for Few-Shot Learning
29 April 2022
Jean-Baptiste Alayrac
Jeff Donahue
Pauline Luc
Antoine Miech
Iain Barr
Yana Hasson
Karel Lenc
A. Mensch
Katie Millican
Malcolm Reynolds
Roman Ring
Eliza Rutherford
Serkan Cabi
Tengda Han
Zhitao Gong
Sina Samangooei
Marianne Monteiro
Jacob Menick
Sebastian Borgeaud
Andy Brock
Aida Nematzadeh
Sahand Sharifzadeh
Mikolaj Binkowski
Ricardo Barreira
Oriol Vinyals
Andrew Zisserman
Karen Simonyan
MLLM
VLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Flamingo: a Visual Language Model for Few-Shot Learning"
50 / 474 papers shown
Title
Visually Guided Decoding: Gradient-Free Hard Prompt Inversion with Language Models
Donghoon Kim
Minji Bae
Kyuhong Shim
B. Shim
23
0
0
13 May 2025
Enhancing Target-unspecific Tasks through a Features Matrix
Fangming Cui
Yonggang Zhang
Xuan Wang
Xinmei Tian
Jun Yu
AAML
33
0
0
06 May 2025
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
X. Zhang
Jintao Guo
Shanshan Zhao
Minghao Fu
Lunhao Duan
Guo-Hua Wang
Qing-Guo Chen
Zhao Xu
Weihua Luo
Kaifu Zhang
DiffM
57
0
0
05 May 2025
Uncertainty-Weighted Image-Event Multimodal Fusion for Video Anomaly Detection
SungHeon Jeong
Jihong Park
Mohsen Imani
43
0
0
05 May 2025
HapticVLM: VLM-Driven Texture Recognition Aimed at Intelligent Haptic Interaction
Muhammad Haris Khan
Miguel Altamirano Cabrera
Dmitrii Iarchuk
Yara Mahmoud
Daria Trinitatova
Issatay Tokmurziyev
Dzmitry Tsetserukou
VLM
34
0
0
05 May 2025
Compositional Image-Text Matching and Retrieval by Grounding Entities
Madhukar Reddy Vongala
Saurabh Srivastava
Jana Kosecka
CLIP
CoGe
VLM
34
0
0
04 May 2025
Handling Imbalanced Pseudolabels for Vision-Language Models with Concept Alignment and Confusion-Aware Calibrated Margin
Yuchen Wang
X. Bai
X. Li
Weili Guan
Liqiang Nie
Xinyang Chen
VLM
37
0
0
04 May 2025
RESAnything: Attribute Prompting for Arbitrary Referring Segmentation
Ruiqi Wang
Hao Zhang
VLM
52
0
0
03 May 2025
Mitigating Group-Level Fairness Disparities in Federated Visual Language Models
Chaomeng Chen
Zitong Yu
J. Dong
Sen Su
L. Shen
Shutao Xia
Xiaochun Cao
FedML
VLM
67
0
0
03 May 2025
Adaptive Token Boundaries: Integrating Human Chunking Mechanisms into Multimodal LLMs
Dongxing Yu
21
0
0
03 May 2025
Scalability Matters: Overcoming Challenges in InstructGLM with Similarity-Degree-Based Sampling
Hyun Lee
Chris Yi
Maminur Islam
B.D.S. Aritra
22
0
0
02 May 2025
AdCare-VLM: Leveraging Large Vision Language Model (LVLM) to Monitor Long-Term Medication Adherence and Care
Md Asaduzzaman Jabin
Hanqi Jiang
Y. Li
Patrick Kaggwa
Eugene Douglass
Juliet N. Sekandi
Tianming Liu
LM&MA
66
0
0
01 May 2025
SpatialLLM: A Compound 3D-Informed Design towards Spatially-Intelligent Large Multimodal Models
Wufei Ma
Luoxin Ye
Nessa McWeeney
Celso M de Melo
A. Yuille
Jieneng Chen
LRM
57
1
0
01 May 2025
Multi-Modal Language Models as Text-to-Image Model Evaluators
Jiahui Chen
Candace Ross
Reyhane Askari Hemmat
Koustuv Sinha
Melissa Hall
M. Drozdzal
Adriana Romero-Soriano
EGVM
60
0
0
01 May 2025
Detecting and Mitigating Hateful Content in Multimodal Memes with Vision-Language Models
Minh-Hao Van
Xintao Wu
VLM
79
0
0
30 Apr 2025
Rethinking Visual Layer Selection in Multimodal LLMs
H. Chen
Junyan Lin
Xinhao Chen
Yue Fan
Xin Jin
Hui Su
Jianfeng Dong
Jinlan Fu
Xiaoyu Shen
VLM
93
0
0
30 Apr 2025
Zoomer: Adaptive Image Focus Optimization for Black-box MLLM
Jiaxu Qian
Chendong Wang
Y. Yang
Chaoyun Zhang
Huiqiang Jiang
...
Saravan Rajmohan
Dongmei Zhang
Y. Yang
Qi Zhang
Lili Qiu
VLM
70
0
0
30 Apr 2025
X-Fusion: Introducing New Modality to Frozen Large Language Models
Sicheng Mo
Thao Nguyen
Xun Huang
Siddharth Srinivasan Iyer
Yijun Li
...
Eli Shechtman
Krishna Kumar Singh
Yong Jae Lee
Bolei Zhou
Yuheng Li
71
0
0
29 Apr 2025
Multimodal Large Language Models for Medicine: A Comprehensive Survey
Jiarui Ye
Hao Tang
LM&MA
76
0
0
29 Apr 2025
Platonic Grounding for Efficient Multimodal Language Models
Moulik Choraria
Xinbo Wu
Akhil Bhimaraju
Nitesh Sekhar
Yue Wu
Xu Zhang
Prateek Singhal
L. Varshney
54
0
0
27 Apr 2025
ActionArt: Advancing Multimodal Large Models for Fine-Grained Human-Centric Video Understanding
Yi-Xing Peng
Q. Yang
Yu-Ming Tang
Shenghao Fu
Kun-Yu Lin
Xihan Wei
Wei-Shi Zheng
40
0
0
25 Apr 2025
E-InMeMo: Enhanced Prompting for Visual In-Context Learning
Jiahao Zhang
Bowen Wang
Hong Liu
Liangzhi Li
Yuta Nakashima
Hajime Nagahara
VLM
99
0
0
25 Apr 2025
Multimodal graph representation learning for website generation based on visual sketch
Tung D. Vu
Chung Hoang
Truong-Son Hy
3DV
48
0
0
25 Apr 2025
A Large Vision-Language Model based Environment Perception System for Visually Impaired People
Zezhou Chen
Zhaoxiang Liu
Kai Wang
Kohou Wang
Shiguo Lian
47
0
0
25 Apr 2025
AI Awareness
X. Li
Haoyuan Shi
Rongwu Xu
Wei Xu
54
0
0
25 Apr 2025
LGD: Leveraging Generative Descriptions for Zero-Shot Referring Image Segmentation
Jiachen Li
Qing Xie
Xiaohan Yu
Hongyun Wang
Jinyu Xu
Yongjian Liu
ObjD
74
0
0
20 Apr 2025
FLIP Reasoning Challenge
Andreas Plesner
Turlan Kuzhagaliyev
Roger Wattenhofer
AAML
VLM
LRM
72
0
0
16 Apr 2025
Visual Chronicles: Using Multimodal LLMs to Analyze Massive Collections of Images
Boyang Deng
Songyou Peng
Kyle Genova
Gordon Wetzstein
Noah Snavely
Leonidas J. Guibas
Thomas Funkhouser
HAI
50
0
0
11 Apr 2025
Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model
Team Seawead
Ceyuan Yang
Zhijie Lin
Yang Zhao
Shanchuan Lin
...
Zuquan Song
Zhenheng Yang
Jiashi Feng
Jianchao Yang
Lu Jiang
DiffM
77
1
0
11 Apr 2025
Optuna vs Code Llama: Are LLMs a New Paradigm for Hyperparameter Tuning?
Roman Kochnev
Arash Torabi Goodarzi
Zofia Antonina Bentyn
D. Ignatov
Radu Timofte
48
2
0
08 Apr 2025
Lumina-OmniLV: A Unified Multimodal Framework for General Low-Level Vision
Yuandong Pu
Le Zhuo
Kaiwen Zhu
Liangbin Xie
Wenlong Zhang
Xiangyu Chen
Peng Gao
Yu Qiao
Chao Dong
Yihao Liu
MLLM
59
1
0
07 Apr 2025
Don't Lag, RAG: Training-Free Adversarial Detection Using RAG
Roie Kazoom
Raz Lapid
Moshe Sipper
Ofer Hadar
VLM
ObjD
AAML
52
0
0
07 Apr 2025
Large (Vision) Language Models are Unsupervised In-Context Learners
Artyom Gadetsky
Andrei Atanov
Yulun Jiang
Zhitong Gao
Ghazal Hosseini Mighan
Amir Zamir
Maria Brbić
VLM
MLLM
LRM
64
0
0
03 Apr 2025
Unified World Models: Coupling Video and Action Diffusion for Pretraining on Large Robotic Datasets
Chuning Zhu
Raymond Yu
S. Feng
Benjamin Burchfiel
Paarth Shah
Abhishek Gupta
VGen
55
0
0
03 Apr 2025
Is Temporal Prompting All We Need For Limited Labeled Action Recognition?
Shreyank N. Gowda
Boyan Gao
Xiao Gu
Xiaobo Jin
VLM
32
0
0
02 Apr 2025
Text Speaks Louder than Vision: ASCII Art Reveals Textual Biases in Vision-Language Models
Zhaochen Wang
Yujun Cai
Zi Huang
Bryan Hooi
Yiwei Wang
Ming Yang
CoGe
VLM
71
0
0
02 Apr 2025
Orchestrate Multimodal Data with Batch Post-Balancing to Accelerate Multimodal Large Language Model Training
Yijie Zheng
Bangjun Xiao
Lei Shi
Xiaoyang Li
Faming Wu
Tianyu Li
Xuefeng Xiao
Y. Zhang
Y. Wang
Shouda Liu
MLLM
MoE
64
1
0
31 Mar 2025
Embedding Shift Dissection on CLIP: Effects of Augmentations on VLM's Representation Learning
Ashim Dahal
Saydul Akbar Murad
Nick Rahimi
VLM
40
0
0
30 Mar 2025
From Flatland to Space: Teaching Vision-Language Models to Perceive and Reason in 3D
Jiahui Zhang
Yurui Chen
Yanpeng Zhou
Yueming Xu
Ze Huang
...
Xinyue Cai
G. Huang
Xingyue Quan
Hang Xu
Li Zhang
LRM
87
0
0
29 Mar 2025
Cognitive Science-Inspired Evaluation of Core Capabilities for Object Understanding in AI
Danaja Rutar
Alva Markelius
Konstantinos Voudouris
José Hernández Orallo
Lucy G. Cheke
OCL
ELM
56
0
0
27 Mar 2025
FireEdit: Fine-grained Instruction-based Image Editing via Region-aware Vision Language Model
Jun Zhou
J. Li
Zunnan Xu
Hanhui Li
Yiji Cheng
Fa-Ting Hong
Qin Lin
Qinglin Lu
Xiaodan Liang
DiffM
62
1
0
25 Mar 2025
OmniScience: A Domain-Specialized LLM for Scientific Reasoning and Discovery
Vignesh Prabhakar
Md Amirul Islam
Adam Atanas
Y. Wang
J. N. Han
...
Rucha Apte
Robert Clark
Kang Xu
Zihan Wang
Kai Liu
LRM
77
1
0
22 Mar 2025
Generative Modeling of Class Probability for Multi-Modal Representation Learning
Jungkyoo Shin
Bumsoo Kim
Eunwoo Kim
50
1
0
21 Mar 2025
Cross-Modal and Uncertainty-Aware Agglomeration for Open-Vocabulary 3D Scene Understanding
Jinlong Li
Cristiano Saltori
Fabio Poiesi
N. Sebe
76
0
0
20 Mar 2025
EDEN: Enhanced Diffusion for High-quality Large-motion Video Frame Interpolation
Zihao Zhang
Haoran Chen
Haoyu Zhao
Guansong Lu
Yanwei Fu
Hang Xu
Zuxuan Wu
VGen
DiffM
62
0
0
20 Mar 2025
A Vision Centric Remote Sensing Benchmark
Abduljaleel Adejumo
Faegheh Yeganli
Clifford Broni-Bediako
Aoran Xiao
Naoto Yokoya
Mennatullah Siam
55
0
0
20 Mar 2025
Neuro Symbolic Knowledge Reasoning for Procedural Video Question Answering
Thanh-Son Nguyen
Hong Yang
Tzeh Yuan Neoh
Hao Zhang
Ee Yeo Keat
Basura Fernando
NAI
54
0
0
19 Mar 2025
ChatBEV: A Visual Language Model that Understands BEV Maps
Qingyao Xu
S. Chen
Guang Chen
Yanfeng Wang
Y. Zhang
42
0
0
18 Mar 2025
Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning
Nvidia
A. Azzolini
Hannah Brandon
Prithvijit Chattopadhyay
Huayu Chen
...
Yao Xu
X. Yang
Zhuolin Yang
Xiaohui Zeng
Z. Zhang
LM&Ro
LRM
AI4CE
52
5
0
18 Mar 2025
Being-0: A Humanoid Robotic Agent with Vision-Language Models and Modular Skills
Haoqi Yuan
Yu Bai
Yuhui Fu
Bohan Zhou
Yicheng Feng
Xinrun Xu
Yi Zhan
Börje F. Karlsson
Zongqing Lu
LM&Ro
74
0
0
16 Mar 2025
1
2
3
4
...
8
9
10
Next