Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2106.13884
Cited By
Multimodal Few-Shot Learning with Frozen Language Models
25 June 2021
Maria Tsimpoukelli
Jacob Menick
Serkan Cabi
S. M. Ali Eslami
Oriol Vinyals
Felix Hill
MLLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Multimodal Few-Shot Learning with Frozen Language Models"
50 / 532 papers shown
Title
A Survey on Vision-Language-Action Models for Embodied AI
Yueen Ma
Zixing Song
Yuzheng Zhuang
Jianye Hao
Irwin King
LM&Ro
67
41
0
23 May 2024
ProtT3: Protein-to-Text Generation for Text-based Protein Understanding
Zhiyuan Liu
An Zhang
Hao Fei
Enzhi Zhang
Xiang Wang
Kenji Kawaguchi
Tat-Seng Chua
52
16
0
21 May 2024
TinyLLaVA Factory: A Modularized Codebase for Small-scale Large Multimodal Models
Junlong Jia
Ying Hu
Xi Weng
Yiming Shi
Miao Li
...
Baichuan Zhou
Ziyu Liu
Jie Luo
Lei Huang
Ji Wu
30
9
0
20 May 2024
Memory-Space Visual Prompting for Efficient Vision-Language Fine-Tuning
Shibo Jie
Yehui Tang
Ning Ding
Zhi-Hong Deng
Kai Han
Yunhe Wang
VLM
33
6
0
09 May 2024
THRONE: An Object-based Hallucination Benchmark for the Free-form Generations of Large Vision-Language Models
Prannay Kaul
Zhizhong Li
Hao-Yu Yang
Yonatan Dukler
Ashwin Swaminathan
C. Taylor
Stefano Soatto
HILM
49
15
0
08 May 2024
Do Vision & Language Decoders use Images and Text equally? How Self-consistent are their Explanations?
Letitia Parcalabescu
Anette Frank
MLLM
CoGe
VLM
82
3
0
29 Apr 2024
Open-Set Video-based Facial Expression Recognition with Human Expression-sensitive Prompting
Yuanyuan Liu
Yuxuan Huang
Shuyang Liu
Yibing Zhan
Zijing Chen
Zhe Chen
VLM
40
1
0
26 Apr 2024
Step Differences in Instructional Video
Tushar Nagarajan
Lorenzo Torresani
VGen
27
5
0
24 Apr 2024
What Makes Multimodal In-Context Learning Work?
Folco Bertini Baldassini
Mustafa Shukor
Matthieu Cord
Laure Soulier
Benjamin Piwowarski
32
18
0
24 Apr 2024
Wiki-LLaVA: Hierarchical Retrieval-Augmented Generation for Multimodal LLMs
Davide Caffagni
Federico Cocchi
Nicholas Moratelli
Sara Sarto
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
KELM
29
35
0
23 Apr 2024
Bayesian Example Selection Improves In-Context Learning for Speech, Text, and Visual Modalities
Siyin Wang
Chao-Han Huck Yang
Ji Wu
Chao Zhang
BDL
32
4
0
23 Apr 2024
The Solution for the CVPR2024 NICE Image Captioning Challenge
Longfei Huang
Shupeng Zhong
Xiangyu Wu
Ruoxuan Li
19
0
0
19 Apr 2024
Private Attribute Inference from Images with Vision-Language Models
Batuhan Tömekçe
Mark Vero
Robin Staab
Martin Vechev
VLM
PILM
55
6
0
16 Apr 2024
Evolving Interpretable Visual Classifiers with Large Language Models
Mia Chiquier
Utkarsh Mall
Carl Vondrick
VLM
28
10
0
15 Apr 2024
Bridging Vision and Language Spaces with Assignment Prediction
Jungin Park
Jiyoung Lee
Kwanghoon Sohn
VLM
29
6
0
15 Apr 2024
On Speculative Decoding for Multimodal Large Language Models
Mukul Gagrani
Raghavv Goel
Wonseok Jeon
Junyoung Park
Mingu Lee
Christopher Lott
LRM
32
7
0
13 Apr 2024
On Unified Prompt Tuning for Request Quality Assurance in Public Code Review
Xinyu Chen
Lin Li
Rui Zhang
Peng Liang
27
1
0
11 Apr 2024
Koala: Key frame-conditioned long video-LLM
Reuben Tan
Ximeng Sun
Ping Hu
Jui-hsien Wang
Hanieh Deilamsalehy
Bryan A. Plummer
Bryan C. Russell
Kate Saenko
38
35
0
05 Apr 2024
Uncovering the Text Embedding in Text-to-Image Diffusion Models
Huikang Yu
Hao Luo
Fan Wang
Feng Zhao
29
10
0
01 Apr 2024
LLaMA-Excitor: General Instruction Tuning via Indirect Feature Interaction
Bo Zou
Chao Yang
Yu Qiao
Chengbin Quan
Youjian Zhao
39
6
0
01 Apr 2024
Generative Multi-modal Models are Good Class-Incremental Learners
Xusheng Cao
Haori Lu
Linlan Huang
Xialei Liu
Ming-Ming Cheng
CLL
41
10
0
27 Mar 2024
Toward Interactive Regional Understanding in Vision-Large Language Models
Jungbeom Lee
Sanghyuk Chun
Sangdoo Yun
VLM
21
1
0
27 Mar 2024
UrbanVLP: Multi-Granularity Vision-Language Pretraining for Urban Socioeconomic Indicator Prediction
Xixuan Hao
Wei Chen
Yibo Yan
Siru Zhong
Kun Wang
Qingsong Wen
Yuxuan Liang
VLM
74
1
0
25 Mar 2024
LinkPrompt
\textit{LinkPrompt}
LinkPrompt
: Natural and Universal Adversarial Attacks on Prompt-based Language Models
Yue Xu
Wenjie Wang
SILM
AAML
26
2
0
25 Mar 2024
Synthesize Step-by-Step: Tools, Templates and LLMs as Data Generators for Reasoning-Based Chart VQA
Zhuowan Li
Bhavan A. Jasani
Peng Tang
Shabnam Ghadar
LRM
27
8
0
25 Mar 2024
Enhancing Visual Continual Learning with Language-Guided Supervision
Bolin Ni
Hongbo Zhao
Chenghao Zhang
Ke Hu
Gaofeng Meng
Zhaoxiang Zhang
Shiming Xiang
CLL
VLM
32
3
0
24 Mar 2024
Towards Human-Like Machine Comprehension: Few-Shot Relational Learning in Visually-Rich Documents
Hao Wang
Tang Li
Chenhui Chu
Nengjun Zhu
Rui-cang Wang
Pinpin Zhu
21
0
0
23 Mar 2024
MSCoTDet: Language-driven Multi-modal Fusion for Improved Multispectral Pedestrian Detection
Taeheon Kim
Sangyun Chung
Damin Yeom
Youngjoon Yu
Hak Gu Kim
Y. Ro
38
2
0
22 Mar 2024
Natural Language as Policies: Reasoning for Coordinate-Level Embodied Control with LLMs
Yusuke Mikami
Andrew Melnik
Jun Miura
Ville Hautamaki
LM&Ro
LRM
58
4
0
20 Mar 2024
Improved Baselines for Data-efficient Perceptual Augmentation of LLMs
Théophane Vallaeys
Mustafa Shukor
Matthieu Cord
Jakob Verbeek
54
12
0
20 Mar 2024
RelationVLM: Making Large Vision-Language Models Understand Visual Relations
Zhipeng Huang
Zhizheng Zhang
Zheng-Jun Zha
Yan Lu
Baining Guo
VLM
36
3
0
19 Mar 2024
VL-ICL Bench: The Devil in the Details of Multimodal In-Context Learning
Yongshuo Zong
Ondrej Bohdal
Timothy M. Hospedales
28
7
0
19 Mar 2024
Scene-LLM: Extending Language Model for 3D Visual Understanding and Reasoning
Rao Fu
Jingyu Liu
Xilun Chen
Yixin Nie
Wenhan Xiong
LM&Ro
LRM
47
48
0
18 Mar 2024
Few-Shot VQA with Frozen LLMs: A Tale of Two Approaches
Igor Sterner
Weizhe Lin
Jinghong Chen
Bill Byrne
25
2
0
17 Mar 2024
RadCLIP: Enhancing Radiologic Image Analysis through Contrastive Language-Image Pre-training
Zhixiu Lu
Hailong Li
Lili He
VLM
MedIm
27
0
0
15 Mar 2024
PosSAM: Panoptic Open-vocabulary Segment Anything
VS Vibashan
Shubhankar Borse
Hyojin Park
Debasmit Das
Vishal M. Patel
Munawar Hayat
Fatih Porikli
VLM
MLLM
36
6
0
14 Mar 2024
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
Brandon McKinzie
Zhe Gan
J. Fauconnier
Sam Dodge
Bowen Zhang
...
Zirui Wang
Ruoming Pang
Peter Grasch
Alexander Toshev
Yinfei Yang
MLLM
32
186
0
14 Mar 2024
Borrowing Treasures from Neighbors: In-Context Learning for Multimodal Learning with Missing Modalities and Data Scarcity
Zhuo Zhi
Ziquan Liu
M. Elbadawi
Adam Daneshmend
Mine Orlu
Abdul Basit
Andreas Demosthenous
Miguel R. D. Rodrigues
29
2
0
14 Mar 2024
Beyond Text: Frozen Large Language Models in Visual Signal Comprehension
Lei Zhu
Fangyun Wei
Yanye Lu
MLLM
VLM
36
17
0
12 Mar 2024
Synth
2
^2
2
: Boosting Visual-Language Models with Synthetic Captions and Image Embeddings
Sahand Sharifzadeh
Christos Kaplanis
Shreya Pathak
D. Kumaran
Anastasija Ilić
Jovana Mitrović
Charles Blundell
Andrea Banino
VLM
37
9
0
12 Mar 2024
Let Storytelling Tell Vivid Stories: An Expressive and Fluent Multimodal Storyteller
Chuanqi Zang
Jiji Tang
Rongsheng Zhang
Zeng Zhao
Tangjie Lv
Mingtao Pei
Wei Liang
30
3
0
12 Mar 2024
SnapNTell: Enhancing Entity-Centric Visual Question Answering with Retrieval Augmented Multimodal LLM
Jielin Qiu
Andrea Madotto
Zhaojiang Lin
Paul A. Crook
Y. Xu
Xin Luna Dong
Christos Faloutsos
Lei Li
Babak Damavandi
Seungwhan Moon
31
8
0
07 Mar 2024
A Modular Approach for Multimodal Summarization of TV Shows
Louis Mahon
Mirella Lapata
21
9
0
06 Mar 2024
Task Attribute Distance for Few-Shot Learning: Theoretical Analysis and Applications
Minyang Hu
Hong Chang
Zong Guo
Bingpeng Ma
Shiguang Shan
Xilin Chen
VLM
21
1
0
06 Mar 2024
Android in the Zoo: Chain-of-Action-Thought for GUI Agents
Jiwen Zhang
Jihao Wu
Yihua Teng
Minghui Liao
Nuo Xu
Xiao Xiao
Zhongyu Wei
Duyu Tang
LLMAG
LM&Ro
32
50
0
05 Mar 2024
NPHardEval4V: A Dynamic Reasoning Benchmark of Multimodal Large Language Models
Lizhou Fan
Wenyue Hua
Xiang Li
Kaijie Zhu
Mingyu Jin
...
Haoyang Ling
Jinkui Chi
Jindong Wang
Xin Ma
Yongfeng Zhang
LRM
35
14
0
04 Mar 2024
Few-Shot Relation Extraction with Hybrid Visual Evidence
Jiaying Gong
Hoda Eldardiry
16
0
0
01 Mar 2024
VIXEN: Visual Text Comparison Network for Image Difference Captioning
Alexander Black
Jing Shi
Yifei Fai
Tu Bui
John Collomosse
42
5
0
29 Feb 2024
Grounding Language Models for Visual Entity Recognition
Zilin Xiao
Ming Gong
Paola Cascante-Bonilla
Xingyao Zhang
Jie Wu
Vicente Ordonez
VLM
38
8
0
28 Feb 2024
From Summary to Action: Enhancing Large Language Models for Complex Tasks with Open World APIs
Yulong Liu
Yunlong Yuan
Chunwei Wang
Jianhua Han
Yongqiang Ma
Li Zhang
Nanning Zheng
Hang Xu
LLMAG
24
5
0
28 Feb 2024
Previous
1
2
3
4
5
6
...
9
10
11
Next