ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

Multimodal Few-Shot Learning with Frozen Language Models
arXiv:2106.13884
25 June 2021
Maria Tsimpoukelli, Jacob Menick, Serkan Cabi, S. M. Ali Eslami, Oriol Vinyals, Felix Hill
[MLLM]

Papers citing "Multimodal Few-Shot Learning with Frozen Language Models"

50 / 532 papers shown
Multimodal Few-Shot Learning with Frozen Language Models
  Maria Tsimpoukelli, Jacob Menick, Serkan Cabi, S. M. Ali Eslami, Oriol Vinyals, Felix Hill (25 June 2021) [MLLM]

BenchLMM: Benchmarking Cross-style Visual Capability of Large Multimodal Models
  Rizhao Cai, Zirui Song, Dayan Guan, Zhenhao Chen, Xing Luo, Chenyu Yi, Alex C. Kot (05 Dec 2023) [MLLM, VLM]

Towards More Unified In-context Visual Understanding
  Dianmo Sheng, Dongdong Chen, Zhentao Tan, Qiankun Liu, Qi Chu, Jianmin Bao, Tao Gong, Bin Liu, Shengwei Xu, Nenghai Yu (05 Dec 2023) [MLLM, VLM]

Retrieval-augmented Multi-modal Chain-of-Thoughts Reasoning for Large Language Models
  Bingshuai Liu, Chenyang Lyu, Zijun Min, Zhanyu Wang, Jinsong Su, Longyue Wang (04 Dec 2023) [LRM]

StoryGPT-V: Large Language Models as Consistent Story Visualizers
  Xiaoqian Shen, Mohamed Elhoseiny (04 Dec 2023) [VLM]

Behind the Magic, MERLIM: Multi-modal Evaluation Benchmark for Large Image-Language Models
  Andrés Villa, Juan Carlos León Alcázar, Alvaro Soto, Bernard Ghanem (03 Dec 2023) [MLLM, VLM]

Omni-SMoLA: Boosting Generalist Multimodal Models with Soft Mixture of Low-rank Experts
  Jialin Wu, Xia Hu, Yaqing Wang, Bo Pang, Radu Soricut (01 Dec 2023) [MoE]

X-InstructBLIP: A Framework for aligning X-Modal instruction-aware representations to LLMs and Emergent Cross-modal Reasoning
  Artemis Panagopoulou, Le Xue, Ning Yu, Junnan Li, Dongxu Li, Shafiq R. Joty, Ran Xu, Silvio Savarese, Caiming Xiong, Juan Carlos Niebles (30 Nov 2023) [VLM, MLLM]

Understanding and Improving In-Context Learning on Vision-language Models
  Shuo Chen, Zhen Han, Bailan He, Mark Buckley, Philip H. S. Torr, Volker Tresp, Jindong Gu (29 Nov 2023)

The curse of language biases in remote sensing VQA: the role of spatial attributes, language diversity, and the need for clear evaluation
  Christel Chappuis, Eliot Walt, Vincent Mendez, Sylvain Lobry, B. L. Saux, D. Tuia (28 Nov 2023)

Conditional Prompt Tuning for Multimodal Fusion
  Ruixia Jiang, Lingbo Liu, Changwen Chen (28 Nov 2023)

Graph Prompt Learning: A Comprehensive Survey and Beyond
  Xiangguo Sun, Jiawen Zhang, Xixi Wu, Hong Cheng, Yun Xiong, Jia Li (28 Nov 2023)

Compositional Zero-shot Learning via Progressive Language-based Observations
  Lin Li, Guikun Chen, Jun Xiao, Long Chen (23 Nov 2023)

Multimodal Large Language Models: A Survey
  Jiayang Wu, Wensheng Gan, Zefeng Chen, Shicheng Wan, Philip S. Yu (22 Nov 2023)

HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data
  Qifan Yu, Juncheng Li, Longhui Wei, Liang Pang, Wentao Ye, Bosheng Qin, Siliang Tang, Qi Tian, Yueting Zhuang (22 Nov 2023) [MLLM, VLM]

Multi-modal In-Context Learning Makes an Ego-evolving Scene Text Recognizer
  Zhen Zhao, Jingqun Tang, Chunhui Lin, Binghong Wu, Can Huang, Hao Liu, Xin Tan, Zhizhong Zhang, Yuan Xie (22 Nov 2023)

A Survey on Multimodal Large Language Models for Autonomous Driving
  Can Cui, Yunsheng Ma, Xu Cao, Wenqian Ye, Yang Zhou, ..., Xinrui Yan, Shuqi Mei, Jianguo Cao, Ziran Wang, Chao Zheng (21 Nov 2023)

An Embodied Generalist Agent in 3D World
  Jiangyong Huang, Silong Yong, Xiaojian Ma, Xiongkun Linghu, Puhao Li, Yan Wang, Qing Li, Song-Chun Zhu, Baoxiong Jia, Siyuan Huang (18 Nov 2023) [LM&Ro]

Improving Zero-shot Visual Question Answering via Large Language Models with Reasoning Question Prompts
  Yunshi Lan, Xiang Li, Xin Liu, Yang Li, Wei Qin, Weining Qian (15 Nov 2023) [LRM, ReLM]

Human-Centric Autonomous Systems With LLMs for User Command Reasoning
  Yi Yang, Qingwen Zhang, Ci Li, Daniel Simoes Marta, Nazre Batool, John Folkesson (14 Nov 2023) [LRM]

To See is to Believe: Prompting GPT-4V for Better Visual Instruction Tuning
  Junke Wang, Lingchen Meng, Zejia Weng, Bo He, Zuxuan Wu, Yu-Gang Jiang (13 Nov 2023) [MLLM, VLM]

Large Language Models for Robotics: A Survey
  Fanlong Zeng, Wensheng Gan, Yongheng Wang, Ning Liu, Philip S. Yu (13 Nov 2023) [LM&Ro]

InfMLLM: A Unified Framework for Visual-Language Tasks
  Qiang-feng Zhou, Zhibin Wang, Wei Chu, Yinghui Xu, Hao Li, Yuan Qi (12 Nov 2023) [MLLM]

GIPCOL: Graph-Injected Soft Prompting for Compositional Zero-Shot Learning
  Guangyue Xu, Joyce Chai, Parisa Kordjamshidi (09 Nov 2023) [VLM]

Just-in-time Quantization with Processing-In-Memory for Efficient ML Training
  M. Ibrahim, Shaizeen Aga, Ada Li, Suchita Pati, Mahzabeen Islam (08 Nov 2023)

Multitask Multimodal Prompted Training for Interactive Embodied Task Completion
  Georgios Pantazopoulos, Malvina Nikandrou, Amit Parekh, Bhathiya Hemanthage, Arash Eshghi, Ioannis Konstas, Verena Rieser, Oliver Lemon, Alessandro Suglia (07 Nov 2023) [LM&Ro]

Detecting Any Human-Object Interaction Relationship: Universal HOI Detector with Spatial Prompt Learning on Foundation Models
  Yichao Cao, Qingfei Tang, Xiu Su, Chen Song, Shan You, Xiaobo Lu, Chang Xu (07 Nov 2023)

CogVLM: Visual Expert for Pretrained Language Models
  Weihan Wang, Qingsong Lv, Wenmeng Yu, Wenyi Hong, Ji Qi, ..., Bin Xu, Juanzi Li, Yuxiao Dong, Ming Ding, Jie Tang (06 Nov 2023) [VLM, MLLM]

De-Diffusion Makes Text a Strong Cross-Modal Interface
  Chen Wei, Chenxi Liu, Siyuan Qiao, Zhishuai Zhang, Alan Yuille, Jiahui Yu (01 Nov 2023) [VLM, DiffM]

ChatGPT-Powered Hierarchical Comparisons for Image Classification
  Zhiyuan Ren, Yiyang Su, Xiaoming Liu (01 Nov 2023) [VLM]

A Systematic Evaluation of GPT-4V's Multimodal Capability for Medical Image Analysis
  Yingshu Li, Yunyi Liu, Zhanyu Wang, Xinyu Liang, Lei Wang, Lingqiao Liu, Leyang Cui, Zhaopeng Tu, Longyue Wang, Luping Zhou (31 Oct 2023) [ELM, LM&MA]

Open Visual Knowledge Extraction via Relation-Oriented Multimodality Model Prompting
  Hejie Cui, Xinyu Fang, Zihan Zhang, Ran Xu, Xuan Kan, Xin Liu, Yue Yu, Manling Li, Yangqiu Song, Carl Yang (28 Oct 2023) [VLM]

Entity Embeddings: Perspectives Towards an Omni-Modality Era for Large Language Models
  Eren Unlu, Unver Ciftci (27 Oct 2023)

Exploring Question Decomposition for Zero-Shot VQA
  Zaid Khan, B. Vijaykumar, S. Schulter, Manmohan Chandraker, Yun Fu (25 Oct 2023) [ReLM]

Apollo: Zero-shot MultiModal Reasoning with Multiple Experts
  Daniela Ben-David, Tzuf Paz-Argaman, Reut Tsarfaty (25 Oct 2023) [MoE]

What's Left? Concept Grounding with Logic-Enhanced Foundation Models
  Joy Hsu, Jiayuan Mao, Joshua B. Tenenbaum, Jiajun Wu (24 Oct 2023) [VLM, ReLM, LRM]

Towards Perceiving Small Visual Details in Zero-shot Visual Question Answering with Multimodal LLMs
  Jiarui Zhang, Mahyar Khayatkhoei, P. Chhikara, Filip Ilievski (24 Oct 2023)

Large Language Models are biased to overestimate profoundness
  Eugenio Herrera-Berg, Tomás Vergara Browne, Pablo León-Villagrá, Marc-Lluís Vives, Cristian Buc Calderon (22 Oct 2023) [ELM]

UrbanCLIP: Learning Text-enhanced Urban Region Profiling with Contrastive Language-Image Pretraining from the Web
  Yibo Yan, Haomin Wen, Siru Zhong, Wei Chen, Haodong Chen, Qingsong Wen, Roger Zimmermann, Yuxuan Liang (22 Oct 2023)

MolCA: Molecular Graph-Language Modeling with Cross-Modal Projector and Uni-Modal Adapter
  Zhiyuan Liu, Sihang Li, Yancheng Luo, Hao Fei, Yixin Cao, Kenji Kawaguchi, Xiang Wang, Tat-Seng Chua (19 Oct 2023)

On the use of Vision-Language models for Visual Sentiment Analysis: a study on CLIP
  Cristina Bustos, Carles Civit, Brian Du, Albert Solé-Ribalta, Àgata Lapedriza (18 Oct 2023) [VLM]

MAGNIFICo: Evaluating the In-Context Learning Ability of Large Language Models to Generalize to Novel Interpretations
  Arkil Patel, S. Bhattamishra, Siva Reddy, Dzmitry Bahdanau (18 Oct 2023)

Large Models for Time Series and Spatio-Temporal Data: A Survey and Outlook
  Ming Jin, Qingsong Wen, Yuxuan Liang, Chaoli Zhang, Siqiao Xue, ..., Shirui Pan, Vincent S. Tseng, Yu Zheng, Lei Chen, Hui Xiong (16 Oct 2023) [AI4TS, SyDa]

VLIS: Unimodal Language Models Guide Multimodal Language Generation
  Jiwan Chung, Youngjae Yu (15 Oct 2023) [VLM]

Beyond Segmentation: Road Network Generation with Multi-Modal LLMs
  Sumedh Rasal, Sanjay K. Boddhu (15 Oct 2023)

Mastering Robot Manipulation with Multimodal Prompts through Pretraining and Multi-task Fine-tuning
  Jiachen Li, Qiaozi Gao, Michael Johnston, Xiaofeng Gao, Xuehai He, Suhaila Shakiah, Hangjie Shi, R. Ghanadan, William Yang Wang (14 Oct 2023) [LM&Ro]

MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning
  Jun Chen, Deyao Zhu, Xiaoqian Shen, Xiang Li, Zechun Liu, Pengchuan Zhang, Raghuraman Krishnamoorthi, Vikas Chandra, Yunyang Xiong, Mohamed Elhoseiny (14 Oct 2023) [MLLM]

Open-Set Knowledge-Based Visual Question Answering with Inference Paths
  Jingru Gan, Xinzhe Han, Shuhui Wang, Qingming Huang (12 Oct 2023)

Multimodal Graph Learning for Generative Tasks
  Minji Yoon, Jing Yu Koh, Bryan Hooi, Ruslan Salakhutdinov (11 Oct 2023)

Rephrase, Augment, Reason: Visual Grounding of Questions for Vision-Language Models
  Archiki Prasad, Elias Stengel-Eskin, Mohit Bansal (09 Oct 2023) [ReLM, LRM]

Visual Storytelling with Question-Answer Plans
  Danyang Liu, Mirella Lapata, Frank Keller (08 Oct 2023) [CoGe]