ResearchTrend.AI
Multimodal Few-Shot Learning with Frozen Language Models

25 June 2021
Maria Tsimpoukelli
Jacob Menick
Serkan Cabi
S. M. Ali Eslami
Oriol Vinyals
Felix Hill
    MLLM

Papers citing "Multimodal Few-Shot Learning with Frozen Language Models"

50 / 532 papers shown
Blocks as Probes: Dissecting Categorization Ability of Large Multimodal Models
Bin Fu
Qiyang Wan
Jialin Li
Ruiping Wang
Xilin Chen
40
0
0
03 Sep 2024
Think Twice Before Recognizing: Large Multimodal Models for General Fine-grained Traffic Sign Recognition
Yaozong Gan
Guang Li
Ren Togo
Keisuke Maeda
Takahiro Ogawa
Miki Haseyama
39
0
0
03 Sep 2024
Pixels to Prose: Understanding the art of Image Captioning
Hrishikesh Singh
Aarti Sharma
Millie Pant
3DV
VLM
25
0
0
28 Aug 2024
Building and better understanding vision-language models: insights and future directions
Hugo Laurençon
Andrés Marafioti
Victor Sanh
Léo Tronchon
VLM
34
60
0
22 Aug 2024
Instruction Tuning-free Visual Token Complement for Multimodal LLMs
Dongsheng Wang
Jiequan Cui
Miaoge Li
Wang Lin
Bo Chen
Hanwang Zhang
MLLM
34
3
0
09 Aug 2024
ArtVLM: Attribute Recognition Through Vision-Based Prefix Language Modeling
William Y. Zhu
Keren Ye
Junjie Ke
Jiahui Yu
Leonidas J. Guibas
P. Milanfar
Feng Yang
43
2
0
07 Aug 2024
Targeted Visual Prompting for Medical Visual Question Answering
Sergio Tascon-Morales
Pablo Márquez-Neila
Raphael Sznitman
26
2
0
06 Aug 2024
ExoViP: Step-by-step Verification and Exploration with Exoskeleton Modules for Compositional Visual Reasoning
Y. Wang
Alan Yuille
Zhuowan Li
Zilong Zheng
LRM
32
3
0
05 Aug 2024
NOLO: Navigate Only Look Once
Mengyu Bu
Shuhao Gu
Yang Feng
EgoV
36
1
0
02 Aug 2024
Autogenic Language Embedding for Coherent Point Tracking
Zikai Song
Ying Tang
Run Luo
Lintao Ma
Junqing Yu
Yi-Ping Phoebe Chen
Wei Yang
39
4
0
30 Jul 2024
MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Diversity
Yangzhou Liu
Yue Cao
Zhangwei Gao
Weiyun Wang
Zhe Chen
...
Lewei Lu
Xizhou Zhu
Tong Lu
Yu Qiao
Jifeng Dai
VLM
MLLM
42
23
0
22 Jul 2024
Accelerating Pre-training of Multimodal LLMs via Chain-of-Sight
Ziyuan Huang
Kaixiang Ji
Biao Gong
Zhiwu Qing
Qinglong Zhang
Kecheng Zheng
Jian Wang
Jingdong Chen
Ming Yang
LRM
34
1
0
22 Jul 2024
Text-Augmented Multimodal LLMs for Chemical Reaction Condition Recommendation
Yu Zhang
Ruijie Yu
Kaipeng Zeng
Ding Li
Feng Zhu
Xiaokang Yang
Yaohui Jin
Yanyan Xu
33
2
0
21 Jul 2024
X-Former: Unifying Contrastive and Reconstruction Learning for MLLMs
S. Swetha
Jinyu Yang
T. Neiman
Mamshad Nayeem Rizve
Son Tran
Benjamin Z. Yao
Trishul M. Chilimbi
Mubarak Shah
54
2
0
18 Jul 2024
Zero-shot Text-guided Infinite Image Synthesis with LLM guidance
Soyeong Kwon
Taegyeong Lee
Taehwan Kim
DiffM
21
2
0
17 Jul 2024
Evaluating Linguistic Capabilities of Multimodal LLMs in the Lens of Few-Shot Learning
Mustafa Dogan
İlker Kesen
Iacer Calixto
Aykut Erdem
Erkut Erdem
LRM
29
1
0
17 Jul 2024
Controllable Contextualized Image Captioning: Directing the Visual Narrative through User-Defined Highlights
Shunqi Mao
Chaoyi Zhang
Hang Su
Hwanjun Song
Igor Shalyminov
Weidong Cai
28
1
0
16 Jul 2024
BadRobot: Jailbreaking Embodied LLMs in the Physical World
Hangtao Zhang
Chenyu Zhu
Xianlong Wang
Ziqi Zhou
Yichen Wang
...
Shengshan Hu
Aishan Liu
Peijin Guo
Leo Yu Zhang
LM&Ro
40
7
0
16 Jul 2024
By My Eyes: Grounding Multimodal Large Language Models with Sensor Data via Visual Prompting
Hyungjun Yoon
Biniyam Aschalew Tolera
Taesik Gong
Kimin Lee
Sung-Ju Lee
33
6
0
15 Jul 2024
SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning
Haiwen Diao
Bo Wan
Xu Jia
Yunzhi Zhuge
Ying Zhang
Huchuan Lu
Long Chen
VLM
37
4
0
10 Jul 2024
A Survey of Attacks on Large Vision-Language Models: Resources, Advances, and Future Trends
Daizong Liu
Mingyu Yang
Xiaoye Qu
Pan Zhou
Yu Cheng
Wei Hu
ELM
AAML
30
25
0
10 Jul 2024
Multi-Object Hallucination in Vision-Language Models
Xuweiyi Chen
Ziqiao Ma
Xuejun Zhang
Sihan Xu
Shengyi Qian
Jianing Yang
David Fouhey
Joyce Chai
47
15
0
08 Jul 2024
Multimodal Prompt Learning with Missing Modalities for Sentiment Analysis and Emotion Recognition
Zirun Guo
Tao Jin
Zhou Zhao
29
9
0
07 Jul 2024
LogicVista: Multimodal LLM Logical Reasoning Benchmark in Visual Contexts
Yijia Xiao
Edward Sun
Tianyu Liu
Wei Wang
LRM
35
28
0
06 Jul 2024
Investigating the Role of Instruction Variety and Task Difficulty in Robotic Manipulation Tasks
Amit Parekh
Nikolas Vitsakis
Alessandro Suglia
Ioannis Konstas
AAML
33
5
0
04 Jul 2024
Visualizing Dialogues: Enhancing Image Selection through Dialogue Understanding with Large Language Models
Chang-Sheng Kao
Yun-Nung Chen
18
0
0
04 Jul 2024
HEMM: Holistic Evaluation of Multimodal Foundation Models
Paul Pu Liang
Akshay Goindani
Talha Chafekar
Leena Mathur
Haofei Yu
Ruslan Salakhutdinov
Louis-Philippe Morency
36
10
0
03 Jul 2024
SADL: An Effective In-Context Learning Method for Compositional Visual QA
Long Hoang Dang
T. Le
Vuong Le
Tu Minh Phuong
Truyen Tran
ReLM
CoGe
46
2
0
02 Jul 2024
RAVEN: Multitask Retrieval Augmented Vision-Language Learning
Varun Nagaraj Rao
Siddharth Choudhary
Aditya Deshpande
R. Satzoda
Srikar Appalaraju
RALM
VLM
45
4
0
27 Jun 2024
LICO: Large Language Models for In-Context Molecular Optimization
Tung Nguyen
Aditya Grover
31
6
0
27 Jun 2024
Foundational Models for Pathology and Endoscopy Images: Application for Gastric Inflammation
H. Kerdegari
Kyle Higgins
Dennis Veselkov
I. Laponogov
I. Poļaka
...
Junior Andrea Pescino
M. Leja
M. Dinis-Ribeiro
T. F. Kanonnikoff
Kirill Veselkov
35
3
0
26 Jun 2024
Automatically Generating UI Code from Screenshot: A Divide-and-Conquer-Based Approach
Yuxuan Wan
Chaozheng Wang
Yi Dong
Wenxuan Wang
Shuqing Li
Yintong Huo
M. Lyu
3DV
69
10
0
24 Jun 2024
African or European Swallow? Benchmarking Large Vision-Language Models for Fine-Grained Object Classification
Gregor Geigle
Radu Timofte
Goran Glavas
29
10
0
20 Jun 2024
IWISDM: Assessing instruction following in multimodal models at scale
Xiaoxuan Lei
Lucas Gomez
Hao Yuan Bai
P. Bashivan
VLM
30
1
0
20 Jun 2024
Learnable In-Context Vector for Visual Question Answering
Yingzhe Peng
Chenduo Hao
Xu Yang
Jiawei Peng
Xinting Hu
Xin Geng
37
4
0
19 Jun 2024
See It from My Perspective: How Language Affects Cultural Bias in Image Understanding
Amith Ananthram
Elias Stengel-Eskin
Carl Vondrick
Mohit Bansal
VLM
32
7
0
17 Jun 2024
UniAudio 1.5: Large Language Model-driven Audio Codec is A Few-shot Audio Task Learner
Dongchao Yang
Haohan Guo
Yuanyuan Wang
Rongjie Huang
Xiang Li
Xu Tan
Xixin Wu
Helen Meng
AuLLM
39
15
0
14 Jun 2024
Multimodal Large Language Models with Fusion Low Rank Adaptation for Device Directed Speech Detection
Shruti Palaskar
Oggi Rudovic
Sameer Dharur
Florian Pesce
G. Krishna
Aswin Sivaraman
Jack Berkowitz
Ahmed Hussen Abdelaziz
Saurabh N. Adya
Ahmed H. Tewfik
VLM
55
0
0
13 Jun 2024
ReMI: A Dataset for Reasoning with Multiple Images
Mehran Kazemi
Nishanth Dikkala
Ankit Anand
Petar Dević
Ishita Dasgupta
...
Bahare Fatemi
Pranjal Awasthi
Dee Guo
Sreenivas Gollapudi
Ahmed Qureshi
LRM
VLM
34
13
0
13 Jun 2024
Advancing High Resolution Vision-Language Models in Biomedicine
Zekai Chen
Arda Pekis
Kevin Brown
MedIm
LM&MA
16
4
0
12 Jun 2024
AIM: Let Any Multi-modal Large Language Models Embrace Efficient In-Context Learning
Jun Gao
Qian Qiao
Ziqiang Cao
Zili Wang
Wenjie Li
26
3
0
11 Jun 2024
Beyond Bare Queries: Open-Vocabulary Object Grounding with 3D Scene Graph
S. Linok
T. Zemskova
Svetlana Ladanova
Roman Titkov
Dmitry A. Yudin
Maxim Monastyrny
Aleksei Valenkov
LM&Ro
43
0
0
11 Jun 2024
A Survey on Text-guided 3D Visual Grounding: Elements, Recent Advances, and Future Directions
Daizong Liu
Yang Liu
Wencan Huang
Wei Hu
LM&Ro
29
9
0
09 Jun 2024
Exploring the Zero-Shot Capabilities of Vision-Language Models for Improving Gaze Following
Anshul Gupta
Pierre Vuillecard
Arya Farkhondeh
J. Odobez
VLM
35
2
0
06 Jun 2024
Wings: Learning Multimodal LLMs without Text-only Forgetting
Yi-Kai Zhang
Shiyin Lu
Yang Li
Yanqing Ma
Qing-Guo Chen
Zhao Xu
Weihua Luo
Kaifu Zhang
De-Chuan Zhan
Han-Jia Ye
VLM
33
6
0
05 Jun 2024
Item-Language Model for Conversational Recommendation
Li Yang
Anushya Subbiah
Hardik Patel
Judith Yue Li
Yanwei Song
Reza Mirghaderi
Vikram Aggarwal
KELM
27
4
0
05 Jun 2024
Image Captioning via Dynamic Path Customization
Yiwei Ma
Jiayi Ji
Xiaoshuai Sun
Yiyi Zhou
Xiaopeng Hong
Yongjian Wu
Rongrong Ji
27
0
0
01 Jun 2024
Diffusion On Syntax Trees For Program Synthesis
Shreyas Kapur
Erik Jenner
Stuart J. Russell
DiffM
32
5
0
30 May 2024
X-VILA: Cross-Modality Alignment for Large Language Model
Hanrong Ye
De-An Huang
Yao Lu
Zhiding Yu
Wei Ping
...
Jan Kautz
Song Han
Dan Xu
Pavlo Molchanov
Hongxu Yin
MLLM
VLM
40
29
0
29 May 2024
Multi-modal Generation via Cross-Modal In-Context Learning
Amandeep Kumar
Muzammal Naseer
Sanath Narayan
Rao Muhammad Anwer
Salman Khan
Hisham Cholakkal
MLLM
51
0
0
28 May 2024