ResearchTrend.AI
Multimodal Few-Shot Learning with Frozen Language Models


25 June 2021
Maria Tsimpoukelli
Jacob Menick
Serkan Cabi
S. M. Ali Eslami
Oriol Vinyals
Felix Hill
    MLLM

Papers citing "Multimodal Few-Shot Learning with Frozen Language Models"

50 / 532 papers shown
Title
GROUNDHOG: Grounding Large Language Models to Holistic Segmentation
Yichi Zhang
Ziqiao Ma
Xiaofeng Gao
Suhaila Shakiah
Qiaozi Gao
Joyce Chai
MLLM
VLM
35
39
0
26 Feb 2024
PREDILECT: Preferences Delineated with Zero-Shot Language-based Reasoning in Reinforcement Learning
Simon Holk
Daniel Marta
Iolanda Leite
42
11
0
23 Feb 2024
PQA: Zero-shot Protein Question Answering for Free-form Scientific Enquiry with Large Language Models
Eli M. Carrami
Sahand Sharifzadeh
27
2
0
21 Feb 2024
Exploring the Frontier of Vision-Language Models: A Survey of Current Methodologies and Future Directions
Akash Ghosh
Arkadeep Acharya
Sriparna Saha
Vinija Jain
Aman Chadha
VLM
49
25
0
20 Feb 2024
SoMeLVLM: A Large Vision Language Model for Social Media Processing
Xinnong Zhang
Haoyu Kuang
Xinyi Mou
Hanjia Lyu
Kun Wu
Siming Chen
Jiebo Luo
Xuanjing Huang
Zhongyu Wei
MLLM
36
5
0
20 Feb 2024
MORE-3S: Multimodal-based Offline Reinforcement Learning with Shared Semantic Spaces
Tianyu Zheng
Ge Zhang
Xingwei Qu
Ming Kuang
Stephen W. Huang
Zhaofeng He
OffRL
45
1
0
20 Feb 2024
Browse and Concentrate: Comprehending Multimodal Content via prior-LLM Context Fusion
Ziyue Wang
Chi Chen
Yiqi Zhu
Fuwen Luo
Peng Li
Ming Yan
Ji Zhang
Fei Huang
Maosong Sun
Yang Janet Liu
36
5
0
19 Feb 2024
FIPO: Free-form Instruction-oriented Prompt Optimization with Preference Dataset and Modular Fine-tuning Schema
Junru Lu
Siyu An
Min Zhang
Yulan He
Di Yin
Xing Sun
34
2
0
19 Feb 2024
Can Large Multimodal Models Uncover Deep Semantics Behind Images?
Yixin Yang
Zheng Li
Qingxiu Dong
Heming Xia
Zhifang Sui
VLM
22
8
0
17 Feb 2024
TuneTables: Context Optimization for Scalable Prior-Data Fitted Networks
Ben Feuer
R. Schirrmeister
Valeriia Cherepanova
Chinmay Hegde
Frank Hutter
Micah Goldblum
Niv Cohen
Colin White
40
13
0
17 Feb 2024
Large Language Models for Captioning and Retrieving Remote Sensing Images
João Daniel Silva
João Magalhães
D. Tuia
Bruno Martins
41
29
0
09 Feb 2024
Prompt Learning on Temporal Interaction Graphs
Xi Chen
Siwei Zhang
Yun Xiong
Xixi Wu
Jiawei Zhang
Xiangguo Sun
Yao Zhang
Feng Zhao
Yulin Kang
AI4CE
34
9
0
09 Feb 2024
RA-Rec: An Efficient ID Representation Alignment Framework for LLM-based Recommendation
Xiaohan Yu
Li Zhang
Xin Zhao
Yue Wang
Zhongrui Ma
39
9
0
07 Feb 2024
Knowledge Generation for Zero-shot Knowledge-based VQA
Rui Cao
Jing Jiang
19
2
0
04 Feb 2024
Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities
Zhifeng Kong
Arushi Goel
Rohan Badlani
Wei Ping
Rafael Valle
Bryan Catanzaro
AuLLM
LM&MA
MLLM
59
73
0
02 Feb 2024
Describing Images Fast and Slow: Quantifying and Predicting the Variation in Human Signals during Visuo-Linguistic Processes
Ece Takmaz
Sandro Pezzelle
Raquel Fernández
19
1
0
02 Feb 2024
Can MLLMs Perform Text-to-Image In-Context Learning?
Yuchen Zeng
Wonjun Kang
Yicong Chen
Hyung Il Koo
Kangwook Lee
MLLM
28
9
0
02 Feb 2024
Muffin or Chihuahua? Challenging Multimodal Large Language Models with Multipanel VQA
Yue Fan
Jing Gu
KAI-QING Zhou
Qianqi Yan
Shan Jiang
Ching-Chen Kuo
Xinze Guan
Xin Eric Wang
24
7
0
29 Jan 2024
Towards 3D Molecule-Text Interpretation in Language Models
Sihang Li
Zhiyuan Liu
Yancheng Luo
Xiang Wang
Xiangnan He
Kenji Kawaguchi
Tat-Seng Chua
Qi Tian
AI4CE
24
42
0
25 Jan 2024
VisualWebArena: Evaluating Multimodal Agents on Realistic Visual Web Tasks
Jing Yu Koh
Robert Lo
Lawrence Jang
Vikram Duvvur
Ming Chong Lim
Po-Yu Huang
Graham Neubig
Shuyan Zhou
Ruslan Salakhutdinov
Daniel Fried
23
0
0
24 Jan 2024
Prompting Large Vision-Language Models for Compositional Reasoning
Timothy Ossowski
Ming Jiang
Junjie Hu
CoGe
VLM
LRM
38
3
0
20 Jan 2024
Veagle: Advancements in Multimodal Representation Learning
Rajat Chawla
Arkajit Datta
Tushar Verma
Adarsh Jha
Anmol Gautam
Ayush Vatsal
Sukrit Chaterjee
NS Mukunda
Ishaan Bhola
VLM
6
4
0
18 Jan 2024
DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models (Exemplified as A Video Agent)
Zongxin Yang
Guikun Chen
Xiaodi Li
Wenguan Wang
Yi Yang
LM&Ro
LLMAG
48
35
0
16 Jan 2024
ELLA-V: Stable Neural Codec Language Modeling with Alignment-guided Sequence Reordering
Ya-Zhen Song
Zhuo Chen
Xiaofei Wang
Ziyang Ma
Xie Chen
AuLLM
19
35
0
14 Jan 2024
Generalizing Visual Question Answering from Synthetic to Human-Written Questions via a Chain of QA with a Large Language Model
Taehee Kim
Yeongjae Cho
Heejun Shin
Yohan Jo
Dongmyung Shin
32
4
0
12 Jan 2024
Exploring the Reasoning Abilities of Multimodal Large Language Models (MLLMs): A Comprehensive Survey on Emerging Trends in Multimodal Reasoning
Yiqi Wang
Wentao Chen
Xiaotian Han
Xudong Lin
Haiteng Zhao
Yongfei Liu
Bohan Zhai
Jianbo Yuan
Quanzeng You
Hongxia Yang
LRM
33
66
0
10 Jan 2024
GPT-4V(ision) is a Human-Aligned Evaluator for Text-to-3D Generation
Tong Wu
Guandao Yang
Zhibing Li
Kai Zhang
Ziwei Liu
Leonidas J. Guibas
Dahua Lin
Gordon Wetzstein
EGVM
VGen
23
88
0
08 Jan 2024
GRAM: Global Reasoning for Multi-Page VQA
Tsachi Blau
Sharon Fogel
Roi Ronen
Alona Golts
Roy Ganz
Elad Ben Avraham
Aviad Aberdam
Shahar Tsiper
Ron Litman
16
12
0
07 Jan 2024
CaMML: Context-Aware Multimodal Learner for Large Models
Yixin Chen
Shuai Zhang
Boran Han
Tong He
Bo Li
VLM
19
4
0
06 Jan 2024
Data-Centric Foundation Models in Computational Healthcare: A Survey
Yunkun Zhang
Jin Gao
Zheling Tan
Lingfeng Zhou
Kexin Ding
Mu Zhou
Shaoting Zhang
Dequan Wang
AI4CE
21
22
0
04 Jan 2024
A Vision Check-up for Language Models
Pratyusha Sharma
Tamar Rott Shaham
Manel Baradad
Stephanie Fu
Adrian Rodriguez-Munoz
Shivam Duggal
Phillip Isola
Antonio Torralba
VLM
LRM
78
24
0
03 Jan 2024
Freeze the backbones: A Parameter-Efficient Contrastive Approach to Robust Medical Vision-Language Pre-training
Jiuming Qin
Che Liu
Sibo Cheng
Yike Guo
Rossella Arcucci
VLM
MedIm
18
5
0
02 Jan 2024
COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training
Alex Jinpeng Wang
Linjie Li
K. Lin
Jianfeng Wang
Kevin Lin
Zhengyuan Yang
Lijuan Wang
Mike Zheng Shou
VLM
VGen
19
12
0
01 Jan 2024
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action
Jiasen Lu
Christopher Clark
Sangho Lee
Zichen Zhang
Savya Khosla
Ryan Marten
Derek Hoiem
Aniruddha Kembhavi
VLM
MLLM
32
144
0
28 Dec 2023
Towards Robust Multimodal Prompting With Missing Modalities
Jaehyuk Jang
Yooseung Wang
Changick Kim
VLM
30
10
0
26 Dec 2023
MMGPL: Multimodal Medical Data Analysis with Graph Prompt Learning
Liang Peng
Songyue Cai
Zongqian Wu
Huifang Shang
Xiaofeng Zhu
Xiaoxiao Li
28
9
0
22 Dec 2023
VCoder: Versatile Vision Encoders for Multimodal Large Language Models
Jitesh Jain
Jianwei Yang
Humphrey Shi
MLLM
16
24
0
21 Dec 2023
Jack of All Tasks, Master of Many: Designing General-purpose Coarse-to-Fine Vision-Language Model
Shraman Pramanick
Guangxing Han
Rui Hou
Sayan Nag
Ser-Nam Lim
Nicolas Ballas
Qifan Wang
Rama Chellappa
Amjad Almahairi
VLM
MLLM
38
29
0
19 Dec 2023
A Survey of Reasoning with Foundation Models
Jiankai Sun
Chuanyang Zheng
E. Xie
Zhengying Liu
Ruihang Chu
...
Xipeng Qiu
Yi-Chen Guo
Hui Xiong
Qun Liu
Zhenguo Li
ReLM
LRM
AI4CE
22
76
0
17 Dec 2023
Data-Efficient Multimodal Fusion on a Single GPU
Noël Vouitsis
Zhaoyan Liu
S. Gorti
Valentin Villecroze
Jesse C. Cresswell
Guangwei Yu
G. Loaiza-Ganem
M. Volkovs
43
3
0
15 Dec 2023
Lever LM: Configuring In-Context Sequence to Lever Large Vision Language Models
Xu Yang
Yingzhe Peng
Haoxuan Ma
Shuo Xu
Chi Zhang
Yucheng Han
Hanwang Zhang
30
5
0
15 Dec 2023
Depicting Beyond Scores: Advancing Image Quality Assessment through Multi-modal Language Models
Zhiyuan You
Zheyuan Li
Jinjin Gu
Zhenfei Yin
Tianfan Xue
Chao Dong
EGVM
16
35
0
14 Dec 2023
Exploration of visual prompt in Grounded pre-trained open-set detection
Qibo Chen
Weizhong Jin
Shuchang Li
Mengdi Liu
Li Yu
Jian Jiang
Xiaozheng Wang
VLM
13
0
0
14 Dec 2023
On a Foundation Model for Operating Systems
Divyanshu Saxena
Nihal Sharma
Donghyun Kim
Rohit Dwivedula
Jiayi Chen
...
Alex Dimakis
P. B. Godfrey
Daehyeok Kim
Chris Rossbach
Gang Wang
45
2
0
13 Dec 2023
Large Language Models are Clinical Reasoners: Reasoning-Aware Diagnosis Framework with Prompt-Generated Rationales
Taeyoon Kwon
Kai Tzu-iunn Ong
Dongjin Kang
Seungjun Moon
J. Lee
Dosik Hwang
Yongsik Sim
B. Sohn
Dongha Lee
Jinyoung Yeo
LRM
LM&MA
26
29
0
12 Dec 2023
Large Foundation Models for Power Systems
Chenghao Huang
Siyang Li
Ruohong Liu
Hao Wang
Yize Chen
AI4CE
16
23
0
12 Dec 2023
Vista-LLaMA: Reducing Hallucination in Video Language Models via Equal Distance to Visual Tokens
Fan Ma
Xiaojie Jin
Heng Wang
Yuchen Xian
Jiashi Feng
Yi Yang
21
47
0
12 Dec 2023
Causal-CoG: A Causal-Effect Look at Context Generation for Boosting Multi-modal Language Models
Shitian Zhao
Zhuowan Li
Yadong Lu
Alan L. Yuille
Yan Wang
LRM
60
5
0
09 Dec 2023
FoMo Rewards: Can we cast foundation models as reward functions?
Ekdeep Singh Lubana
Johann Brehmer
P. D. Haan
Taco S. Cohen
OffRL
LRM
38
2
0
06 Dec 2023
Context Diffusion: In-Context Aware Image Generation
Ivona Najdenkoska
Animesh Sinha
Abhimanyu Dubey
Dhruv Mahajan
Vignesh Ramanathan
Filip Radenovic
DiffM
11
10
0
06 Dec 2023