Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2405.16700
Cited By
Implicit Multimodal Alignment: On the Generalization of Frozen LLMs to Multimodal Inputs
26 May 2024
Mustafa Shukor
Matthieu Cord
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Implicit Multimodal Alignment: On the Generalization of Frozen LLMs to Multimodal Inputs"
16 / 16 papers shown
Title
Improved Baselines for Data-efficient Perceptual Augmentation of LLMs
Théophane Vallaeys
Mustafa Shukor
Matthieu Cord
Jakob Verbeek
52
12
0
20 Mar 2024
Massive Activations in Large Language Models
Mingjie Sun
Xinlei Chen
J. Zico Kolter
Zhuang Liu
60
64
0
27 Feb 2024
SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks
Jiwon Song
Kyungseok Oh
Taesu Kim
Hyungjun Kim
Yulhwa Kim
Jae-Joon Kim
62
20
0
14 Feb 2024
Small Language Model Meets with Reinforced Vision Vocabulary
Haoran Wei
Lingyu Kong
Jinyue Chen
Liang Zhao
Zheng Ge
En Yu
Jian‐Yuan Sun
Chunrui Han
Xiangyu Zhang
VLM
57
14
0
23 Jan 2024
LLaVA-Phi: Efficient Multi-Modal Assistant with Small Language Model
Yichen Zhu
Minjie Zhu
Ning Liu
Zhicai Ou
Xiaofeng Mou
Jian Tang
63
89
0
04 Jan 2024
Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding
Sicong Leng
Hang Zhang
Guanzheng Chen
Xin Li
Shijian Lu
Chunyan Miao
Li Bing
VLM
MLLM
82
196
0
28 Nov 2023
MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning
Jun Chen
Deyao Zhu
Xiaoqian Shen
Xiang Li
Zechun Liu
Pengchuan Zhang
Raghuraman Krishnamoorthi
Vikas Chandra
Yunyang Xiong
Mohamed Elhoseiny
MLLM
152
280
0
14 Oct 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
244
4,186
0
30 Jan 2023
Linearly Mapping from Image to Text Space
Jack Merullo
Louis Castricato
Carsten Eickhoff
Ellie Pavlick
VLM
156
104
0
30 Sep 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
S. Hoi
MLLM
BDL
VLM
CLIP
380
4,010
0
28 Jan 2022
Masked Autoencoders Are Scalable Vision Learners
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
258
7,337
0
11 Nov 2021
End-to-End Supermask Pruning: Learning to Prune Image Captioning Models
J. Tan
C. Chan
Joon Huang Chuah
VLM
41
16
0
07 Oct 2021
Let there be a clock on the beach: Reducing Object Hallucination in Image Captioning
Ali Furkan Biten
L. G. I. Bigorda
Dimosthenis Karatzas
84
55
0
04 Oct 2021
How Does Fine-tuning Affect the Geometry of Embedding Space: A Case Study on Isotropy
S. Rajaee
Mohammad Taher Pilehvar
58
20
0
10 Sep 2021
Playing Lottery Tickets with Vision and Language
Zhe Gan
Yen-Chun Chen
Linjie Li
Tianlong Chen
Yu Cheng
Shuohang Wang
Jingjing Liu
Lijuan Wang
Zicheng Liu
VLM
98
53
0
23 Apr 2021
Is Space-Time Attention All You Need for Video Understanding?
Gedas Bertasius
Heng Wang
Lorenzo Torresani
ViT
275
1,939
0
09 Feb 2021
1