ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2405.16700
  4. Cited By
Implicit Multimodal Alignment: On the Generalization of Frozen LLMs to
  Multimodal Inputs

Implicit Multimodal Alignment: On the Generalization of Frozen LLMs to Multimodal Inputs

26 May 2024
Mustafa Shukor
Matthieu Cord
ArXivPDFHTML

Papers citing "Implicit Multimodal Alignment: On the Generalization of Frozen LLMs to Multimodal Inputs"

16 / 16 papers shown
Title
Improved Baselines for Data-efficient Perceptual Augmentation of LLMs
Improved Baselines for Data-efficient Perceptual Augmentation of LLMs
Théophane Vallaeys
Mustafa Shukor
Matthieu Cord
Jakob Verbeek
40
8
0
20 Mar 2024
Massive Activations in Large Language Models
Massive Activations in Large Language Models
Mingjie Sun
Xinlei Chen
J. Zico Kolter
Zhuang Liu
47
16
0
27 Feb 2024
SLEB: Streamlining LLMs through Redundancy Verification and Elimination
  of Transformer Blocks
SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks
Jiwon Song
Kyungseok Oh
Taesu Kim
Hyungjun Kim
Yulhwa Kim
Jae-Joon Kim
41
5
0
14 Feb 2024
Small Language Model Meets with Reinforced Vision Vocabulary
Small Language Model Meets with Reinforced Vision Vocabulary
Haoran Wei
Lingyu Kong
Jinyue Chen
Liang Zhao
Zheng Ge
En Yu
Jian‐Yuan Sun
Chunrui Han
Xiangyu Zhang
VLM
44
14
0
23 Jan 2024
LLaVA-Phi: Efficient Multi-Modal Assistant with Small Language Model
LLaVA-Phi: Efficient Multi-Modal Assistant with Small Language Model
Yichen Zhu
Minjie Zhu
Ning Liu
Zhicai Ou
Xiaofeng Mou
Jian Tang
49
48
0
04 Jan 2024
Mitigating Object Hallucinations in Large Vision-Language Models through
  Visual Contrastive Decoding
Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding
Sicong Leng
Hang Zhang
Guanzheng Chen
Xin Li
Shijian Lu
Chunyan Miao
Li Bing
VLM
MLLM
65
66
0
28 Nov 2023
MiniGPT-v2: large language model as a unified interface for
  vision-language multi-task learning
MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning
Jun Chen
Deyao Zhu
Xiaoqian Shen
Xiang Li
Zechun Liu
Pengchuan Zhang
Raghuraman Krishnamoorthi
Vikas Chandra
Yunyang Xiong
Mohamed Elhoseiny
MLLM
148
280
0
14 Oct 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image
  Encoders and Large Language Models
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
235
1,899
0
30 Jan 2023
Linearly Mapping from Image to Text Space
Linearly Mapping from Image to Text Space
Jack Merullo
Louis Castricato
Carsten Eickhoff
Ellie Pavlick
VLM
148
83
0
30 Sep 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified
  Vision-Language Understanding and Generation
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
S. Hoi
MLLM
BDL
VLM
CLIP
375
2,713
0
28 Jan 2022
Masked Autoencoders Are Scalable Vision Learners
Masked Autoencoders Are Scalable Vision Learners
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
255
5,353
0
11 Nov 2021
End-to-End Supermask Pruning: Learning to Prune Image Captioning Models
End-to-End Supermask Pruning: Learning to Prune Image Captioning Models
J. Tan
C. Chan
Joon Huang Chuah
VLM
35
14
0
07 Oct 2021
Let there be a clock on the beach: Reducing Object Hallucination in
  Image Captioning
Let there be a clock on the beach: Reducing Object Hallucination in Image Captioning
Ali Furkan Biten
L. G. I. Bigorda
Dimosthenis Karatzas
68
40
0
04 Oct 2021
How Does Fine-tuning Affect the Geometry of Embedding Space: A Case
  Study on Isotropy
How Does Fine-tuning Affect the Geometry of Embedding Space: A Case Study on Isotropy
S. Rajaee
Mohammad Taher Pilehvar
44
18
0
10 Sep 2021
Playing Lottery Tickets with Vision and Language
Playing Lottery Tickets with Vision and Language
Zhe Gan
Yen-Chun Chen
Linjie Li
Tianlong Chen
Yu Cheng
Shuohang Wang
Jingjing Liu
Lijuan Wang
Zicheng Liu
VLM
86
49
0
23 Apr 2021
Is Space-Time Attention All You Need for Video Understanding?
Is Space-Time Attention All You Need for Video Understanding?
Gedas Bertasius
Heng Wang
Lorenzo Torresani
ViT
267
1,486
0
09 Feb 2021
1