Dense Connector for MLLMs
arXiv: 2405.13800

22 May 2024
Huanjin Yao
Wenhao Wu
Taojiannan Yang
Yuxin Song
Mengxi Zhang
Haocheng Feng
Yifan Sun
Zhiheng Li
Wanli Ouyang
Jingdong Wang
    MLLM
    VLM

Papers citing "Dense Connector for MLLMs"

22 / 22 papers shown
Rethinking Visual Layer Selection in Multimodal LLMs
H. Chen
Junyan Lin
Xinhao Chen
Yue Fan
Xin Jin
Hui Su
Jianfeng Dong
Jinlan Fu
Xiaoyu Shen
VLM
93
0
0
30 Apr 2025
ActionArt: Advancing Multimodal Large Models for Fine-Grained Human-Centric Video Understanding
Yi-Xing Peng
Q. Yang
Yu-Ming Tang
Shenghao Fu
Kun-Yu Lin
Xihan Wei
Wei-Shi Zheng
40
0
0
25 Apr 2025
LangBridge: Interpreting Image as a Combination of Language Embeddings
Jiaqi Liao
Yuwei Niu
Fanqing Meng
Hao Li
Changyao Tian
...
Dianqi Li
X. Zhu
Li Yuan
Jifeng Dai
Yu Cheng
MLLM
72
0
0
25 Mar 2025
MM-Spatial: Exploring 3D Spatial Understanding in Multimodal LLMs
Erik Daxberger
Nina Wenzel
David Griffiths
Haiming Gang
Justin Lazarow
...
Kai Kang
Marcin Eichner
Y. Yang
Afshin Dehghan
Peter Grasch
72
2
0
17 Mar 2025
Adaptive Audio-Visual Speech Recognition via Matryoshka-Based Multimodal LLMs
Umberto Cappellazzo
Minsu Kim
Stavros Petridis
47
0
0
09 Mar 2025
Multi-Layer Visual Feature Fusion in Multimodal LLMs: Methods, Analysis, and Best Practices
Junyan Lin
Haoran Chen
Yue Fan
Yingqi Fan
Xin Jin
Hui Su
Jinlan Fu
Xiaoyu Shen
60
0
0
08 Mar 2025
Visual Cue Enhancement and Dual Low-Rank Adaptation for Efficient Visual Instruction Fine-Tuning
Pengkun Jiao
Bin Zhu
Jingjing Chen
Chong-Wah Ngo
Yu-Gang Jiang
VLM
OffRL
69
0
0
19 Nov 2024
MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding
Yue Cao
Yangzhou Liu
Zhe Chen
Guangchen Shi
Wenhai Wang
Danhuai Zhao
Tong Lu
41
5
0
15 Oct 2024
LoGra-Med: Long Context Multi-Graph Alignment for Medical Vision-Language Model
Duy M. H. Nguyen
N. T. Diep
Trung Q. Nguyen
Hoang-Bao Le
Tai Nguyen
...
Pengtao Xie
Roger Wattenhofer
James Zhou
Daniel Sonntag
Mathias Niepert
VLM
49
1
0
03 Oct 2024
MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning
Haotian Zhang
Mingfei Gao
Zhe Gan
Philipp Dufter
Nina Wenzel
...
Haoxuan You
Zirui Wang
Afshin Dehghan
Peter Grasch
Yinfei Yang
VLM
MLLM
36
32
1
30 Sep 2024
FreeVA: Offline MLLM as Training-Free Video Assistant
Wenhao Wu
VLM
OffRL
26
19
0
13 May 2024
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models
Chris Liu
Renrui Zhang
Longtian Qiu
Siyuan Huang
Weifeng Lin
...
Hao Shao
Pan Lu
Hongsheng Li
Yu Qiao
Peng Gao
MLLM
120
106
0
08 Feb 2024
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
Bin Lin
Yang Ye
Bin Zhu
Jiaxi Cui
Munan Ning
Peng Jin
Li Yuan
VLM
MLLM
185
576
0
16 Nov 2023
mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration
Qinghao Ye
Haiyang Xu
Jiabo Ye
Mingshi Yan
Anwen Hu
Haowei Liu
Qi Qian
Ji Zhang
Fei Huang
Jingren Zhou
MLLM
VLM
116
367
0
07 Nov 2023
Can GPT-4V(ision) Serve Medical Applications? Case Studies on GPT-4V for Multimodal Medical Diagnosis
Chaoyi Wu
Jiayu Lei
Qiaoyu Zheng
Weike Zhao
Weixiong Lin
...
Xiao Zhou
Ziheng Zhao
Ya-Qin Zhang
Yanfeng Wang
Weidi Xie
LM&MA
85
73
0
15 Oct 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
244
4,186
0
30 Jan 2023
Cap4Video: What Can Auxiliary Captions Do for Text-Video Retrieval?
Wenhao Wu
Haipeng Luo
Bo Fang
Jingdong Wang
Wanli Ouyang
88
80
0
31 Dec 2022
Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models
Wenhao Wu
Xiaohan Wang
Haipeng Luo
Jingdong Wang
Yi Yang
Wanli Ouyang
94
47
0
31 Dec 2022
Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering
Pan Lu
Swaroop Mishra
Tony Xia
Liang Qiu
Kai-Wei Chang
Song-Chun Zhu
Oyvind Tafjord
Peter Clark
A. Kalyan
ELM
ReLM
LRM
198
1,089
0
20 Sep 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022
Feature Pyramid Networks for Object Detection
Tsung-Yi Lin
Piotr Dollár
Ross B. Girshick
Kaiming He
Bharath Hariharan
Serge J. Belongie
ObjD
154
3,574
0
09 Dec 2016
Densely Connected Convolutional Networks
Gao Huang
Zhuang Liu
Laurens van der Maaten
Kilian Q. Weinberger
PINN
3DV
244
35,884
0
25 Aug 2016