ResearchTrend.AI
Enhancing Cross-Modal Fine-Tuning with Gradually Intermediate Modality Generation

13 June 2024
Lincan Cai
Shuang Li
Wenxuan Ma
Jingxuan Kang
Binhui Xie
Zixun Sun
Chengwei Zhu
    MoE
    MoMe

Papers citing "Enhancing Cross-Modal Fine-Tuning with Gradually Intermediate Modality Generation"

19 / 19 papers shown
OneLLM: One Framework to Align All Modalities with Language
Jiaming Han
Kaixiong Gong
Yiyuan Zhang
Jiaqi Wang
Kaipeng Zhang
D. Lin
Yu Qiao
Peng Gao
Xiangyu Yue
MLLM
104
102
0
10 Jan 2025
InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model
Xiao-wen Dong
Pan Zhang
Yuhang Zang
Yuhang Cao
Bin Wang
...
Conghui He
Xingcheng Zhang
Yu Qiao
Dahua Lin
Jiaqi Wang
VLM
MLLM
76
242
0
29 Jan 2024
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
Zhe Chen
Jiannan Wu
Wenhai Wang
Weijie Su
Guo Chen
...
Bin Li
Ping Luo
Tong Lu
Yu Qiao
Jifeng Dai
VLM
MLLM
156
895
0
21 Dec 2023
CogAgent: A Visual Language Model for GUI Agents
Wenyi Hong
Weihan Wang
Qingsong Lv
Jiazheng Xu
Wenmeng Yu
...
Juanzi Li
Bin Xu
Yuxiao Dong
Ming Ding
Jie Tang
MLLM
137
310
0
14 Dec 2023
Vary: Scaling up the Vision Vocabulary for Large Vision-Language Models
Haoran Wei
Lingyu Kong
Jinyue Chen
Liang Zhao
Zheng Ge
Jinrong Yang
Jian‐Yuan Sun
Chunrui Han
Xiangyu Zhang
MLLM
VLM
66
73
0
11 Dec 2023
Language Semantic Graph Guided Data-Efficient Learning
Wenxuan Ma
Shuang Li
Lincan Cai
Jingxuan Kang
19
4
0
15 Nov 2023
mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality
Qinghao Ye
Haiyang Xu
Guohai Xu
Jiabo Ye
Ming Yan
...
Junfeng Tian
Qiang Qi
Ji Zhang
Feiyan Huang
Jingren Zhou
VLM
MLLM
203
883
0
27 Apr 2023
Patch-Mix Transformer for Unsupervised Domain Adaptation: A Game Perspective
Jinjing Zhu
Haotian Bai
Lin Wang
ViT
63
70
0
23 Mar 2023
One Transformer Fits All Distributions in Multi-Modal Diffusion at Scale
Fan Bao
Shen Nie
Kaiwen Xue
Chongxuan Li
Shiliang Pu
Yaole Wang
Gang Yue
Yue Cao
Hang Su
Jun Zhu
DiffM
191
147
0
12 Mar 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
244
4,186
0
30 Jan 2023
FlexMatch: Boosting Semi-Supervised Learning with Curriculum Pseudo Labeling
Bowen Zhang
Yidong Wang
Wenxin Hou
Hao Wu
Jindong Wang
Manabu Okumura
T. Shinozaki
AAML
213
848
0
15 Oct 2021
CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval
Huaishao Luo
Lei Ji
Ming Zhong
Yang Chen
Wen Lei
Nan Duan
Tianrui Li
CLIP
VLM
303
771
0
18 Apr 2021
Is Space-Time Attention All You Need for Video Understanding?
Gedas Bertasius
Heng Wang
Lorenzo Torresani
ViT
278
1,939
0
09 Feb 2021
Fourier Neural Operator for Parametric Partial Differential Equations
Zong-Yi Li
Nikola B. Kovachki
Kamyar Azizzadenesheli
Burigede Liu
K. Bhattacharya
Andrew M. Stuart
Anima Anandkumar
AI4CE
203
2,254
0
18 Oct 2020
Domain Adaptive Ensemble Learning
Kaiyang Zhou
Yongxin Yang
Yu Qiao
Tao Xiang
OOD
127
268
0
16 Mar 2020
AutoGluon-Tabular: Robust and Accurate AutoML for Structured Data
Nick Erickson
Jonas W. Mueller
Alexander Shirkov
Hang Zhang
Pedro Larroy
Mu Li
Alex Smola
LMTD
84
576
0
13 Mar 2020
Curriculum Learning for Reinforcement Learning Domains: A Framework and Survey
Sanmit Narvekar
Bei Peng
Matteo Leonetti
Jivko Sinapov
Matthew E. Taylor
Peter Stone
ODL
132
451
0
10 Mar 2020
Geometric Dataset Distances via Optimal Transport
David Alvarez-Melis
Nicolò Fusi
OT
70
189
0
07 Feb 2020
Densely Connected Convolutional Networks
Gao Huang
Zhuang Liu
L. V. D. van der Maaten
Kilian Q. Weinberger
PINN
3DV
244
35,884
0
25 Aug 2016