arXiv: 2307.10802
Meta-Transformer: A Unified Framework for Multimodal Learning
20 July 2023
Yiyuan Zhang
Kaixiong Gong
Kaipeng Zhang
Hongsheng Li
Yu Qiao
Wanli Ouyang
Xiangyu Yue
Papers citing
"Meta-Transformer: A Unified Framework for Multimodal Learning"
50 / 102 papers shown
VAEmo: Efficient Representation Learning for Visual-Audio Emotion with Knowledge Injection
Hao Cheng
Zhiwei Zhao
Yichao He
Zhenzhen Hu
Jia Li
M. Wang
Richang Hong
36
0
0
05 May 2025
TxP: Reciprocal Generation of Ground Pressure Dynamics and Activity Descriptions for Improving Human Activity Recognition
L. Ray
L. Krupp
Vitor Fortes Rey
Bo Zhou
Sungho Suh
P. Lukowicz
AI4CE
40
0
0
04 May 2025
Representation Learning for Tabular Data: A Comprehensive Survey
Jun-Peng Jiang
Si-Yang Liu
Hao-Run Cai
Qile Zhou
Han-Jia Ye
LMTD
38
0
0
17 Apr 2025
NTIRE 2025 Challenge on Day and Night Raindrop Removal for Dual-Focused Images: Methods and Results
Xin Li
Yeying Jin
Xin Jin
Zongwei Wu
Bingchen Li
...
Jieyuan Pei
Z. Li
J. Wang
Haoyu Bian
Haoran Sun
51
5
0
17 Apr 2025
SynWorld: Virtual Scenario Synthesis for Agentic Action Knowledge Refinement
Runnan Fang
Xiaobin Wang
Yuan Liang
Shuofei Qiao
Jialong Wu
...
N. Zhang
Yong-feng Jiang
Pengjun Xie
Fei Huang
H. Chen
LLMAG
67
0
0
04 Apr 2025
AutoSSVH: Exploring Automated Frame Sampling for Efficient Self-Supervised Video Hashing
Niu Lian
Jun Li
Jinpeng Wang
Ruisheng Luo
Yaowei Wang
Shu-Tao Xia
Bin Chen
36
0
0
04 Apr 2025
Tokenization of Gaze Data
Tim Rolff
Jurik Karimian
Niklas Hypki
S. Schmidt
Markus Lappe
Frank Steinicke
31
0
0
28 Mar 2025
PAVE: Patching and Adapting Video Large Language Models
Zhuoming Liu
Yiquan Li
Khoi Duc Nguyen
Yiwu Zhong
Yin Li
KELM
LRM
79
0
0
25 Mar 2025
Tracking Meets Large Multimodal Models for Driving Scenario Understanding
Ayesha Ishaq
Jean Lahoud
F. Khan
Salman Khan
Hisham Cholakkal
Rao Muhammad Anwer
54
0
0
18 Mar 2025
A Multi-Modal Federated Learning Framework for Remote Sensing Image Classification
Barış Büyüktaş
Gencer Sumbul
Begum Demir
36
0
0
13 Mar 2025
A Closer Look at TabPFN v2: Strength, Limitation, and Extension
Han-Jia Ye
Si-Yang Liu
Wei-Lun Chao
34
3
0
24 Feb 2025
Fine-tuning Multimodal Transformers on Edge: A Parallel Split Learning Approach
Timo Fudala
Vasileios Tsouvalas
N. Meratnia
MoE
41
0
0
10 Feb 2025
QCS: Feature Refining from Quadruplet Cross Similarity for Facial Expression Recognition
C. Wang
Li Chen
Lili Wang
Zhaofan Li
Xuebin Lv
76
1
0
28 Jan 2025
OneLLM: One Framework to Align All Modalities with Language
Jiaming Han
Kaixiong Gong
Yiyuan Zhang
Jiaqi Wang
Kaipeng Zhang
D. Lin
Yu Qiao
Peng Gao
Xiangyu Yue
MLLM
104
102
0
10 Jan 2025
AllSpark: A Multimodal Spatio-Temporal General Intelligence Model with Ten Modalities via Language as a Reference Framework
Run Shao
Cheng Yang
Qiujun Li
Qing Zhu
Yongjun Zhang
...
Yu Liu
Yong Tang
Dapeng Liu
Shizhong Yang
Haifeng Li
106
0
0
08 Jan 2025
Bag of Tricks for Multimodal AutoML with Image, Text, and Tabular Data
Zhiqiang Tang
Zihan Zhong
Tong He
Gerald Friedland
73
0
0
19 Dec 2024
PSA-VLM: Enhancing Vision-Language Model Safety through Progressive Concept-Bottleneck-Driven Alignment
Zhendong Liu
Yuanbi Nie
Yingshui Tan
Xiangyu Yue
Qiushi Cui
Chongjun Wang
Xiaoyong Zhu
Bo Zheng
68
0
0
18 Nov 2024
Autoregressive Models in Vision: A Survey
Jing Xiong
Gongye Liu
Lun Huang
Chengyue Wu
Taiqiang Wu
...
M. Zhang
Guillermo Sapiro
Jiebo Luo
Ping Luo
Ngai Wong
VGen
46
9
0
08 Nov 2024
Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines
Zhixin Zhang
Yiyuan Zhang
Xiaohan Ding
Xiangyu Yue
16
3
0
28 Oct 2024
Trajectory Flow Matching with Applications to Clinical Time Series Modeling
Xi Zhang
Yuan Pu
Yuki Kawamura
Andrew Loza
Yoshua Bengio
Dennis L. Shung
Alexander Tong
OOD
AI4TS
MedIm
23
2
0
28 Oct 2024
X-Fi: A Modality-Invariant Foundation Model for Multimodal Human Sensing
Xinyan Chen
Jianfei Yang
28
0
0
14 Oct 2024
Bridging the Gap between Text, Audio, Image, and Any Sequence: A Novel Approach using Gloss-based Annotation
Sen Fang
Sizhou Chen
Yalin Feng
Xiaofeng Zhang
T. Teoh
23
0
0
04 Oct 2024
MIO: A Foundation Model on Multimodal Tokens
Zekun Wang
King Zhu
Chunpu Xu
Wangchunshu Zhou
Jiaheng Liu
...
Yuanxing Zhang
Ge Zhang
Ke Xu
Jie Fu
Wenhao Huang
MLLM
AuLLM
48
11
0
26 Sep 2024
OneEncoder: A Lightweight Framework for Progressive Alignment of Modalities
Bilal Faye
Hanane Azzag
M. Lebbah
ObjD
21
0
0
17 Sep 2024
SimMAT: Exploring Transferability from Vision Foundation Models to Any Image Modality
Chenyang Lei
Liyi Chen
Jun Cen
Xiao Chen
Zhen Lei
Felix Heide
Ziwei Liu
Qifeng Chen
Zhaoxiang Zhang
26
0
0
12 Sep 2024
IVGF: The Fusion-Guided Infrared and Visible General Framework
Fangcen Liu
Chenqiang Gao
Fang Chen
Pengcheng Li
Junjie Guo
Deyu Meng
29
0
0
02 Sep 2024
Segment Anything for Videos: A Systematic Survey
Chunhui Zhang
Yawen Cui
Weilin Lin
Guanjie Huang
Yan Rong
Li Liu
Shiguang Shan
VLM
39
6
0
31 Jul 2024
Compensate Quantization Errors+: Quantized Models Are Inquisitive Learners
Yifei Gao
Jie Ou
Lei Wang
Fanhua Shang
Jaji Wu
Junguo Cheng
MQ
20
0
0
22 Jul 2024
Foundation Models for Autonomous Robots in Unstructured Environments
Hossein Naderi
Alireza Shojaei
Lifu Huang
LM&Ro
40
0
0
19 Jul 2024
Learning Modality-agnostic Representation for Semantic Segmentation from Any Modalities
Xueye Zheng
Yuanhuiyi Lyu
Lin Wang
VLM
47
10
0
16 Jul 2024
When Pedestrian Detection Meets Multi-Modal Learning: Generalist Model and Benchmark Dataset
Yi Zhang
Wang Zeng
Sheng Jin
Chao Qian
Ping Luo
Wentao Liu
27
4
0
14 Jul 2024
SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning
Haiwen Diao
Bo Wan
Xu Jia
Yunzhi Zhuge
Ying Zhang
Huchuan Lu
Long Chen
VLM
35
4
0
10 Jul 2024
Generative AI for RF Sensing in IoT systems
Li Wang
Chao Zhang
Qiyang Zhao
Hang Zou
S. Lasaulce
Giuseppe Valenzise
Zhuo He
Mérouane Debbah
24
3
0
10 Jul 2024
Learning Modality Knowledge Alignment for Cross-Modality Transfer
Wenxuan Ma
Shuang Li
Lincan Cai
Jingxuan Kang
21
1
0
27 Jun 2024
Compensate Quantization Errors: Make Weights Hierarchical to Compensate Each Other
Yifei Gao
Jie Ou
Lei Wang
Yuting Xiao
Zhiyuan Xiang
Ruiting Dai
Jun Cheng
MQ
31
1
0
24 Jun 2024
Leveraging Large Language Models for Patient Engagement: The Power of Conversational AI in Digital Health
Bo Wen
R. Norel
Julia Liu
Thaddeus Stappenbeck
F. Zulkernine
Huamin Chen
AI4MH
LM&MA
32
2
0
19 Jun 2024
A Survey of Large Language Models for Financial Applications: Progress, Prospects and Challenges
Yuqi Nie
Yaxuan Kong
Xiaowen Dong
John M. Mulvey
H. Vincent Poor
Qingsong Wen
Stefan Zohren
AIFin
38
40
0
15 Jun 2024
Explore the Limits of Omni-modal Pretraining at Scale
Yiyuan Zhang
Handong Li
Jing Liu
Xiangyu Yue
VLM
LRM
38
1
0
13 Jun 2024
4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities
Roman Bachmann
Oğuzhan Fatih Kar
David Mizrahi
Ali Garjani
Mingfei Gao
David Griffiths
Jiaming Hu
Afshin Dehghan
Amir Zamir
MoE
VLM
MLLM
28
14
0
13 Jun 2024
Generalist Multimodal AI: A Review of Architectures, Challenges and Opportunities
Sai Munikoti
Ian Stewart
Sameera Horawalavithana
Henry Kvinge
Tegan H. Emerson
Sandra E Thompson
Karl Pazdernik
35
2
0
08 Jun 2024
OmniBind: Teach to Build Unequal-Scale Modality Interaction for Omni-Bind of All
Yuanhuiyi Lyu
Xueye Zheng
Dahun Kim
Lin Wang
32
10
0
25 May 2024
SliM-LLM: Salience-Driven Mixed-Precision Quantization for Large Language Models
Wei Huang
Haotong Qin
Yangdong Liu
Yawei Li
Xianglong Liu
Luca Benini
Michele Magno
Xiaojuan Qi
MQ
48
15
0
23 May 2024
Safety Alignment for Vision Language Models
Zhendong Liu
Yuanbi Nie
Yingshui Tan
Xiangyu Yue
Qiushi Cui
Chongjun Wang
Xiaoyong Zhu
Bo Zheng
VLM
MLLM
86
6
0
22 May 2024
Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts
Yunxin Li
Shenyuan Jiang
Baotian Hu
Longyue Wang
Wanqi Zhong
Wenhan Luo
Lin Ma
Min-Ling Zhang
MoE
30
27
0
18 May 2024
Large Language Model (LLM) for Telecommunications: A Comprehensive Survey on Principles, Key Techniques, and Opportunities
Hao Zhou
Chengming Hu
Ye Yuan
Yufei Cui
Yili Jin
...
Di Wu
Xue Liu
Charlie Zhang
Xianbin Wang
Jiangchuan Liu
30
55
0
17 May 2024
An Overview of Machine Learning-Enabled Optimization for Reconfigurable Intelligent Surfaces-Aided 6G Networks: From Reinforcement Learning to Large Language Models
Hao Zhou
Chengming Hu
Xue Liu
AI4CE
28
0
0
09 May 2024
All in One Framework for Multimodal Re-identification in the Wild
He Li
Mang Ye
Ming Zhang
Bo Du
23
9
0
08 May 2024
Interpretable Tensor Fusion
Saurabh Varshneya
Antoine Ledent
Philipp Liznerski
Andriy Balinskyy
Purvanshi Mehta
Waleed Mustafa
Marius Kloft
17
1
0
07 May 2024
Octopi: Object Property Reasoning with Large Tactile-Language Models
Samson Yu
Kelvin Lin
Anxing Xiao
Jiafei Duan
Harold Soh
LRM
29
21
0
05 May 2024
RELI11D: A Comprehensive Multimodal Human Motion Dataset and Method
Ming Yan
Yan Zhang
Shuqiang Cai
Shuqi Fan
Xincheng Lin
...
Siqi Shen
Chenglu Wen
Lan Xu
Yuexin Ma
Cheng-Yu Wang
31
6
0
28 Mar 2024