What Makes Training Multi-Modal Classification Networks Hard?


29 May 2019
Weiyao Wang
Du Tran
Matt Feiszli

Papers citing "What Makes Training Multi-Modal Classification Networks Hard?"

50 / 64 papers shown
TACFN: Transformer-based Adaptive Cross-modal Fusion Network for Multimodal Emotion Recognition
Feng Liu
Ziwang Fu
Y. Wang
Qijian Zheng
40
4
0
10 May 2025
See-Saw Modality Balance: See Gradient, and Sew Impaired Vision-Language Balance to Mitigate Dominant Modality Bias
Junehyoung Kwon
Mihyeon Kim
Eunju Lee
Juhwan Choi
Youngbin Kim
53
0
0
18 Mar 2025
DynCIM: Dynamic Curriculum for Imbalanced Multimodal Learning
Chengxuan Qian
Kai Han
J. Wang
Zhenlong Yuan
Rui Qian
Chongwen Lyu
Jun Chen
46
1
0
09 Mar 2025
FACTR: Force-Attending Curriculum Training for Contact-Rich Policy Learning
Jason Jingzhou Liu
Yulong Li
Kenneth Shaw
Tony Tao
Ruslan Salakhutdinov
Deepak Pathak
OffRL
59
1
0
24 Feb 2025
A Self-supervised Multimodal Deep Learning Approach to Differentiate Post-radiotherapy Progression from Pseudoprogression in Glioblastoma
A. Gomaa
Yixing Huang
Pluvio Stephan
Katharina Breininger
Benjamin Frey
...
U. Gaipl
Christoph Bert
R. Fietkau
M. Schmidt
F. Putz
84
0
0
06 Feb 2025
Generalized Task-Driven Medical Image Quality Enhancement with Gradient Promotion
Dong Zhang
Kwang-Ting Cheng
MedIm
20
0
0
03 Jan 2025
AVHBench: A Cross-Modal Hallucination Benchmark for Audio-Visual Large Language Models
Kim Sung-Bin
Oh Hyun-Bin
JungMok Lee
Arda Senocak
Joon Son Chung
Tae-Hyun Oh
MLLM
VLM
40
3
0
23 Oct 2024
Improving Colorectal Cancer Screening and Risk Assessment through Predictive Modeling on Medical Images and Records
Shuai Jiang
Christina Robinson
Joseph Anderson
William Hisey
Lynn Butterly
A. Suriawinata
Saeed Hassanpour
19
0
0
13 Oct 2024
Anchors Aweigh! Sail for Optimal Unified Multi-Modal Representations
Minoh Jeong
Min Namgung
Zae Myung Kim
Dongyeop Kang
Yao-Yi Chiang
Alfred Hero
25
0
0
02 Oct 2024
Siamese Vision Transformers are Scalable Audio-visual Learners
Yan-Bo Lin
Gedas Bertasius
37
5
0
28 Mar 2024
Borrowing Treasures from Neighbors: In-Context Learning for Multimodal Learning with Missing Modalities and Data Scarcity
Zhuo Zhi
Ziquan Liu
M. Elbadawi
Adam Daneshmend
Mine Orlu
Abdul Basit
Andreas Demosthenous
Miguel R. D. Rodrigues
34
2
0
14 Mar 2024
Lift-Attend-Splat: Bird's-eye-view camera-lidar fusion using transformers
James Gunn
Zygmunt Lenyk
Anuj Sharma
Andrea Donati
Alexandru Buburuzan
John Redford
Romain Mueller
MDE
32
8
0
22 Dec 2023
Modality Mixer Exploiting Complementary Information for Multi-modal Action Recognition
Sumin Lee
Sangmin Woo
Muhammad Adi Nugroho
Changick Kim
25
0
0
21 Nov 2023
Improving Discriminative Multi-Modal Learning with Large-Scale Pre-Trained Models
Chenzhuang Du
Yue Zhao
Chonghua Liao
Jiacheng You
Jie Fu
Hang Zhao
30
2
0
08 Oct 2023
Audio-Visual Speaker Verification via Joint Cross-Attention
R Gnana Praveen
Jahangir Alam
26
6
0
28 Sep 2023
Interpretation on Multi-modal Visual Fusion
Hao Chen
Hao Zhou
Yongjian Deng
28
0
0
19 Aug 2023
Boosting Multi-modal Model Performance with Adaptive Gradient Modulation
Hong Li
Xingyu Li
Pengbo Hu
Yinuo Lei
Chunxiao Li
Yi Zhou
28
20
0
15 Aug 2023
MultiWave: Multiresolution Deep Architectures through Wavelet Decomposition for Multivariate Time Series Prediction
I. Deznabi
M. Fiterau
AI4TS
30
5
0
16 Jun 2023
Continual Multimodal Knowledge Graph Construction
Xiang Chen
Jintian Zhang
Xiaohan Wang
Ningyu Zhang
Tongtong Wu
Luo Si
Yongheng Wang
Huajun Chen
KELM
CLL
27
14
0
15 May 2023
Patchwork Learning: A Paradigm Towards Integrative Analysis across Diverse Biomedical Data Sources
Suraj Rajendran
Weishen Pan
M. Sabuncu
Yong Chen
Jiayu Zhou
Fei Wang
51
14
0
10 May 2023
Radar-Camera Fusion for Object Detection and Semantic Segmentation in Autonomous Driving: A Comprehensive Review
Shanliang Yao
Runwei Guan
Xiaoyu Huang
Zhuoxiao Li
Xiangyu Sha
...
Eng Gee Lim
H. Seo
Ka Lok Man
Xiaohui Zhu
Yutao Yue
41
91
0
20 Apr 2023
Multimodal Hyperspectral Image Classification via Interconnected Fusion
Lu Huo
Jiahao Xia
Leijie Zhang
Haimin Zhang
Min Xu
17
2
0
02 Apr 2023
Balanced Audiovisual Dataset for Imbalance Analysis
Wenke Xia
Xu Zhao
Xincheng Pang
Changqing Zhang
Di Hu
29
1
0
14 Feb 2023
Revisiting Pre-training in Audio-Visual Learning
Ruoxuan Feng
Wenke Xia
Di Hu
22
1
0
07 Feb 2023
Rethinking Soft Label in Label Distribution Learning Perspective
Seungbum Hong
Jihun Yoon
Bogyu Park
Min-Kook Choi
31
0
0
31 Jan 2023
AutoFraudNet: A Multimodal Network to Detect Fraud in the Auto Insurance Industry
Azin Asgarian
Rohit Saha
Daniel Jakubovitz
Julia Peyre
24
2
0
15 Jan 2023
Toward Building General Foundation Models for Language, Vision, and Vision-Language Understanding Tasks
Xinsong Zhang
Yan Zeng
Jipeng Zhang
Hang Li
VLM
AI4CE
LRM
14
17
0
12 Jan 2023
A Survey on Human Action Recognition
Zhou Shuchang
29
0
0
20 Dec 2022
Video Unsupervised Domain Adaptation with Deep Learning: A Comprehensive Survey
Yuecong Xu
Haozhi Cao
Zhenghua Chen
Xiaoli Li
Lihua Xie
Jianfei Yang
24
14
0
17 Nov 2022
PMR: Prototypical Modal Rebalance for Multimodal Learning
Yunfeng Fan
Wenchao Xu
Haozhao Wang
Junxiao Wang
Song Guo
23
60
0
14 Nov 2022
Uncertainty Estimation for Multi-view Data: The Power of Seeing the Whole Picture
M. Jung
He Zhao
Joanna Dipnall
Belinda Gabbe
Lan Du
UQCV
EDL
55
12
0
06 Oct 2022
Contrastive Audio-Visual Masked Autoencoder
Yuan Gong
Andrew Rouditchenko
Alexander H. Liu
David F. Harwath
Leonid Karlinsky
Hilde Kuehne
James R. Glass
32
119
0
02 Oct 2022
Multimodal Analogical Reasoning over Knowledge Graphs
Ningyu Zhang
Lei Li
Xiang Chen
Xiaozhuan Liang
Shumin Deng
Huajun Chen
54
26
0
01 Oct 2022
Audio-Visual Fusion for Emotion Recognition in the Valence-Arousal Space Using Joint Cross-Attention
R Gnana Praveen
Eric Granger
P. Cardinal
CVBM
48
31
0
19 Sep 2022
DM$^2$S$^2$: Deep Multi-Modal Sequence Sets with Hierarchical Modality Attention
Shunsuke Kitada
Yuki Iwazaki
Riku Togashi
Hitoshi Iyatomi
21
1
0
07 Sep 2022
Modality Mixer for Multi-modal Action Recognition
Sumin Lee
Sangmin Woo
Yeonju Park
Muhammad Adi Nugroho
Changick Kim
19
10
0
24 Aug 2022
UAVM: Towards Unifying Audio and Visual Models
Yuan Gong
Alexander H. Liu
Andrew Rouditchenko
James R. Glass
25
20
0
29 Jul 2022
A Survey on Video Action Recognition in Sports: Datasets, Methods and Applications
Fei Wu
Qingzhong Wang
Jian Bian
Haoyi Xiong
Ning Ding
Feixiang Lu
Junqing Cheng
Dejing Dou
AI4TS
24
52
0
02 Jun 2022
Structured Attention Composition for Temporal Action Localization
Le Yang
Junwei Han
Tao Zhao
Nian Liu
Dingwen Zhang
37
17
0
20 May 2022
SHAPE: An Unified Approach to Evaluate the Contribution and Cooperation of Individual Modalities
Pengbo Hu
Xingyu Li
Yi Zhou
30
10
0
30 Apr 2022
Trusted Multi-View Classification with Dynamic Evidential Fusion
Zongbo Han
Changqing Zhang
H. Fu
Joey Tianyi Zhou
EDL
20
217
0
25 Apr 2022
Multi-View Hypercomplex Learning for Breast Cancer Screening
Eleonora Lopez
Eleonora Grassucci
Martina Valleriani
Danilo Comminiello
28
8
0
12 Apr 2022
Learnable Irrelevant Modality Dropout for Multimodal Action Recognition on Modality-Specific Annotated Videos
Saghir Alfasly
Jian Lu
C. Xu
Yuru Zou
34
18
0
06 Mar 2022
Dense Voxel Fusion for 3D Object Detection
Anas Mahmoud
Jordan S. K. Hu
Steven L. Waslander
3DPC
20
45
0
02 Mar 2022
Multi-task UNet: Jointly Boosting Saliency Prediction and Disease Classification on Chest X-ray Images
Hongzhi Zhu
R. Rohling
Septimiu Salcudean
14
4
0
15 Feb 2022
PolyViT: Co-training Vision Transformers on Images, Videos and Audio
Valerii Likhosherstov
Anurag Arnab
K. Choromanski
Mario Lucic
Yi Tay
Adrian Weller
Mostafa Dehghani
ViT
33
73
0
25 Nov 2021
TEAM-Net: Multi-modal Learning for Video Action Recognition with Partial Decoding
Zhengwei Wang
Qi She
A. Smolic
21
9
0
17 Oct 2021
Decoder Fusion RNN: Context and Interaction Aware Decoders for Trajectory Prediction
Edoardo Mello Rella
Jan-Nico Zaech
Alexander Liniger
Luc Van Gool
AI4CE
21
14
0
12 Aug 2021
Multi-modal Residual Perceptron Network for Audio-Video Emotion Recognition
Xin Chang
W. Skarbek
22
19
0
21 Jul 2021
MultiBench: Multiscale Benchmarks for Multimodal Representation Learning
Paul Pu Liang
Yiwei Lyu
Xiang Fan
Zetian Wu
Yun Cheng
...
Peter Wu
Michelle A. Lee
Yuke Zhu
Ruslan Salakhutdinov
Louis-Philippe Morency
VLM
26
158
0
15 Jul 2021