MMTL-UniAD: A Unified Framework for Multimodal and Multi-Task Learning in Assistive Driving Perception

3 April 2025
Wenzhuo Liu
Wenshuo Wang
Yicheng Qiao
Qiannan Guo
Jiayin Zhu
Pengfei Li
Zilong Chen
Huiming Yang
Zhiwei Li
Lening Wang
Tiao Tan
Huaping Liu
Abstract

Advanced driver assistance systems require a comprehensive understanding of the driver's mental/physical state and traffic context, but existing works often neglect the potential benefits of joint learning between these tasks. This paper proposes MMTL-UniAD, a unified multi-modal multi-task learning framework that simultaneously recognizes driver behavior (e.g., looking around, talking), driver emotion (e.g., anxiety, happiness), vehicle behavior (e.g., parking, turning), and traffic context (e.g., traffic jam, traffic smooth). A key challenge is avoiding negative transfer between tasks, which can impair learning performance. To address this, we introduce two key components into the framework: a multi-axis region attention network that extracts global context-sensitive features, and a dual-branch multimodal embedding that learns multimodal embeddings from both task-shared and task-specific features. The former uses a multi-attention mechanism to extract task-relevant features, mitigating negative transfer caused by task-unrelated features. The latter employs a dual-branch structure to adaptively adjust task-shared and task-specific parameters, enhancing cross-task knowledge transfer while reducing task conflicts. We assess MMTL-UniAD on the AIDE dataset, using a series of ablation studies, and show that it outperforms state-of-the-art methods across all four tasks. The code is available at this https URL.
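
As a rough illustration of the dual-branch idea described in the abstract, the sketch below mixes one task-shared projection with a per-task projection through a per-task gate. It is not the authors' implementation: the class name `DualBranchSketch`, the scalar sigmoid gate, the layer sizes, and the task keys are all assumptions made for illustration, and the abstract does not specify how the adaptive adjustment is actually computed.

```python
import math
import random

# Hypothetical task keys for the four recognition tasks named in the abstract.
TASKS = ["driver_behavior", "driver_emotion", "vehicle_behavior", "traffic_context"]


def linear(weights, bias, x):
    """Dense layer: `weights` holds one row of input weights per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_linear(rng, n_in, n_out):
    """Small random weights and zero biases for an n_in -> n_out layer."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


class DualBranchSketch:
    """Assumed design: one shared branch, one specific branch per task,
    and a scalar gate per task that blends the two embeddings."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.shared = init_linear(rng, n_in, n_hidden)
        self.specific = {t: init_linear(rng, n_in, n_hidden) for t in TASKS}
        # A gate logit of 0.0 gives an even 50/50 blend; training would adjust it.
        self.gate_logit = {t: 0.0 for t in TASKS}

    def embed(self, x):
        """Return one blended embedding per task for the fused feature vector x."""
        shared = linear(*self.shared, x)
        out = {}
        for t in TASKS:
            specific = linear(*self.specific[t], x)
            g = sigmoid(self.gate_logit[t])
            out[t] = [g * s + (1.0 - g) * p for s, p in zip(shared, specific)]
        return out
```

Calling `DualBranchSketch(4, 3).embed([1.0, 0.5, -0.5, 2.0])` returns a dictionary with one 3-dimensional embedding per task. Pushing a task's gate logit up makes that task rely almost entirely on the shared branch, which is one simple way to picture a trade-off between knowledge transfer and task conflict.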

@article{liu2025_2504.02264,
  title={MMTL-UniAD: A Unified Framework for Multimodal and Multi-Task Learning in Assistive Driving Perception},
  author={Wenzhuo Liu and Wenshuo Wang and Yicheng Qiao and Qiannan Guo and Jiayin Zhu and Pengfei Li and Zilong Chen and Huiming Yang and Zhiwei Li and Lening Wang and Tiao Tan and Huaping Liu},
  journal={arXiv preprint arXiv:2504.02264},
  year={2025}
}