VHASR: A Multimodal Speech Recognition System With Vision Hotwords. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024.
VILAS: Exploring the Effects of Vision and Language Context in Automatic Speech Recognition. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023.
Multimodal Speech Recognition for Language-Guided Embodied Agents. Interspeech, 2023.
MMLatch: Bottom-up Top-down Fusion for Multimodal Sentiment Analysis. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022.
Text is no more Enough! A Benchmark for Profile-based Spoken Language Understanding. AAAI Conference on Artificial Intelligence (AAAI), 2021.