Robust Audiovisual Speech Recognition Models with Mixture-of-Experts. Spoken Language Technology Workshop (SLT), 2024. Yihan Wu, Yifan Peng, Yichen Lu, Xuankai Chang, Ruihua Song, Shinji Watanabe
VILAS: Exploring the Effects of Vision and Language Context in Automatic Speech Recognition. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
AVFormer: Injecting Vision into Frozen Speech Models for Zero-Shot AV-ASR. Computer Vision and Pattern Recognition (CVPR), 2023
Can Visual Context Improve Automatic Speech Recognition for an Embodied Agent? Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
AVATAR: Unconstrained Audiovisual Speech Recognition. Interspeech, 2022
Recent Advances in End-to-End Automatic Speech Recognition. APSIPA Transactions on Signal and Information Processing (TASIP), 2021
Multi-stream Convolutional Neural Network with Frequency Selection for Robust Speaker Verification. Computing and Informatics (CAI), 2020