VHASR: A Multimodal Speech Recognition System With Vision Hotwords | Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024 |
AVFormer: Injecting Vision into Frozen Speech Models for Zero-Shot AV-ASR | Computer Vision and Pattern Recognition (CVPR), 2023 |
Multimodal Speech Recognition for Language-Guided Embodied Agents | Interspeech (Interspeech), 2023 |
AVATAR: Unconstrained Audiovisual Speech Recognition | Interspeech (Interspeech), 2022 |
Fine-Grained Grounding for Multimodal Speech Recognition | Findings (Findings), 2020 |