VHASR: A Multimodal Speech Recognition System With Vision Hotwords

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Zuchao Li
Ping Wang
Lefei Zhang
Hai Zhao
