Looking Enhances Listening: Recovering Missing Speech Using Images

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020

13 February 2020

Papers citing "Looking Enhances Listening: Recovering Missing Speech Using Images"

VHASR: A Multimodal Speech Recognition System With Vision Hotwords

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

Jiliang Hu

Zuchao Li

Ping Wang

Haojun Ai

Lefei Zhang

Hai Zhao

189

01 Oct 2024

AVFormer: Injecting Vision into Frozen Speech Models for Zero-Shot AV-ASR

Computer Vision and Pattern Recognition (CVPR), 2023

Paul Hongsuck Seo

Arsha Nagrani

Cordelia Schmid

201

29 Mar 2023

Multimodal Speech Recognition for Language-Guided Embodied Agents

Interspeech, 2023

348

27 Feb 2023

AVATAR: Unconstrained Audiovisual Speech Recognition

Interspeech, 2022

127

15 Jun 2022

Improving Multimodal Speech Recognition by Data Augmentation and Speech Representations

Dan Oneaţă

H. Cucu

121

27 Apr 2022

Adversarial Attacks on Speech Recognition Systems for Mission-Critical Applications: A Survey

Ngoc Dung Huynh

Mohamed Reda Bouadjenek

Imran Razzak

182

22 Feb 2022

Listen, Look and Deliberate: Visual context-aware speech recognition using pre-trained text-video representations

117

08 Nov 2020

Multimodal Speech Recognition with Unstructured Audio Masking

120

16 Oct 2020

Fine-Grained Grounding for Multimodal Speech Recognition

Findings, 2020

161

05 Oct 2020

Experience Grounds Language

...

534

403

21 Apr 2020