Listen, Look and Deliberate: Visual context-aware speech recognition using pre-trained text-video representations

8 November 2020

Papers citing "Listen, Look and Deliberate: Visual context-aware speech recognition using pre-trained text-video representations"

Title
AVFormer: Injecting Vision into Frozen Speech Models for Zero-Shot AV-ASR | Paul Hongsuck Seo, Arsha Nagrani, Cordelia Schmid | 29 15 0 | 29 Mar 2023
Can Visual Context Improve Automatic Speech Recognition for an Embodied Agent? | Pradip Pramanick, Chayan Sarkar | 21 7 0 | 21 Oct 2022
Improving Multimodal Speech Recognition by Data Augmentation and Speech Representations | Dan Oneaţă, H. Cucu | 19 19 0 | 27 Apr 2022