2011.04084
Listen, Look and Deliberate: Visual context-aware speech recognition using pre-trained text-video representations
8 November 2020
Shahram Ghorbani
Yashesh Gaur
Yu Shi
Jinyu Li
Papers citing "Listen, Look and Deliberate: Visual context-aware speech recognition using pre-trained text-video representations"
3 / 3 papers shown
Title
AVFormer: Injecting Vision into Frozen Speech Models for Zero-Shot AV-ASR
Paul Hongsuck Seo
Arsha Nagrani
Cordelia Schmid
29
15
0
29 Mar 2023
Can Visual Context Improve Automatic Speech Recognition for an Embodied Agent?
Pradip Pramanick
Chayan Sarkar
18
7
0
21 Oct 2022
Improving Multimodal Speech Recognition by Data Augmentation and Speech Representations
Dan Oneaţă
H. Cucu
19
19
0
27 Apr 2022