Listen, Look and Deliberate: Visual context-aware speech recognition using pre-trained text-video representations

8 November 2020

Papers citing "Listen, Look and Deliberate: Visual context-aware speech recognition using pre-trained text-video representations"

Robust Audiovisual Speech Recognition Models with Mixture-of-Experts
Spoken Language Technology Workshop (SLT), 2024

Yihan Wu

Yifan Peng

Yichen Lu

Xuankai Chang

Ruihua Song

Shinji Watanabe

19 Sep 2024

SynesLM: A Unified Approach for Audio-visual Speech Recognition and Translation via Language Model and Synthetic Data

Yichen Lu

Álvaro Huertas-García

Xuankai Chang

Hengwei Bian

Soumi Maiti

Shinji Watanabe

01 Aug 2024

Exploring the Potential of Multimodal LLM with Knowledge-Intensive Multimodal ASR

Ehsan Shareghi

16 Jun 2024

VILAS: Exploring the Effects of Vision and Language Context in Automatic Speech Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

Minglun Han

Bo Xu

31 May 2023

AVFormer: Injecting Vision into Frozen Speech Models for Zero-Shot AV-ASR
Computer Vision and Pattern Recognition (CVPR), 2023

Paul Hongsuck Seo

Arsha Nagrani

Cordelia Schmid

29 Mar 2023

Going for GOAL: A Resource for Grounded Football Commentaries

Malvina Nikandrou

08 Nov 2022

Can Visual Context Improve Automatic Speech Recognition for an Embodied Agent?
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022

Pradip Pramanick

Chayan Sarkar

21 Oct 2022

AVATAR: Unconstrained Audiovisual Speech Recognition
Interspeech (Interspeech), 2022

15 Jun 2022

Improving Multimodal Speech Recognition by Data Augmentation and Speech Representations

Dan Oneaţă

H. Cucu

27 Apr 2022

Recent Advances in End-to-End Automatic Speech Recognition
APSIPA Transactions on Signal and Information Processing (TASIP), 2021

Jinyu Li

02 Nov 2021

OPT: Omni-Perception Pre-Trainer for Cross-Modal Understanding and Generation

Jing Liu

...

01 Jul 2021

Multi-stream Convolutional Neural Network with Frequency Selection for Robust Speaker Verification
Computing and Informatics (CAI), 2020

21 Dec 2020