ResearchTrend.AI
  • Communities
  • Connect sessions
  • AI calendar
  • Organizations
  • Join Slack
  • Contact Sales
Papers
Communities
Social Events
Terms and Conditions
Pricing
Contact Sales
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2026 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2011.04084
  4. Cited By
Listen, Look and Deliberate: Visual context-aware speech recognition
  using pre-trained text-video representations

Listen, Look and Deliberate: Visual context-aware speech recognition using pre-trained text-video representations

8 November 2020
Shahram Ghorbani
Yashesh Gaur
Yu Shi
Jinyu Li
ArXiv (abs)PDFHTML

Papers citing "Listen, Look and Deliberate: Visual context-aware speech recognition using pre-trained text-video representations"

12 / 12 papers shown
Robust Audiovisual Speech Recognition Models with Mixture-of-Experts
Robust Audiovisual Speech Recognition Models with Mixture-of-ExpertsSpoken Language Technology Workshop (SLT), 2024
Yihan Wu
Yifan Peng
Yichen Lu
Xuankai Chang
Ruihua Song
Shinji Watanabe
338
7
0
19 Sep 2024
SynesLM: A Unified Approach for Audio-visual Speech Recognition and
  Translation via Language Model and Synthetic Data
SynesLM: A Unified Approach for Audio-visual Speech Recognition and Translation via Language Model and Synthetic Data
Yichen Lu
Álvaro Huertas-García
Xuankai Chang
Hengwei Bian
Soumi Maiti
Shinji Watanabe
266
2
0
01 Aug 2024
Exploring the Potential of Multimodal LLM with Knowledge-Intensive
  Multimodal ASR
Exploring the Potential of Multimodal LLM with Knowledge-Intensive Multimodal ASR
Minghan Wang
Yuxia Wang
Thuy-Trang Vu
Ehsan Shareghi
Gholamreza Haffari
187
1
0
16 Jun 2024
VILAS: Exploring the Effects of Vision and Language Context in Automatic
  Speech Recognition
VILAS: Exploring the Effects of Vision and Language Context in Automatic Speech RecognitionIEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Ziyi Ni
Minglun Han
Feilong Chen
Linghui Meng
Jing Shi
Shuang Xu
Bo Xu
234
4
0
31 May 2023
AVFormer: Injecting Vision into Frozen Speech Models for Zero-Shot
  AV-ASR
AVFormer: Injecting Vision into Frozen Speech Models for Zero-Shot AV-ASRComputer Vision and Pattern Recognition (CVPR), 2023
Paul Hongsuck Seo
Arsha Nagrani
Cordelia Schmid
257
26
0
29 Mar 2023
Going for GOAL: A Resource for Grounded Football Commentaries
Going for GOAL: A Resource for Grounded Football Commentaries
Alessandro Suglia
José Lopes
E. Bastianelli
Andrea Vanzo
Shubham Agarwal
Malvina Nikandrou
Lu Yu
Ioannis Konstas
Verena Rieser
177
9
0
08 Nov 2022
Can Visual Context Improve Automatic Speech Recognition for an Embodied
  Agent?
Can Visual Context Improve Automatic Speech Recognition for an Embodied Agent?Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Pradip Pramanick
Chayan Sarkar
291
8
0
21 Oct 2022
AVATAR: Unconstrained Audiovisual Speech Recognition
AVATAR: Unconstrained Audiovisual Speech RecognitionInterspeech (Interspeech), 2022
Valentin Gabeur
Paul Hongsuck Seo
Arsha Nagrani
Chen Sun
Alahari Karteek
Cordelia Schmid
175
17
0
15 Jun 2022
Improving Multimodal Speech Recognition by Data Augmentation and Speech
  Representations
Improving Multimodal Speech Recognition by Data Augmentation and Speech Representations
Dan Oneaţă
H. Cucu
152
25
0
27 Apr 2022
Recent Advances in End-to-End Automatic Speech Recognition
Recent Advances in End-to-End Automatic Speech RecognitionAPSIPA Transactions on Signal and Information Processing (TASIP), 2021
Jinyu Li
VLM
502
443
0
02 Nov 2021
OPT: Omni-Perception Pre-Trainer for Cross-Modal Understanding and
  Generation
OPT: Omni-Perception Pre-Trainer for Cross-Modal Understanding and Generation
Jing Liu
Xinxin Zhu
Fei Liu
Longteng Guo
Zijia Zhao
...
Weining Wang
Hanqing Lu
Shiyu Zhou
Jiajun Zhang
Jinqiao Wang
340
41
0
01 Jul 2021
Multi-stream Convolutional Neural Network with Frequency Selection for Robust Speaker Verification
Multi-stream Convolutional Neural Network with Frequency Selection for Robust Speaker VerificationComputing and informatics (CAI), 2020
Wei Yao
Shen Chen
Jiamin Cui
Yaolin Lou
297
7
0
21 Dec 2020
1
Page 1 of 1