ResearchTrend.AI
AVFormer: Injecting Vision into Frozen Speech Models for Zero-Shot AV-ASR

29 March 2023
Paul Hongsuck Seo
Arsha Nagrani
Cordelia Schmid

Papers citing "AVFormer: Injecting Vision into Frozen Speech Models for Zero-Shot AV-ASR"

16 / 16 papers shown
Visual-Aware Speech Recognition for Noisy Scenarios
Lakshmipathi Balaji
Karan Singla
26
0
0
09 Apr 2025
Adapter-Based Multi-Agent AVSR Extension for Pre-Trained ASR Models
Christopher Simic
K. Riedhammer
Tobias Bocklet
86
0
0
03 Feb 2025
Multi-Task Corrupted Prediction for Learning Robust Audio-Visual Speech Representation
Sungnyun Kim
Sungwoo Cho
Sangmin Bae
Kangwook Jang
Se-Young Yun
SSL
68
1
0
23 Jan 2025
SpeechQE: Estimating the Quality of Direct Speech Translation
HyoJung Han
Kevin Duh
Marine Carpuat
20
0
0
28 Oct 2024
Robust Audiovisual Speech Recognition Models with Mixture-of-Experts
Yihan Wu
Yifan Peng
Yichen Lu
Xuankai Chang
Ruihua Song
Shinji Watanabe
31
2
0
19 Sep 2024
SynesLM: A Unified Approach for Audio-visual Speech Recognition and Translation via Language Model and Synthetic Data
Yichen Lu
Álvaro Huertas-García
Xuankai Chang
Hengwei Bian
Soumi Maiti
Shinji Watanabe
37
1
0
01 Aug 2024
Exploring the Potential of Multimodal LLM with Knowledge-Intensive Multimodal ASR
Minghan Wang
Yuxia Wang
Thuy-Trang Vu
Ehsan Shareghi
Gholamreza Haffari
18
0
0
16 Jun 2024
MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models
Sanjoy Chowdhury
Sayan Nag
K. J. Joseph
Balaji Vasan Srinivasan
Dinesh Manocha
DiffM
33
7
0
07 Jun 2024
XLAVS-R: Cross-Lingual Audio-Visual Speech Representation Learning for Noise-Robust Speech Perception
HyoJung Han
Mohamed Anwar
J. Pino
Wei-Ning Hsu
Marine Carpuat
Bowen Shi
Changhan Wang
VLM
27
9
0
21 Mar 2024
SlideSpeech: A Large-Scale Slide-Enriched Audio-Visual Corpus
Haoxu Wang
Fan Yu
Xian Shi
Yuezhang Wang
Shiliang Zhang
Ming Li
14
11
0
11 Sep 2023
Ego4D: Around the World in 3,000 Hours of Egocentric Video
Kristen Grauman
Andrew Westbury
Eugene Byrne
Zachary Chavis
Antonino Furnari
...
Mike Zheng Shou
Antonio Torralba
Lorenzo Torresani
Mingfei Yan
Jitendra Malik
EgoV
218
1,017
0
13 Oct 2021
The Power of Scale for Parameter-Efficient Prompt Tuning
Brian Lester
Rami Al-Rfou
Noah Constant
VPVLM
278
3,784
0
18 Apr 2021
End-to-end Audio-visual Speech Recognition with Conformers
Pingchuan Ma
Stavros Petridis
M. Pantic
79
221
0
12 Feb 2021
Pushing the Limits of Semi-Supervised Learning for Automatic Speech Recognition
Yu Zhang
James Qin
Daniel S. Park
Wei Han
Chung-Cheng Chiu
Ruoming Pang
Quoc V. Le
Yonghui Wu
VLM
SSL
133
307
0
20 Oct 2020
Lip Reading Sentences in the Wild
Joon Son Chung
A. Senior
Oriol Vinyals
Andrew Zisserman
160
782
0
16 Nov 2016
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Yonghui Wu
M. Schuster
Z. Chen
Quoc V. Le
Mohammad Norouzi
...
Alex Rudnick
Oriol Vinyals
G. Corrado
Macduff Hughes
J. Dean
AIMat
716
6,435
0
26 Sep 2016