Audio-Visual Speech Codecs: Rethinking Audio-Visual Speech Enhancement by Re-Synthesis

Computer Vision and Pattern Recognition (CVPR), 2022
31 March 2022
Karren D. Yang
Dejan Marković
Steven Krenn
Vasu Agrawal
Alexander Richard
    VGen

Papers citing "Audio-Visual Speech Codecs: Rethinking Audio-Visual Speech Enhancement by Re-Synthesis"

20 / 20 papers shown
AUREXA-SE: Audio-Visual Unified Representation Exchange Architecture with Cross-Attention and Squeezeformer for Speech Enhancement
M. Sajid
Deepanshu Gupta
Yash Modi
Sanskriti Jain
Harshith Jai Surya Ganji
A. Rahaman
Harshvardhan Choudhary
Nasir Saleem
Amir Hussain
M. Tanveer
80
0
0
06 Oct 2025
Real-Time System for Audio-Visual Target Speech Enhancement
T. Aleksandra Ma
Sile Yin
Li-Chia Yang
Shuo Zhang
108
0
0
25 Sep 2025
Pain in 3D: Generating Controllable Synthetic Faces for Automated Pain Assessment
Xin Lei Lin
Soroush Mehraban
Abhishek Moturu
Babak Taati
3DH, MedIm
223
0
0
20 Sep 2025
Real-Time Audio-Visual Speech Enhancement Using Pre-trained Visual Representations
Teng (Aleksandra) Ma
Sile Yin
Li-Chia Yang
Shuo Zhang
144
1
0
29 Jul 2025
Two-stage Audio-Visual Target Speaker Extraction System for Real-Time Processing On Edge Device
Zixuan Li
Xueliang Zhang
Lei Miao
Zhipeng Yan
Ying Sun
Chong Zhu
159
0
0
28 May 2025
FlowAVSE: Efficient Audio-Visual Speech Enhancement with Conditional Flow Matching
Chaeyoung Jung
Suyeon Lee
Ji-Hoon Kim
Joon Son Chung
DiffM
230
18
0
13 Jun 2024
TRAMBA: A Hybrid Transformer and Mamba Architecture for Practical Audio and Bone Conduction Speech Super Resolution and Enhancement on Mobile and Wearable Platforms
Proceedings of the ACM on Interactive Mobile Wearable and Ubiquitous Technologies (IMWUT), 2024
Yueyuan Sui
Minghui Zhao
Junxi Xia
Xiaofan Jiang
S. Xia
Mamba
236
17
0
02 May 2024
Sounding Bodies: Modeling 3D Spatial Sound of Humans Using Body Pose and Audio
Neural Information Processing Systems (NeurIPS), 2023
Xudong Xu
Dejan Marković
Jacob Sandakly
Todd Keebler
Steven Krenn
Alexander Richard
133
8
0
01 Nov 2023
Seeing Through the Conversation: Audio-Visual Speech Separation based on Diffusion Model
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Suyeon Lee
Chaeyoung Jung
Youngjoon Jang
Jaehun Kim
Joon Son Chung
225
14
0
30 Oct 2023
AV2Wav: Diffusion-Based Re-synthesis from Continuous Self-supervised Features for Audio-Visual Speech Enhancement
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Ju-Chieh Chou
Chung-Ming Chien
Karen Livescu
DiffM
373
9
0
14 Sep 2023
Neural Vector Fields: Generalizing Distance Vector Fields by Codebooks and Zero-Curl Regularization
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Xianghui Yang
Guosheng Lin
Huan Wang
Luping Zhou
294
2
0
04 Sep 2023
RepCodec: A Speech Representation Codec for Speech Tokenization
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Zhichao Huang
Chutong Meng
Tom Ko
205
41
0
31 Aug 2023
Audio-visual video-to-speech synthesis with synthesized input audio
Triantafyllos Kefalas
Yannis Panagakis
Maja Pantic
VGen, DiffM
272
1
0
31 Jul 2023
Audio-Visual Speech Enhancement With Selective Off-Screen Speech Extraction
European Signal Processing Conference (EUSIPCO), 2023
Tomoya Yoshinaga
Keitaro Tanaka
Shigeo Morishima
165
1
0
10 Jun 2023
Incorporating Ultrasound Tongue Images for Audio-Visual Speech Enhancement through Knowledge Distillation
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023
Ruixin Zheng
Yang Ai
Zhenhua Ling
215
15
0
24 May 2023
Neural Vector Fields: Implicit Representation by Explicit Learning
Computer Vision and Pattern Recognition (CVPR), 2023
Xianghui Yang
Guosheng Lin
Huan Wang
Luping Zhou
AI4CE
205
25
0
08 Mar 2023
CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior
Computer Vision and Pattern Recognition (CVPR), 2023
Jinbo Xing
Menghan Xia
Yuechen Zhang
Xiaodong Cun
Jue Wang
T. Wong
304
196
0
06 Jan 2023
ReVISE: Self-Supervised Speech Resynthesis with Visual Input for Universal and Generalized Speech Enhancement
Wei-Ning Hsu
Tal Remez
Bowen Shi
Jacob Donley
Yossi Adi
DiffM
228
13
0
21 Dec 2022
LA-VocE: Low-SNR Audio-visual Speech Enhancement using Neural Vocoders
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Rodrigo Mira
Buye Xu
Jacob Donley
Anurag Kumar
Stavros Petridis
V. Ithapu
Maja Pantic
196
16
0
20 Nov 2022
Context-sensitive neocortical neurons transform the effectiveness and efficiency of neural information processing
Ahsan Adeel
Mario Franco
Mohsin Raza
K. Ahmed
267
9
0
15 Jul 2022