Deep Speech: Scaling up end-to-end speech recognition

17 December 2014

Papers citing "Deep Speech: Scaling up end-to-end speech recognition"

50 / 768 papers shown

VAST: Vivify Your Talking Avatar via Zero-Shot Expressive Facial Style Transfer

Zhiyong Wu

Jun Ling

163

09 Aug 2023

Many-to-Many Spoken Language Translation via Unified Speech and Text Representation Learning with Unit-to-Unit Translation

IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023

195

03 Aug 2023

Integration of Frame- and Label-synchronous Beam Search for Streaming Encoder-decoder Speech Recognition

Interspeech (Interspeech), 2023

Shinji Watanabe

188

24 Jul 2023

A Deep Dive into the Disparity of Word Error Rates Across Thousands of NPTEL MOOC Videos

International Conference on Web and Social Media (ICWSM), 2023

Anand Rai

Siddharth D. Jaiswal

Animesh Mukherjee

171

20 Jul 2023

Efficient Region-Aware Neural Radiance Fields for High-Fidelity Talking Portrait Synthesis

IEEE International Conference on Computer Vision (ICCV), 2023

235

110

18 Jul 2023

SoK: Comparing Different Membership Inference Attacks with a Comprehensive Benchmark

Ge Zhang

...

179

12 Jul 2023

Encoder-Decoder Networks for Self-Supervised Pretraining and Downstream Signal Bandwidth Regression on Digital Antenna Arrays

R. Bhattacharjea

Nathan E. West

SSL

06 Jul 2023

Align With Purpose: Optimize Desired Properties in CTC Models with a General Plug-and-Play Framework

International Conference on Learning Representations (ICLR), 2023

...

305

04 Jul 2023

Robust Proxy: Improving Adversarial Robustness by Robust Proxy Learning

IEEE Transactions on Information Forensics and Security (IEEE TIFS), 2023

Hong Joo Lee

Yonghyun Ro

AAML

164

27 Jun 2023

Scaling and Resizing Symmetry in Feedforward Networks

Carlos Cardona

153

26 Jun 2023

MobileASR: A resource-aware on-device learning framework for user voice personalization applications on mobile phones

International Conference on AI-ML-Systems (ICA), 2023

Zitha Sasindran

Harsha Yelchuri

Pooja S B. Rao

Prabhakar Venkata Tamma

187

15 Jun 2023

Learning Cross-lingual Mappings for Data Augmentation to Improve Low-Resource Speech Recognition

Interspeech (Interspeech), 2023

Muhammad Umar Farooq

Thomas Hain

109

14 Jun 2023

Get More for Less in Decentralized Learning Systems

IEEE International Conference on Distributed Computing Systems (ICDCS), 2023

215

07 Jun 2023

Text-only Domain Adaptation using Unified Speech-Text Representation in Transducer

Interspeech (Interspeech), 2023

231

07 Jun 2023

Looking and Listening: Audio Guided Text Recognition

Yuliang Liu

156

06 Jun 2023

Using Sequences of Life-events to Predict Human Lives

Nature Computational Science (Nat. Comput. Sci.), 2023

231

05 Jun 2023

DVFO: Learning-Based DVFS for Energy-Efficient Edge-Cloud Collaborative Inference

IEEE Transactions on Mobile Computing (IEEE TMC), 2023

286

02 Jun 2023

Encoder-decoder multimodal speaker change detection

Interspeech (Interspeech), 2023

164

01 Jun 2023

Adaptation and Optimization of Automatic Speech Recognition (ASR) for the Maritime Domain in the Field of VHF Communication

Emin Cagatay Nakilcioglu

M. Reimann

O. John

01 Jun 2023

Trustworthy Sensor Fusion against Inaudible Command Attacks in Advanced Driver-Assistance System

IEEE Internet of Things Journal (IEEE IoT J.), 2023

193

30 May 2023

RAND: Robustness Aware Norm Decay For Quantized Seq2seq Models

192

24 May 2023

Evaluating OpenAI's Whisper ASR for Punctuation Prediction and Topic Modeling of life histories of the Museum of the Person

L. Gris

R. Marcacini

Arnaldo Cândido Júnior

Edresson Casanova

A. S. Soares

S. Aluísio

237

23 May 2023

QFA2SR: Query-Free Adversarial Transfer Attacks to Speaker Recognition Systems

USENIX Security Symposium (USENIX Security), 2023

225

23 May 2023

Study of GANs for Noisy Speech Simulation from Clean Speech

Chen Chen

113

21 May 2023

Decision-based iterative fragile watermarking for model integrity verification

261

13 May 2023

Masked Audio Text Encoders are Effective Multi-Modal Rescorers

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

345

11 May 2023

Deep Learning and Geometric Deep Learning: an introduction for mathematicians and physicists

International Journal of Geometric Methods in Modern Physics (IJGMMP), 2023

R. Fioresi

F. Zanchetta

PINN

112

09 May 2023

Mask The Bias: Improving Domain-Adaptive Generalization of CTC-based ASR with Internal Language Model Estimation

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

152

05 May 2023

GeneFace++: Generalized and Stable Real-Time Audio-Driven 3D Talking Face Generation

Rongjie Huang

Xiang Yin

Zhou Zhao

211

01 May 2023

Affective social anthropomorphic intelligent system

Multimedia Tools and Applications (MTA), 2023

Md. Adyelullahil Mamun

Hasnat Md. Abdullah

Md. Golam Rabiul Alam

Muhammad Mehedi Hassan

Md. Zia Uddin

118

19 Apr 2023

ASPEST: Bridging the Gap Between Active Learning and Selective Prediction

Tomas Pfister

368

07 Apr 2023

Robustmix: Improving Robustness by Regularizing the Frequency Bias of Deep Nets

Jonas Ngnawé

Marianne Abémgnigni Njifon

Jonathan Heek

Yann N. Dauphin

OOD

110

06 Apr 2023

Style Transfer for 2D Talking Head Animation

270

17 Mar 2023

Improving Few-Shot Learning for Talking Face System with TTS Data Augmentation

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

Xie Chen

154

09 Mar 2023

DINet: Deformation Inpainting Network for Realistic Face Visually Dubbing on High Resolution Video

AAAI Conference on Artificial Intelligence (AAAI), 2023

Changjie Fan

252

07 Mar 2023

End-to-End Speech Recognition: A Survey

IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023

288

245

03 Mar 2023

Variational EP with Probabilistic Backpropagation for Bayesian Neural Networks

Kehinde Olobatuyi

BDL

02 Mar 2023

A Comparison of Speech Data Augmentation Methods Using S3PRL Toolkit

Mina Huh

Ruchira Ray

Corey Karnei

145

27 Feb 2023

Explanations for Automatic Speech Recognition

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

Xiao-lan Wu

P. Bell

A. Rajan

188

27 Feb 2023

Improving Medical Speech-to-Text Accuracy with Vision-Language Pre-training Model

IEEE Journal of Biomedical and Health Informatics (IEEE JBHI), 2023

188

27 Feb 2023

Pose-Controllable 3D Facial Animation Synthesis using Hierarchical Audio-Vertex Attention

155

24 Feb 2023

Evaluating Automatic Speech Recognition in an Incremental Setting

Ryan Whetten

M. Imtiaz

C. Kennington

23 Feb 2023

Using Semantic Information for Defining and Detecting OOD Inputs

223

21 Feb 2023

JEIT: Joint End-to-End Model and Internal Language Model Training for Speech Recognition

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

Andrew Rosenberg

Bhuvana Ramabhadran

AuLLM VLM

209

16 Feb 2023

GeneFace: Generalized and High-Fidelity Audio-Driven 3D Talking Face Synthesis

International Conference on Learning Representations (ICLR), 2023

Zhou Zhao

209

181

31 Jan 2023

Open Problems in Applied Deep Learning

M. Raissi

AI4CE

232

26 Jan 2023

A Data-Efficient Visual-Audio Representation with Intuitive Fine-tuning for Voice-Controlled Robots

Conference on Robot Learning (CoRL), 2023

Tianchen Ji

Katherine Driggs-Campbell

188

23 Jan 2023

Neural Architecture Search: Insights from 1000 Papers

Katharina Eggensperger

3DV AI4CE

409

192

20 Jan 2023

DiffTalk: Crafting Diffusion Models for Generalized Audio-Driven Portraits Animation

Computer Vision and Pattern Recognition (CVPR), 2023

Wenliang Zhao

Jie Zhou

277

155

10 Jan 2023

Audio-Visual Efficient Conformer for Robust Speech Recognition

IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023

Maxime Burchi

Radu Timofte

VLM

213

04 Jan 2023