Deep Speech 2: End-to-End Speech Recognition in English and Mandarin

8 December 2015

Jingdong Chen

Linxi Fan

Sharan Narang

Yi Wang

Papers citing "Deep Speech 2: End-to-End Speech Recognition in English and Mandarin"

50 / 1,096 papers shown

Data Extrapolation for Text-to-image Generation on Small Datasets

Senmao Ye

Fei Liu

246

02 Oct 2024

WeHelp: A Shared Autonomy System for Wheelchair Users

Abulikemu Abuduweili

Alice Wu

Tianhao Wei

Weiye Zhao

147

18 Sep 2024

Open-World Test-Time Training: Self-Training with Contrast Learning

249

15 Sep 2024

ASR Error Correction using Large Language Models

IEEE Transactions on Audio, Speech, and Language Processing (TASLP), 2024

308

14 Sep 2024

The State of Commercial Automatic French Legal Speech Recognition Systems and their Impact on Court Reporters et al

Nicolas Garneau

Olivier Bolduc

ELM AILaw

164

21 Aug 2024

Audio Enhancement for Computer Audition -- An Iterative Training Paradigm Using Sample Importance

Journal of Computational Science and Technology (JCST), 2024

M. Milling

Shuo Liu

Andreas Triantafyllopoulos

Ilhan Aslan

Björn W. Schuller

271

12 Aug 2024

Digital Avatars: Framework Development and Their Evaluation

International Joint Conference on Artificial Intelligence (IJCAI), 2024

...

Yanzhi Wang

07 Aug 2024

DeepSpeech models show Human-like Performance and Processing of Cochlear Implant Inputs

Cynthia R. Steinhardt

Menoua Keshishian

N. Mesgarani

Kim Stachenfeld

165

30 Jul 2024

Text-based Talking Video Editing with Cascaded Conditional Diffusion

249

20 Jul 2024

CBM: Curriculum by Masking

Andrei Jarca

Florinel-Alin Croitoru

Radu Tudor Ionescu

253

06 Jul 2024

Evaluating Model Performance Under Worst-case Subpopulations

Mike Li

Hongseok Namkoong

Shangzhou Xia

295

01 Jul 2024

Zero-Query Adversarial Attack on Black-box Automatic Speech Recognition Systems

Chao Shen

139

27 Jun 2024

Continuous Sign Language Recognition Using Intra-inter Gloss Attention

Hossein Ranjbar

Alireza Taheri

SLR

176

26 Jun 2024

Token-Weighted RNN-T for Learning from Flawed Data

Gil Keren

Wei Zhou

Ozlem Kalinli

263

26 Jun 2024

Exploring Test-Time Adaptation for Object Detection in Continually Changing Environments

596

24 Jun 2024

Decoder-only Architecture for Streaming End-to-end Speech Recognition

Shinji Watanabe

332

23 Jun 2024

Boosting Consistency in Dual Training for Long-Tailed Semi-Supervised Learning

Kai Gan

Tong Wei

Min-Ling Zhang

262

19 Jun 2024

Communication-Efficient Distributed Deep Learning via Federated Dynamic Averaging

Antonios Deligiannakis

FedML

438

31 May 2024

OpFlowTalker: Realistic and Natural Talking Face Generation via Optical Flow Guidance

293

23 May 2024

Contribute to balance, wire in accordance: Emergence of backpropagation from a simple, bio-plausible neuroplasticity rule

bioRxiv (bioRxiv), 2024

Xinhao Fan

S. P. Mysore

266

23 May 2024

Towards Evaluating the Robustness of Automatic Speech Recognition Systems via Audio Style Transfer

241

15 May 2024

Chaos-based reinforcement learning with TD3

Neural Networks (NN), 2024

Toshitaka Matsuki

Yusuke Sakemi

Kazuyuki Aihara

316

15 May 2024

Architecture of a Cortex Inspired Hierarchical Event Recaller

Valentín Puente Varona

117

03 May 2024

Sequence-to-sequence models in peer-to-peer learning: A practical application

Robert Šajina

Ivo Ipšić

176

02 May 2024

CM-TTS: Enhancing Real Time Text-to-Speech Synthesis Efficiency through Weighted Samplers and Consistency Models

148

31 Mar 2024

VidLA: Video-Language Alignment at Scale

Computer Vision and Pattern Recognition (CVPR), 2024

Mamshad Nayeem Rizve

Fan Fei

Jayakrishnan Unnikrishnan

Mubarak Shah

224

21 Mar 2024

Speech Robust Bench: A Robustness Benchmark For Speech Recognition

International Conference on Learning Representations (ICLR), 2024

236

08 Mar 2024

TMT: Tri-Modal Translation between Speech, Image, and Text by Processing Different Modalities as Different Languages

331

25 Feb 2024

AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling

...

543

201

19 Feb 2024

Enhancing Sequential Model Performance with Squared Sigmoid TanH (SST) Activation Under Data Constraints

B. Subramanian

Rathinaraja Jeyaraj

Akhrorjon Akhmadjon Ugli Rakhmonov

127

14 Feb 2024

Syllable based DNN-HMM Cantonese Speech to Text System

13 Feb 2024

Implicit Bias in Noisy-SGD: With Applications to Differentially Private Training

Tom Sander

Maxime Sylvestre

Alain Durmus

176

13 Feb 2024

EmoSpeaker: One-shot Fine-grained Emotion-Controlled Talking Face Generation

Chi-Man Pun

158

02 Feb 2024

AccentFold: A Journey through African Accents for Zero-Shot ASR Adaptation to Target Accents

196

02 Feb 2024

Importance-Aware Adaptive Dataset Distillation

Neural Networks (NN), 2024

Guang Li

Ren Togo

Takahiro Ogawa

Miki Haseyama

339

29 Jan 2024

SeMaScore: a new evaluation metric for automatic speech recognition tasks

Interspeech (Interspeech), 2024

Zitha Sasindran

Harsha Yelchuri

T. V. Prabhakar

118

15 Jan 2024

Towards End-to-End Structure Solutions from Information-Compromised Diffraction Data via Generative Deep Learning

120

23 Dec 2023

Real-time Neural Network Inference on Extremely Weak Devices: Agile Offloading with Explainable AI

Kai Huang

Wei Gao

190

21 Dec 2023

ElasticTrainer: Speeding Up On-Device Training with Runtime Elastic Tensor Selection

Kai Huang

Boyuan Yang

Wei Gao

243

21 Dec 2023

Efficiency-oriented approaches for self-supervised speech representation learning

Luis Lugo

Valentin Vielzeuf

SSL

257

18 Dec 2023

Assessing SATNet's Ability to Solve the Symbol Grounding Problem

Neural Information Processing Systems (NeurIPS), 2023

187

13 Dec 2023

Keyword spotting -- Detecting commands in speech using deep learning

Sumedha Rai

Tong Li

Bella Lyu

174

09 Dec 2023

Relational Deep Learning: Graph Representation Learning on Relational Databases

190

07 Dec 2023

MyPortrait: Morphable Prior-Guided Personalized Portrait Generation

171

05 Dec 2023

AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation

Computer Vision and Pattern Recognition (CVPR), 2023

363

05 Dec 2023

SyncTalk: The Devil is in the Synchronization for Talking Head Synthesis

Computer Vision and Pattern Recognition (CVPR), 2023

Hao Zhao

Jun He

Hongyan Liu

Zhaoxin Fan

266

29 Nov 2023

Phonological Level wav2vec2-based Mispronunciation Detection and Diagnosis Method

M. Shahin

Julien Epps

Beena Ahmed

122

13 Nov 2023

A comparative analysis between Conformer-Transducer, Whisper, and wav2vec2 for improving the child speech recognition

Andrei Barcovschi

Rishabh Jain

Peter Corcoran

159

07 Nov 2023

Style Description based Text-to-Speech with Conditional Prosodic Layer Normalization based Diffusion GAN

174

27 Oct 2023

SequenceMatch: Revisiting the design of weak-strong augmentations for Semi-supervised learning

IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023

Khanh-Binh Nguyen

275

24 Oct 2023