Deep Speech 2: End-to-End Speech Recognition in English and Mandarin


8 December 2015
Dario Amodei
Rishita Anubhai
Eric Battenberg
Carl Case
Jared Casper
Bryan Catanzaro
Jingdong Chen
Mike Chrzanowski
Adam Coates
G. Diamos
Erich Elsen
Jesse Engel
Linxi Fan
Christopher Fougner
T. Han
Awni Y. Hannun
Billy Jun
P. LeGresley
Libby Lin
Sharan Narang
A. Ng
Sherjil Ozair
R. Prenger
Jonathan Raiman
S. Satheesh
David Seetapun
Shubho Sengupta
Yi Wang
Zhiqian Wang
Chong-Jun Wang
Bo Xiao
Dani Yogatama
J. Zhan
Zhenyao Zhu

Papers citing "Deep Speech 2: End-to-End Speech Recognition in English and Mandarin"

50 / 1,096 papers shown
Data Extrapolation for Text-to-image Generation on Small Datasets
Senmao Ye
Fei Liu
211
1
0
02 Oct 2024
WeHelp: A Shared Autonomy System for Wheelchair Users
Abulikemu Abuduweili
Alice Wu
Tianhao Wei
Weiye Zhao
130
0
0
18 Sep 2024
Open-World Test-Time Training: Self-Training with Contrast Learning
Houcheng Su
Mengzhu Wang
Jiao Li
Bingli Wang
Daixian Liu
Zeheng Wang
VLM
229
0
0
15 Sep 2024
ASR Error Correction using Large Language Models
IEEE Transactions on Audio, Speech, and Language Processing (TASLP), 2024
Rao Ma
Mengjie Qian
Mark Gales
Kate Knill
KELM
279
20
0
14 Sep 2024
The State of Commercial Automatic French Legal Speech Recognition Systems and their Impact on Court Reporters et al
Nicolad Garneau
Olivier Bolduc
ELM
AILaw
156
1
0
21 Aug 2024
Audio Enhancement for Computer Audition -- An Iterative Training Paradigm Using Sample Importance
Journal of Computational Science and Technology (JCST), 2024
M. Milling
Shuo Liu
Andreas Triantafyllopoulos
Ilhan Aslan
Björn W. Schuller
260
4
0
12 Aug 2024
Digital Avatars: Framework Development and Their Evaluation
International Joint Conference on Artificial Intelligence (IJCAI), 2024
Timothy Rupprecht
Sung-En Chang
Yushu Wu
Lei Lu
Enfu Nan
...
Zhimin Li
Zhijun Hu
Yumei He
David Kaeli
Yanzhi Wang
78
1
0
07 Aug 2024
DeepSpeech models show Human-like Performance and Processing of Cochlear Implant Inputs
Cynthia R. Steinhardt
Menoua Keshishian
N. Mesgarani
Kim Stachenfeld
145
0
0
30 Jul 2024
Text-based Talking Video Editing with Cascaded Conditional Diffusion
Bo Han
Heqing Zou
Haoyang Li
Guangcong Wang
Chng Eng Siong
VGen
DiffM
224
4
0
20 Jul 2024
CBM: Curriculum by Masking
Andrei Jarca
Florinel-Alin Croitoru
Radu Tudor Ionescu
212
4
0
06 Jul 2024
Evaluating Model Performance Under Worst-case Subpopulations
Mike Li
Hongseok Namkoong
Shangzhou Xia
243
19
0
01 Jul 2024
Zero-Query Adversarial Attack on Black-box Automatic Speech Recognition Systems
Zheng Fang
Tao Wang
Lingchen Zhao
Shenyi Zhang
Bowen Li
Yunjie Ge
Cunliang Kong
Chao Shen
Qian Wang
118
17
0
27 Jun 2024
Continuous Sign Language Recognition Using Intra-inter Gloss Attention
Hossein Ranjbar
Alireza Taheri
SLR
172
8
0
26 Jun 2024
Token-Weighted RNN-T for Learning from Flawed Data
Gil Keren
Wei Zhou
Ozlem Kalinli
239
1
0
26 Jun 2024
Exploring Test-Time Adaptation for Object Detection in Continually Changing Environments
Shilei Cao
Yan Liu
Lixian Zhang
Baoquan Zhao
Ziqi Yuan
Weijia Li
Runmin Dong
Haohuan Fu
TTA
OOD
548
5
0
24 Jun 2024
Decoder-only Architecture for Streaming End-to-end Speech Recognition
E. Tsunoo
Hayato Futami
Yosuke Kashiwagi
Siddhant Arora
Shinji Watanabe
RALM
AuLLM
316
13
0
23 Jun 2024
Boosting Consistency in Dual Training for Long-Tailed Semi-Supervised Learning
Kai Gan
Tong Wei
Min-Ling Zhang
240
1
0
19 Jun 2024
Communication-Efficient Distributed Deep Learning via Federated Dynamic Averaging
Michail Theologitis
Georgios Frangias
Georgios Anestis
V. Samoladas
Antonios Deligiannakis
FedML
398
2
0
31 May 2024
OpFlowTalker: Realistic and Natural Talking Face Generation via Optical Flow Guidance
Shuheng Ge
Haoyu Xing
Li Zhang
Xiangqian Wu
280
0
0
23 May 2024
Contribute to balance, wire in accordance: Emergence of backpropagation from a simple, bio-plausible neuroplasticity rule
bioRxiv, 2024
Xinhao Fan
S. P. Mysore
239
1
0
23 May 2024
Towards Evaluating the Robustness of Automatic Speech Recognition Systems via Audio Style Transfer
Weifei Jin
Yuxin Cao
Junjie Su
Qi Shen
Kai Ye
Derui Wang
Jie Hao
Ziyao Liu
AAML
223
4
0
15 May 2024
Chaos-based reinforcement learning with TD3
Neural Networks (NN), 2024
Toshitaka Matsuki
Yusuke Sakemi
Kazuyuki Aihara
306
1
0
15 May 2024
Architecture of a Cortex Inspired Hierarchical Event Recaller
Valentín Puente Varona
117
1
0
03 May 2024
Sequence-to-sequence models in peer-to-peer learning: A practical application
Robert Šajina
Ivo Ipšić
172
0
0
02 May 2024
CM-TTS: Enhancing Real Time Text-to-Speech Synthesis Efficiency through Weighted Samplers and Consistency Models
Xiang Li
Fan Bu
Ambuj Mehrish
Yingting Li
Jiale Han
Bo Cheng
Soujanya Poria
DiffM
125
9
0
31 Mar 2024
VidLA: Video-Language Alignment at Scale
Computer Vision and Pattern Recognition (CVPR), 2024
Mamshad Nayeem Rizve
Fan Fei
Jayakrishnan Unnikrishnan
Son Tran
Benjamin Z. Yao
Belinda Zeng
Mubarak Shah
Trishul Chilimbi
VLM
AI4TS
188
8
0
21 Mar 2024
Speech Robust Bench: A Robustness Benchmark For Speech Recognition
International Conference on Learning Representations (ICLR), 2024
Muhammad A. Shah
David Solans Noguero
Mikko A. Heikkilä
Nicolas Kourtellis
200
12
0
08 Mar 2024
TMT: Tri-Modal Translation between Speech, Image, and Text by Processing Different Modalities as Different Languages
Minsu Kim
Jee-weon Jung
Hyeongseop Rha
Soumi Maiti
Siddhant Arora
Xuankai Chang
Shinji Watanabe
Y. Ro
311
8
0
25 Feb 2024
AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling
Jun Zhan
Junqi Dai
Jiasheng Ye
Yunhua Zhou
Dong Zhang
...
Jie Fu
Tao Gui
Tianxiang Sun
Yugang Jiang
Xinyu Zhou
MLLM
493
199
0
19 Feb 2024
Enhancing Sequential Model Performance with Squared Sigmoid TanH (SST) Activation Under Data Constraints
B. Subramanian
Rathinaraja Jeyaraj
Akhrorjon Akhmadjon Ugli Rakhmonov
105
0
0
14 Feb 2024
Syllable based DNN-HMM Cantonese Speech to Text System
Timothy Wong
Claire Li
Sam Lam
Billy Chiu
Q. Lu
Minglei Li
D. Xiong
R. Yu
Vincent Ng
61
4
0
13 Feb 2024
Implicit Bias in Noisy-SGD: With Applications to Differentially Private Training
Tom Sander
Maxime Sylvestre
Alain Durmus
144
2
0
13 Feb 2024
EmoSpeaker: One-shot Fine-grained Emotion-Controlled Talking Face Generation
Guanwen Feng
Haoran Cheng
Yunan Li
Zhiyuan Ma
Chaoneng Li
Zhihao Qian
Qiguang Miao
Chi-Man Pun
CVBM
149
7
0
02 Feb 2024
AccentFold: A Journey through African Accents for Zero-Shot ASR Adaptation to Target Accents
A. Owodunni
Aditya Yadavalli
Chris C. Emezue
Tobi Olatunji
Clinton Mbataku
186
6
0
02 Feb 2024
Importance-Aware Adaptive Dataset Distillation
Neural Networks (NN), 2024
Guang Li
Ren Togo
Takahiro Ogawa
Miki Haseyama
DD
331
15
0
29 Jan 2024
SeMaScore : a new evaluation metric for automatic speech recognition tasks
Interspeech, 2024
Zitha Sasindran
Harsha Yelchuri
T. V. Prabhakar
107
4
0
15 Jan 2024
Towards End-to-End Structure Solutions from Information-Compromised Diffraction Data via Generative Deep Learning
Gabriel Guo
Judah Goldfeder
Ling Lan
Aniv Ray
Albert Hanming Yang
Boyuan Chen
S. Billinge
Hod Lipson
116
4
0
23 Dec 2023
Real-time Neural Network Inference on Extremely Weak Devices: Agile Offloading with Explainable AI
Kai Huang
Wei Gao
170
48
0
21 Dec 2023
ElasticTrainer: Speeding Up On-Device Training with Runtime Elastic Tensor Selection
Kai Huang
Boyuan Yang
Wei Gao
230
30
0
21 Dec 2023
Efficiency-oriented approaches for self-supervised speech representation learning
Luis Lugo
Valentin Vielzeuf
SSL
238
1
0
18 Dec 2023
Assessing SATNet's Ability to Solve the Symbol Grounding Problem
Neural Information Processing Systems (NeurIPS), 2023
Oscar Chang
Lampros Flokas
Hod Lipson
Michael Spranger
NAI
179
24
0
13 Dec 2023
Keyword spotting -- Detecting commands in speech using deep learning
Sumedha Rai
Tong Li
Bella Lyu
172
2
0
09 Dec 2023
Relational Deep Learning: Graph Representation Learning on Relational Databases
Matthias Fey
Weihua Hu
Kexin Huang
J. E. Lenssen
Rishabh Ranjan
Joshua Robinson
Rex Ying
Jiaxuan You
J. Leskovec
GNN
153
49
0
07 Dec 2023
MyPortrait: Morphable Prior-Guided Personalized Portrait Generation
Bo Ding
Zhenfeng Fan
Shuang Yang
Shihong Xia
155
3
0
05 Dec 2023
AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation
Computer Vision and Pattern Recognition (CVPR), 2023
J. Choi
Se Jin Park
Minsu Kim
Y. Ro
343
16
0
05 Dec 2023
SyncTalk: The Devil is in the Synchronization for Talking Head Synthesis
Computer Vision and Pattern Recognition (CVPR), 2023
Ziqiao Peng
Wentao Hu
Yue Shi
Xiangyu Zhu
Xiaomei Zhang
Hao Zhao
Jun He
Hongyan Liu
Zhaoxin Fan
243
94
0
29 Nov 2023
Phonological Level wav2vec2-based Mispronunciation Detection and Diagnosis Method
M. Shahin
Julien Epps
Beena Ahmed
111
3
0
13 Nov 2023
A comparative analysis between Conformer-Transducer, Whisper, and wav2vec2 for improving the child speech recognition
Andrei Barcovschi
Rishabh Jain
Peter Corcoran
147
5
0
07 Nov 2023
Style Description based Text-to-Speech with Conditional Prosodic Layer Normalization based Diffusion GAN
Neeraj Kumar
Ankur Narang
Brejesh Lall
DiffM
150
0
0
27 Oct 2023
SequenceMatch: Revisiting the design of weak-strong augmentations for Semi-supervised learning
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Khanh-Binh Nguyen
265
11
0
24 Oct 2023