Deep Speech 2: End-to-End Speech Recognition in English and Mandarin


8 December 2015
Dario Amodei
Rishita Anubhai
Eric Battenberg
Carl Case
Jared Casper
Bryan Catanzaro
Jingdong Chen
Mike Chrzanowski
Adam Coates
G. Diamos
Erich Elsen
Jesse Engel
Linxi Fan
Christopher Fougner
T. Han
Awni Y. Hannun
Billy Jun
P. LeGresley
Libby Lin
Sharan Narang
A. Ng
Sherjil Ozair
R. Prenger
Jonathan Raiman
S. Satheesh
David Seetapun
Shubho Sengupta
Yi Wang
Zhiqian Wang
Chong-Jun Wang
Bo Xiao
Dani Yogatama
J. Zhan
Zhenyao Zhu

Papers citing "Deep Speech 2: End-to-End Speech Recognition in English and Mandarin"

50 / 1,096 papers shown
Debiasing, calibrating, and improving Semi-supervised Learning performance via simple Ensemble Projector
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Khanh-Binh Nguyen
128
7
0
24 Oct 2023
Improved Contextual Recognition In Automatic Speech Recognition Systems By Semantic Lattice Rescoring
Ankitha Sudarshan
Vinay Samuel
Parth Patwa
Ibtihel Amara
Vasu Sharma
279
3
0
14 Oct 2023
Whispering LLaMA: A Cross-Modal Generative Error Correction Framework for Speech Recognition
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
S. Radhakrishnan
Chao-Han Huck Yang
S. Khan
Rohit Kumar
N. Kiani
D. Gómez-Cabrero
Jesper N. Tegnér
322
77
0
10 Oct 2023
FedLPA: One-shot Federated Learning with Layer-Wise Posterior Aggregation
Neural Information Processing Systems (NeurIPS), 2023
Xiang Liu
Liangxi Liu
Feiyang Ye
Yunheng Shen
Xia Li
Linshan Jiang
Jialin Li
412
13
0
30 Sep 2023
Developing automatic verbatim transcripts for international multilingual meetings: an end-to-end solution
Machine Translation Summit (MT Summit), 2023
Akshat Dewan
Michal Ziemski
Henri Meylan
Lorenzo Concina
Bruno Pouliquen
113
1
0
27 Sep 2023
Visual Speech Recognition for Languages with Limited Labeled Data using Automatic Labels from Whisper
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Jeong Hun Yeo
Minsu Kim
Shinji Watanabe
Y. Ro
VLM
185
16
0
15 Sep 2023
DiffTalker: Co-driven audio-image diffusion for talking faces via intermediate landmarks
Zipeng Qi
Xulong Zhang
Ning Cheng
Jing Xiao
Jianzong Wang
187
9
0
14 Sep 2023
PhantomSound: Black-Box, Query-Efficient Audio Adversarial Attack via Split-Second Phoneme Injection
International Symposium on Recent Advances in Intrusion Detection (RAID), 2023
Hanqing Guo
Guangjing Wang
Yuanda Wang
Bocheng Chen
Qiben Yan
Li Xiao
AAML
187
12
0
13 Sep 2023
Hybrid ASR for Resource-Constrained Robots: HMM - Deep Learning Fusion
Anshul Ranjan
Kaushik Jegadeesan
53
0
0
11 Sep 2023
Efficient Emotional Adaptation for Audio-Driven Talking-Head Generation
IEEE International Conference on Computer Vision (ICCV), 2023
Yuan Gan
Zongxin Yang
Xihang Yue
Lingyun Sun
Yezhou Yang
199
91
0
10 Sep 2023
ReliTalk: Relightable Talking Portrait Generation from a Single Video
International Journal of Computer Vision (IJCV), 2023
Haonan Qiu
Zhaoxi Chen
Yuming Jiang
Hang Zhou
Xiangyu Fan
Lei Yang
Wayne Wu
Ziwei Liu
DiffM
VGen
202
14
0
05 Sep 2023
Homological Convolutional Neural Networks
Antonio Briola
Yuanrong Wang
Silvia Bartolucci
T. Aste
LMTD
219
7
0
26 Aug 2023
Throughput Maximization of DNN Inference: Batching or Multi-Tenancy?
Seyed Morteza Nabavinejad
M. Ebrahimi
Sherief Reda
154
1
0
26 Aug 2023
Improving Continuous Sign Language Recognition with Cross-Lingual Signs
IEEE International Conference on Computer Vision (ICCV), 2023
Fangyun Wei
Yutong Chen
SLR
172
38
0
21 Aug 2023
Boosting Semi-Supervised Learning by bridging high and low-confidence predictions
Khanh-Binh Nguyen
Joon-Sung Yang
187
19
0
15 Aug 2023
Cross-Attribute Matrix Factorization Model with Shared User Embedding
Wen-Chieh Liang
Zeng Fan
Youzhi Liang
Jianguo Jia
99
3
0
14 Aug 2023
Automated Sizing and Training of Efficient Deep Autoencoders using Second Order Algorithms
Kanishka Tyagi
Chinmay Rane
M. Manry
127
1
0
11 Aug 2023
Speech-Driven 3D Face Animation with Composite and Regional Facial Movements
ACM Multimedia (ACM MM), 2023
Haozhe Wu
Songtao Zhou
Jia Jia
Junliang Xing
Qi Wen
Xiang Wen
CVBM
216
21
0
10 Aug 2023
Personalization of Stress Mobile Sensing using Self-Supervised Learning
Tanvir Islam
Peter Washington
114
7
0
04 Aug 2023
Many-to-Many Spoken Language Translation via Unified Speech and Text Representation Learning with Unit-to-Unit Translation
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023
Minsu Kim
J. Choi
Dahun Kim
Y. Ro
174
10
0
03 Aug 2023
Mercury: An Automated Remote Side-channel Attack to Nvidia Deep Learning Accelerator
International Conference on Field-Programmable Technology (ICFPT), 2023
Xi-ai Yan
Xiaoxuan Lou
Guowen Xu
Han Qiu
Shangwei Guo
Chip Hong Chang
Tianwei Zhang
AAML
114
9
0
02 Aug 2023
Inaudible Adversarial Perturbation: Manipulating the Recognition of User Speech in Real Time
Network and Distributed System Security Symposium (NDSS), 2023
Xinfeng Li
Chen Yan
Xuancun Lu
Zihan Zeng
Xiaoyu Ji
Wei Dong
AAML
139
15
0
02 Aug 2023
Learning Multi-modal Representations by Watching Hundreds of Surgical Video Lectures
Kun Yuan
V. Srivastav
Tong Yu
Joël L. Lavanchy
J. Marescaux
Pietro Mascagni
Nassir Navab
N. Padoy
615
44
0
27 Jul 2023
Integration of Frame- and Label-synchronous Beam Search for Streaming Encoder-decoder Speech Recognition
Interspeech (Interspeech), 2023
E. Tsunoo
Hayato Futami
Yosuke Kashiwagi
Siddhant Arora
Shinji Watanabe
172
4
0
24 Jul 2023
TST: Time-Sparse Transducer for Automatic Speech Recognition
CAAI International Conference on Artificial Intelligence (ICCAI), 2023
Xiaohui Zhang
Mangui Liang
Zhengkun Tian
Jiangyan Yi
Jianhua Tao
101
0
0
17 Jul 2023
Ed-Fed: A generic federated learning framework with resource-aware client selection for edge devices
IEEE International Joint Conference on Neural Network (IJCNN), 2023
Zitha Sasindran
Harsha Yelchuri
T. V. Prabhakar
FedML
221
5
0
14 Jul 2023
Can Generative Large Language Models Perform ASR Error Correction?
Rao Ma
Mengjie Qian
Potsawee Manakul
Mark Gales
Kate Knill
AuLLM
KELM
246
73
0
09 Jul 2023
Personalized Prediction of Recurrent Stress Events Using Self-Supervised Learning on Multimodal Time-Series Data
Tanvir Islam
Peter Washington
114
12
0
07 Jul 2023
Boosting Norwegian Automatic Speech Recognition
Nordic Conference of Computational Linguistics (NODALIDA), 2023
Javier de la Rosa
Rolv-Arild Braaten
P. Kummervold
Freddy Wetjen
Svein Arne Brygfjeld
187
8
0
04 Jul 2023
Beyond Neural-on-Neural Approaches to Speaker Gender Protection
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
L. V. Bemmel
Zhuoran Liu
Nik Vaessen
Martha Larson
AAML
93
2
0
30 Jun 2023
SURT 2.0: Advances in Transducer-based Multi-talker Speech Recognition
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023
Desh Raj
Daniel Povey
Sanjeev Khudanpur
VLM
298
16
0
18 Jun 2023
MobileASR: A resource-aware on-device learning framework for user voice personalization applications on mobile phones
International Conference on AI-ML-Systems (ICA), 2023
Zitha Sasindran
Harsha Yelchuri
Pooja S B. Rao
Prabhakar Venkata Tamma
158
1
0
15 Jun 2023
DCTX-Conformer: Dynamic context carry-over for low latency unified streaming and non-streaming Conformer ASR
Interspeech (Interspeech), 2023
Goeric Huybrechts
S. Ronanki
Xilai Li
H. Nosrati
S. Bodapati
Katrin Kirchhoff
156
2
0
13 Jun 2023
What Can an Accent Identifier Learn? Probing Phonetic and Prosodic Information in a Wav2vec2-based Accent Identification Model
Interspeech (Interspeech), 2023
Mu Yang
R. Shekar
Okim Kang
John H. L. Hansen
272
24
0
10 Jun 2023
Arabic Dysarthric Speech Recognition Using Adversarial and Signal-Based Augmentation
Interspeech (Interspeech), 2023
Massa Baali
Ibrahim Almakky
Shady Shehata
Fakhri Karray
167
4
0
07 Jun 2023
End-to-End Learning for Stochastic Optimization: A Bayesian Perspective
International Conference on Machine Learning (ICML), 2023
Yves Rychener
Daniel Kuhn
Tobias Sutter
OOD
BDL
131
12
0
07 Jun 2023
Looking and Listening: Audio Guided Text Recognition
Wenwen Yu
Mingyu Liu
Biao Yang
Enming Zhang
Deqiang Jiang
Xing Sun
Yuliang Liu
Xiang Bai
DiffM
131
1
0
06 Jun 2023
Efficient Spoken Language Recognition via Multilabel Classification
Interspeech (Interspeech), 2023
Oriol Nieto
Zeyu Jin
Franck Dernoncourt
Justin Salamon
93
2
0
02 Jun 2023
Trustworthy Sensor Fusion against Inaudible Command Attacks in Advanced Driver-Assistance System
IEEE Internet of Things Journal (IEEE IoT J.), 2023
Jiwei Guan
Lei Pan
Chen Wang
Shui Yu
Longxiang Gao
Xi Zheng
AAML
168
5
0
30 May 2023
Downstream Task Agnostic Speech Enhancement with Self-Supervised Representation Loss
Interspeech (Interspeech), 2023
Hiroshi Sato
Ryo Masumura
Tsubasa Ochiai
Marc Delcroix
Takafumi Moriya
...
Kentaro Shinayama
Saki Mizuno
Mana Ihori
Tomohiro Tanaka
Nobukatsu Hojo
171
7
0
24 May 2023
Wav2SQL: Direct Generalizable Speech-To-SQL Parsing
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Huadai Liu
Rongjie Huang
Jinzheng He
Gang Sun
Ran Shen
Xize Cheng
Zhou Zhao
198
5
0
21 May 2023
DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining
Neural Information Processing Systems (NeurIPS), 2023
Sang Michael Xie
Hieu H. Pham
Xuanyi Dong
Nan Du
Hanxiao Liu
Yifeng Lu
Abigail Z. Jacobs
Quoc V. Le
Tengyu Ma
Adams Wei Yu
MoMe
MoE
496
274
0
17 May 2023
Value Iteration Networks with Gated Summarization Module
IEEE Access (IEEE Access), 2023
Jinyu Cai
Jialong Li
Mingyue Zhang
Kenji Tei
116
3
0
11 May 2023
Quran Recitation Recognition using End-to-End Deep Learning
Ahmad Al Harere
Khloud Al Jallad
182
13
0
10 May 2023
SoK: Pragmatic Assessment of Machine Learning for Network Intrusion Detection
European Symposium on Security and Privacy (Euro S&P), 2023
Giovanni Apruzzese
Pavel Laskov
J. Schneider
220
40
0
30 Apr 2023
Enhancing multilingual speech recognition in air traffic control by sentence-level language identification
Applied Acoustics (Appl. Acoust.), 2023
Peng Fan
Dongyue Guo
Jianwei Zhang
Bo Yang
Yi Lin
158
9
0
29 Apr 2023
AVFace: Towards Detailed Audio-Visual 4D Face Reconstruction
Computer Vision and Pattern Recognition (CVPR), 2023
Aggelina Chatziagapi
Dimitris Samaras
3DH
CVBM
159
5
0
25 Apr 2023
Dynamic Chunk Convolution for Unified Streaming and Non-Streaming Conformer ASR
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Xilai Li
Goeric Huybrechts
S. Ronanki
Jeffrey J. Farris
S. Bodapati
170
13
0
18 Apr 2023
Energy-Efficient GPU Clusters Scheduling for Deep Learning
Diandian Gu
Xintong Xie
Gang Huang
Xin Jin
Xuanzhe Liu
GNN
188
8
0
13 Apr 2023
Wav2code: Restore Clean Speech Representations via Codebook Lookup for Noise-Robust ASR
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023
Yuchen Hu
Cheng Chen
Qiu-shi Zhu
Eng Siong Chng
285
17
0
11 Apr 2023