1512.02595
Cited By
Deep Speech 2: End-to-End Speech Recognition in English and Mandarin
8 December 2015
Dario Amodei
Rishita Anubhai
Eric Battenberg
Carl Case
Jared Casper
Bryan Catanzaro
Jingdong Chen
Mike Chrzanowski
Adam Coates
G. Diamos
Erich Elsen
Jesse Engel
Linxi Fan
Christopher Fougner
T. Han
Awni Y. Hannun
Billy Jun
P. LeGresley
Libby Lin
Sharan Narang
A. Ng
Sherjil Ozair
R. Prenger
Jonathan Raiman
S. Satheesh
David Seetapun
Shubho Sengupta
Yi Wang
Zhiqian Wang
Chong-Jun Wang
Bo Xiao
Dani Yogatama
J. Zhan
Zhenyao Zhu
Papers citing
"Deep Speech 2: End-to-End Speech Recognition in English and Mandarin"
50 / 1,096 papers shown
Title
Pyramid Multi-branch Fusion DCNN with Multi-Head Self-Attention for Mandarin Speech Recognition
Kai Liu
Hailiang Xiong
Gangqiang Yang
Zhengfeng Du
Yewen Cao
D. Shah
194
0
0
23 Mar 2023
Bayesian Pseudo-Coresets via Contrastive Divergence
Conference on Uncertainty in Artificial Intelligence (UAI), 2023
Piyush Tiwary
Kumar Shubham
V. Kashyap
Prathosh A.P.
230
4
0
20 Mar 2023
A Deep Learning System for Domain-specific Speech Recognition
Yanan Jia
66
2
0
18 Mar 2023
MMFace4D: A Large-Scale Multi-Modal 4D Face Dataset for Audio-Driven 3D Face Animation
Haozhe Wu
Jia Jia
Junliang Xing
Hongwei Xu
Xiangyuan Wang
Jelo Wang
CVBM
145
9
0
17 Mar 2023
Speech Modeling with a Hierarchical Transformer Dynamical VAE
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Xiaoyu Lin
Xiaoyu Bie
Simon Leglaive
Laurent Girin
Xavier Alameda-Pineda
BDL
174
3
0
07 Mar 2023
What is Memory? A Homological Perspective
Xin Li
159
0
0
07 Mar 2023
End-to-End Speech Recognition: A Survey
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023
Rohit Prabhavalkar
Takaaki Hori
Tara N. Sainath
Ralf Schlüter
Shinji Watanabe
VLM
256
239
0
03 Mar 2023
LiteG2P: A fast, light and high accuracy model for grapheme-to-phoneme conversion
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Chunfeng Wang
Peisong Huang
Yuxiang Zou
Haoyu Zhang
Shichao Liu
Xiang Yin
Zejun Ma
90
5
0
02 Mar 2023
Defending against Adversarial Audio via Diffusion Model
International Conference on Learning Representations (ICLR), 2023
Shutong Wu
Zhenghao Hu
Ming-Yu Liu
Weili Nie
Chaowei Xiao
DiffM
186
30
0
02 Mar 2023
Improving Medical Speech-to-Text Accuracy with Vision-Language Pre-training Model
IEEE journal of biomedical and health informatics (IEEE JBHI), 2023
Jaeyoung Huh
Sangjoon Park
Jeonghyeon Lee
Jong Chul Ye
LM&MA
165
13
0
27 Feb 2023
Factual Consistency Oriented Speech Recognition
Interspeech (Interspeech), 2023
Naoyuki Kanda
Takuya Yoshioka
Yang Liu
190
1
0
24 Feb 2023
Speech Privacy Leakage from Shared Gradients in Distributed Learning
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Zhuohang Li
Jiaxin Zhang
Jian-Dong Liu
FedML
132
2
0
21 Feb 2023
Emphasizing Unseen Words: New Vocabulary Acquisition for End-to-End Speech Recognition
Neural Networks (Neural Netw.), 2023
Leyuan Qu
C. Weber
S. Wermter
136
12
0
20 Feb 2023
Stochastic Approximation Approaches to Group Distributionally Robust Optimization
Lijun Zhang
Peng Zhao
Zhen-Hua Zhuang
Tianbao Yang
Zhihong Zhou
258
6
0
18 Feb 2023
E2E Spoken Entity Extraction for Virtual Agents
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Karan Singla
Yeon-Jun Kim
S. Bangalore
366
1
0
16 Feb 2023
Stabilising and accelerating light gated recurrent units for automatic speech recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Adel Moumen
Titouan Parcollet
123
3
0
16 Feb 2023
Policy-Induced Self-Supervision Improves Representation Finetuning in Visual RL
Sébastien M. R. Arnold
Fei Sha
SSL
110
0
0
12 Feb 2023
Improving Rare Words Recognition through Homophone Extension and Unified Writing for Low-resource Cantonese Speech Recognition
International Symposium on Chinese Spoken Language Processing (ISCSLP), 2022
Ho-Lam Chung
Junan Li
Pengfei Liu
Wai-Kim Leung
Xixin Wu
Helen Meng
225
5
0
02 Feb 2023
Open Problems in Applied Deep Learning
M. Raissi
AI4CE
219
3
0
26 Jan 2023
MooseNet: A Trainable Metric for Synthesized Speech with a PLDA Module
Speech Synthesis Workshop (SSW), 2023
Ondřej Plátek
Ondřej Dušek
176
2
0
17 Jan 2023
Dataset Distillation: A Comprehensive Review
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Ruonan Yu
Songhua Liu
Xinchao Wang
DD
310
166
0
17 Jan 2023
Speech Driven Video Editing via an Audio-Conditioned Diffusion Model
Image and Vision Computing (IVC), 2023
Dan Bigioi
Shubhajit Basak
Michał Stypułkowski
Maciej Zięba
H. Jordan
R. Mcdonnell
Peter Corcoran
DiffM
VGen
243
41
0
10 Jan 2023
VSVC: Backdoor attack against Keyword Spotting based on Voiceprint Selection and Voice Conversion
Hanbo Cai
Pengcheng Zhang
Hai Dong
Yan Xiao
Shunhui Ji
141
7
0
20 Dec 2022
AirfRANS: High Fidelity Computational Fluid Dynamics Dataset for Approximating Reynolds-Averaged Navier-Stokes Solutions
Neural Information Processing Systems (NeurIPS), 2022
F. Bonnet
Jocelyn Ahmed Mazari
Paola Cinnella
Patrick Gallinari
AI4CE
259
78
0
15 Dec 2022
Fully complex-valued deep learning model for visual perception
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Aniruddh Sikdar
Sumanth Udupa
Suresh Sundaram
191
6
0
14 Dec 2022
An Exploratory Study of AI System Risk Assessment from the Lens of Data Distribution and Uncertainty
Zhijie Wang
Yuheng Huang
Lei Ma
Haruki Yokoyama
Susumu Tokumoto
Kazuki Munakata
186
6
0
13 Dec 2022
Memories are One-to-Many Mapping Alleviators in Talking Face Generation
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Anni Tang
Tianyu He
Xuejiao Tan
Jun Ling
Liang Song
CVBM
285
27
0
09 Dec 2022
Progressive Multi-Scale Self-Supervised Learning for Speech Recognition
Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2022
Genshun Wan
Tan Liu
Hang Chen
Jia Pan
Cong Liu
Z. Ye
SSL
133
0
0
07 Dec 2022
Learning the joint distribution of two sequences using little or no paired data
Soroosh Mariooryad
Matt Shannon
Siyuan Ma
Tom Bagby
David Kao
Daisy Stanton
Eric Battenberg
RJ Skerry-Ryan
257
3
0
06 Dec 2022
Robust Speech Recognition via Large-Scale Weak Supervision
International Conference on Machine Learning (ICML), 2022
Alec Radford
Jong Wook Kim
Tao Xu
Greg Brockman
C. McLeavey
Ilya Sutskever
OffRL
936
5,569
0
06 Dec 2022
Remote estimation of geologic composition using interferometric synthetic-aperture radar in California's Central Valley
Kyongsik Yun
Kyra H Adams
J. Reager
Zhen Liu
Caitlyn Chavez
M. Turmon
Thomas Lu
45
2
0
04 Dec 2022
COMET: A Comprehensive Cluster Design Methodology for Distributed Deep Learning Training
D. Kadiyala
Saeed Rashidi
Taekyung Heo
Abhimanyu Bambhaniya
T. Krishna
Alexandros Daglis
VLM
161
10
0
30 Nov 2022
High-fidelity Facial Avatar Reconstruction from Monocular Video with Generative Priors
Computer Vision and Pattern Recognition (CVPR), 2022
Yun-Hao Bai
Yanbo Fan
Xuanxia Wang
Yong Zhang
Jingxiang Sun
Chun Yuan
Ying Shan
3DH
165
37
0
28 Nov 2022
Deep Learning Training Procedure Augmentations
Cristian Simionescu
151
1
0
25 Nov 2022
Dynamic Neural Portraits
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022
M. Doukas
Stylianos Ploumpis
Stefanos Zafeiriou
3DH
126
1
0
25 Nov 2022
Real-time Neural Radiance Talking Portrait Synthesis via Audio-spatial Decomposition
International Journal of Computer Vision (IJCV), 2022
Jiaxiang Tang
Kaisiyuan Wang
Hang Zhou
Xiaokang Chen
Dongliang He
Tianshu Hu
Jingtuo Liu
Gang Zeng
Jingdong Wang
3DH
200
112
0
22 Nov 2022
Phonemic Adversarial Attack against Audio Recognition in Real World
Jinyang Guo
Zhendong Chen
Zixin Yin
Qinghong Yang
Xianglong Liu
AAML
120
5
0
19 Nov 2022
Physics-Informed Machine Learning: A Survey on Problems, Methods and Applications
Zhongkai Hao
Songming Liu
Yichi Zhang
Chengyang Ying
Yao Feng
Hang Su
Jun Zhu
PINN
AI4CE
322
150
0
15 Nov 2022
FullPack: Full Vector Utilization for Sub-Byte Quantized Inference on General Purpose CPUs
Hossein Katebi
Navidreza Asadi
M. Goudarzi
MQ
122
1
0
13 Nov 2022
Peak-First CTC: Reducing the Peak Latency of CTC Models by Applying Peak-First Regularization
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Zhengkun Tian
Hongyu Xiang
Min Li
Fei Lin
Ke Ding
Guanglu Wan
116
7
0
07 Nov 2022
H_eval: A new hybrid evaluation metric for automatic speech recognition tasks
Automatic Speech Recognition & Understanding (ASRU), 2022
Zitha Sasindran
Harsha Yelchuri
T. V. Prabhakar
Supreeth K. Rao
VLM
165
9
0
03 Nov 2022
Losses Can Be Blessings: Routing Self-Supervised Speech Representations Towards Efficient Multilingual and Multitask Speech Processing
Neural Information Processing Systems (NeurIPS), 2022
Yonggan Fu
Yang Zhang
Kaizhi Qian
Zhifan Ye
Zhongzhi Yu
Cheng-I Jeff Lai
Yingyan Lin
316
10
0
02 Nov 2022
Fast-U2++: Fast and Accurate End-to-End Speech Recognition in Joint CTC/Attention Frames
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Che-Yuan Liang
Xiao-Lei Zhang
BinBin Zhang
Di Wu
Shengqiang Li
Xingcheng Song
Zhendong Peng
Fuping Pan
89
11
0
02 Nov 2022
V-Cloak: Intelligibility-, Naturalness- & Timbre-Preserving Real-Time Voice Anonymization
Jiangyi Deng
Fei Teng
Yanjiao Chen
Xiaofu Chen
Zhaohui Wang
Wenyuan Xu
149
33
0
27 Oct 2022
Cover Reproducible Steganography via Deep Generative Models
IEEE Transactions on Dependable and Secure Computing (TDSC), 2022
Kejiang Chen
Hang Zhou
Yaofei Wang
Meng Li
Weiming Zhang
Neng H. Yu
DiffM
109
15
0
26 Oct 2022
Investigating self-supervised, weakly supervised and fully supervised training approaches for multi-domain automatic speech recognition: a study on Bangladeshi Bangla
Ahnaf Mozib Samin
M. Kobir
Md. Mushtaq Shahriyar Rafee
M. F. Ahmed
Mehedi Hasan
Partha Ghosh
Shafkat Kibria
M. S. Rahman
SSL
221
0
0
24 Oct 2022
Can Visual Context Improve Automatic Speech Recognition for an Embodied Agent?
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Pradip Pramanick
Chayan Sarkar
176
8
0
21 Oct 2022
Meta Input: How to Leverage Off-the-Shelf Deep Neural Networks
Minsu Kim
Youngjoon Yu
Sungjune Park
Y. Ro
OOD
112
0
0
21 Oct 2022
Improving Semi-supervised End-to-end Automatic Speech Recognition using CycleGAN and Inter-domain Losses
Spoken Language Technology Workshop (SLT), 2022
C. Li
Ngoc Thang Vu
121
2
0
20 Oct 2022
On effects of Knowledge Distillation on Transfer Learning
Sushil Thapa
108
2
0
18 Oct 2022