arXiv:1512.02595
Deep Speech 2: End-to-End Speech Recognition in English and Mandarin
8 December 2015
Dario Amodei
Rishita Anubhai
Eric Battenberg
Carl Case
Jared Casper
Bryan Catanzaro
Jingdong Chen
Mike Chrzanowski
Adam Coates
G. Diamos
Erich Elsen
Jesse Engel
Linxi Fan
Christopher Fougner
T. Han
Awni Y. Hannun
Billy Jun
P. LeGresley
Libby Lin
Sharan Narang
A. Ng
Sherjil Ozair
R. Prenger
Jonathan Raiman
S. Satheesh
David Seetapun
Shubho Sengupta
Yi Wang
Zhiqian Wang
Chong-Jun Wang
Bo Xiao
Dani Yogatama
J. Zhan
Zhenyao Zhu
Papers citing "Deep Speech 2: End-to-End Speech Recognition in English and Mandarin"
50 / 931 papers shown
AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling
Jun Zhan
Junqi Dai
Jiasheng Ye
Yunhua Zhou
Dong Zhang
...
Jie Fu
Tao Gui
Tianxiang Sun
Yugang Jiang
Xipeng Qiu
MLLM
35
116
0
19 Feb 2024
Enhancing Sequential Model Performance with Squared Sigmoid TanH (SST) Activation Under Data Constraints
B. Subramanian
Rathinaraja Jeyaraj
Akhrorjon Akhmadjon Ugli Rakhmonov
Jeonghong Kim
15
0
0
14 Feb 2024
Implicit Bias in Noisy-SGD: With Applications to Differentially Private Training
Tom Sander
Maxime Sylvestre
Alain Durmus
31
1
0
13 Feb 2024
EmoSpeaker: One-shot Fine-grained Emotion-Controlled Talking Face Generation
Guanwen Feng
Haoran Cheng
Yunan Li
Zhiyuan Ma
Chaoneng Li
Zhihao Qian
Qiguang Miao
Chi-Man Pun
CVBM
31
2
0
02 Feb 2024
AccentFold: A Journey through African Accents for Zero-Shot ASR Adaptation to Target Accents
A. Owodunni
Aditya Yadavalli
Chris C. Emezue
Tobi Olatunji
Clinton Mbataku
38
1
0
02 Feb 2024
Importance-Aware Adaptive Dataset Distillation
Guang Li
Ren Togo
Takahiro Ogawa
Miki Haseyama
DD
30
6
0
29 Jan 2024
SeMaScore : a new evaluation metric for automatic speech recognition tasks
Zitha Sasindran
Harsha Yelchuri
T. V. Prabhakar
31
0
0
15 Jan 2024
Towards End-to-End Structure Solutions from Information-Compromised Diffraction Data via Generative Deep Learning
Gabriel Guo
Judah Goldfeder
Ling Lan
Aniv Ray
Albert Hanming Yang
Boyuan Chen
S. Billinge
Hod Lipson
32
3
0
23 Dec 2023
Real-time Neural Network Inference on Extremely Weak Devices: Agile Offloading with Explainable AI
Kai Huang
Wei Gao
22
35
0
21 Dec 2023
ElasticTrainer: Speeding Up On-Device Training with Runtime Elastic Tensor Selection
Kai Huang
Boyuan Yang
Wei Gao
32
18
0
21 Dec 2023
Efficiency-oriented approaches for self-supervised speech representation learning
Luis Lugo
Valentin Vielzeuf
SSL
31
1
0
18 Dec 2023
Assessing SATNet's Ability to Solve the Symbol Grounding Problem
Oscar Chang
Lampros Flokas
Hod Lipson
Michael Spranger
NAI
23
17
0
13 Dec 2023
Keyword spotting -- Detecting commands in speech using deep learning
Sumedha Rai
Tong Li
Bella Lyu
14
2
0
09 Dec 2023
Relational Deep Learning: Graph Representation Learning on Relational Databases
Matthias Fey
Weihua Hu
Kexin Huang
J. E. Lenssen
Rishabh Ranjan
Joshua Robinson
Rex Ying
Jiaxuan You
J. Leskovec
GNN
48
30
0
07 Dec 2023
MyPortrait: Morphable Prior-Guided Personalized Portrait Generation
Bo Ding
Zhenfeng Fan
Shuang Yang
Shihong Xia
71
2
0
05 Dec 2023
AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation
J. Choi
Se Jin Park
Minsu Kim
Y. Ro
37
12
0
05 Dec 2023
SyncTalk: The Devil is in the Synchronization for Talking Head Synthesis
Ziqiao Peng
Wentao Hu
Yue Shi
Xiangyu Zhu
Xiaomei Zhang
Hao Zhao
Jun He
Hongyan Liu
Zhaoxin Fan
41
40
0
29 Nov 2023
Phonological Level wav2vec2-based Mispronunciation Detection and Diagnosis Method
M. Shahin
Julien Epps
Beena Ahmed
16
1
0
13 Nov 2023
A comparative analysis between Conformer-Transducer, Whisper, and wav2vec2 for improving the child speech recognition
Andrei Barcovschi
Rishabh Jain
Peter Corcoran
21
3
0
07 Nov 2023
Style Description based Text-to-Speech with Conditional Prosodic Layer Normalization based Diffusion GAN
Neeraj Kumar
Ankur Narang
Brejesh Lall
DiffM
29
0
0
27 Oct 2023
SequenceMatch: Revisiting the design of weak-strong augmentations for Semi-supervised learning
Khanh-Binh Nguyen
18
3
0
24 Oct 2023
Debiasing, calibrating, and improving Semi-supervised Learning performance via simple Ensemble Projector
Khanh-Binh Nguyen
27
2
0
24 Oct 2023
Improved Contextual Recognition In Automatic Speech Recognition Systems By Semantic Lattice Rescoring
Ankitha Sudarshan
Vinay Samuel
Parth Patwa
Ibtihel Amara
Aman Chadha
24
2
0
14 Oct 2023
Whispering LLaMA: A Cross-Modal Generative Error Correction Framework for Speech Recognition
S. Radhakrishnan
Chao-Han Huck Yang
S. Khan
Rohit Kumar
N. Kiani
D. Gómez-Cabrero
Jesper N. Tegnér
38
47
0
10 Oct 2023
FedLPA: One-shot Federated Learning with Layer-Wise Posterior Aggregation
Xiang Liu
Liangxi Liu
Feiyang Ye
Yunheng Shen
Xia Li
Linshan Jiang
Jialin Li
36
4
0
30 Sep 2023
Developing automatic verbatim transcripts for international multilingual meetings: an end-to-end solution
Akshat Dewan
Michal Ziemski
Henri Meylan
Lorenzo Concina
Bruno Pouliquen
13
1
0
27 Sep 2023
Visual Speech Recognition for Languages with Limited Labeled Data using Automatic Labels from Whisper
Jeong Hun Yeo
Minsu Kim
Shinji Watanabe
Y. Ro
VLM
34
12
0
15 Sep 2023
DiffTalker: Co-driven audio-image diffusion for talking faces via intermediate landmarks
Zipeng Qi
Xulong Zhang
Ning Cheng
Jing Xiao
Jianzong Wang
24
7
0
14 Sep 2023
PhantomSound: Black-Box, Query-Efficient Audio Adversarial Attack via Split-Second Phoneme Injection
Hanqing Guo
Guangjing Wang
Yuanda Wang
Bocheng Chen
Qiben Yan
Li Xiao
AAML
37
9
0
13 Sep 2023
Hybrid ASR for Resource-Constrained Robots: HMM - Deep Learning Fusion
Anshul Ranjan
Kaushik Jegadeesan
17
0
0
11 Sep 2023
Efficient Emotional Adaptation for Audio-Driven Talking-Head Generation
Yuan Gan
Zongxin Yang
Xihang Yue
Lingyun Sun
Yezhou Yang
25
57
0
10 Sep 2023
ReliTalk: Relightable Talking Portrait Generation from a Single Video
Haonan Qiu
Zhaoxi Chen
Yuming Jiang
Hang Zhou
Xiangyu Fan
Lei Yang
Wayne Wu
Ziwei Liu
DiffM
VGen
34
10
0
05 Sep 2023
Homological Convolutional Neural Networks
Antonio Briola
Yuanrong Wang
Silvia Bartolucci
T. Aste
LMTD
33
5
0
26 Aug 2023
Throughput Maximization of DNN Inference: Batching or Multi-Tenancy?
Seyed Morteza Nabavinejad
M. Ebrahimi
Sherief Reda
27
1
0
26 Aug 2023
Improving Continuous Sign Language Recognition with Cross-Lingual Signs
Fangyun Wei
Yutong Chen
SLR
28
28
0
21 Aug 2023
Boosting Semi-Supervised Learning by bridging high and low-confidence predictions
Khanh-Binh Nguyen
Joon-Sung Yang
27
7
0
15 Aug 2023
Cross-Attribute Matrix Factorization Model with Shared User Embedding
Wen-Chieh Liang
Zeng Fan
Youzhi Liang
Jianguo Jia
16
2
0
14 Aug 2023
Automated Sizing and Training of Efficient Deep Autoencoders using Second Order Algorithms
Kanishka Tyagi
Chinmay Rane
M. Manry
18
1
0
11 Aug 2023
Speech-Driven 3D Face Animation with Composite and Regional Facial Movements
Haozhe Wu
Songtao Zhou
Jia Jia
Junliang Xing
Qi Wen
Xiang Wen
CVBM
32
15
0
10 Aug 2023
Personalization of Stress Mobile Sensing using Self-Supervised Learning
Tanvir Islam
Peter Washington
24
6
0
04 Aug 2023
Mercury: An Automated Remote Side-channel Attack to Nvidia Deep Learning Accelerator
Xi-ai Yan
Xiaoxuan Lou
Guowen Xu
Han Qiu
Shangwei Guo
Chip Hong Chang
Tianwei Zhang
AAML
27
7
0
02 Aug 2023
Inaudible Adversarial Perturbation: Manipulating the Recognition of User Speech in Real Time
Xinfeng Li
Chen Yan
Xuancun Lu
Zihan Zeng
Xiaoyu Ji
Wenyuan Xu
AAML
40
7
0
02 Aug 2023
Learning Multi-modal Representations by Watching Hundreds of Surgical Video Lectures
Kun Yuan
V. Srivastav
Tong Yu
Joël L. Lavanchy
Pietro Mascagni
Nicolas Padoy
37
20
0
27 Jul 2023
Integration of Frame- and Label-synchronous Beam Search for Streaming Encoder-decoder Speech Recognition
E. Tsunoo
Hayato Futami
Yosuke Kashiwagi
Siddhant Arora
Shinji Watanabe
30
4
0
24 Jul 2023
TST: Time-Sparse Transducer for Automatic Speech Recognition
Xiaohui Zhang
Mangui Liang
Zhengkun Tian
Jiangyan Yi
J. Tao
14
0
0
17 Jul 2023
Ed-Fed: A generic federated learning framework with resource-aware client selection for edge devices
Zitha Sasindran
Harsha Yelchuri
T. V. Prabhakar
FedML
24
4
0
14 Jul 2023
Can Generative Large Language Models Perform ASR Error Correction?
Rao Ma
Mengjie Qian
Potsawee Manakul
Mark Gales
Kate Knill
AuLLM
KELM
27
49
0
09 Jul 2023
Personalized Prediction of Recurrent Stress Events Using Self-Supervised Learning on Multimodal Time-Series Data
Tanvir Islam
Peter Washington
14
8
0
07 Jul 2023
Boosting Norwegian Automatic Speech Recognition
Javier de la Rosa
Rolv-Arild Braaten
P. Kummervold
Freddy Wetjen
Svein Arne Brygfjeld
41
7
0
04 Jul 2023
Beyond Neural-on-Neural Approaches to Speaker Gender Protection
L. V. Bemmel
Zhuoran Liu
Nik Vaessen
Martha Larson
AAML
24
2
0
30 Jun 2023