1412.5567
Deep Speech: Scaling up end-to-end speech recognition
17 December 2014
Awni Y. Hannun
Carl Case
Jared Casper
Bryan Catanzaro
G. Diamos
Erich Elsen
R. Prenger
S. Satheesh
Shubho Sengupta
Adam Coates
A. Ng
Papers citing
"Deep Speech: Scaling up end-to-end speech recognition"
50 / 768 papers shown
VAST: Vivify Your Talking Avatar via Zero-Shot Expressive Facial Style Transfer
Liyang Chen
Zhiyong Wu
Runnan Li
Weihong Bao
Jun Ling
Xuejiao Tan
Sheng Zhao
163
10
0
09 Aug 2023
Many-to-Many Spoken Language Translation via Unified Speech and Text Representation Learning with Unit-to-Unit Translation
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023
Minsu Kim
J. Choi
Dahun Kim
Y. Ro
195
10
0
03 Aug 2023
Integration of Frame- and Label-synchronous Beam Search for Streaming Encoder-decoder Speech Recognition
Interspeech, 2023
E. Tsunoo
Hayato Futami
Yosuke Kashiwagi
Siddhant Arora
Shinji Watanabe
188
4
0
24 Jul 2023
A Deep Dive into the Disparity of Word Error Rates Across Thousands of NPTEL MOOC Videos
International Conference on Web and Social Media (ICWSM), 2023
Anand Rai
Siddharth D. Jaiswal
Animesh Mukherjee
171
5
0
20 Jul 2023
Efficient Region-Aware Neural Radiance Fields for High-Fidelity Talking Portrait Synthesis
IEEE International Conference on Computer Vision (ICCV), 2023
Jiahe Li
Jiawei Zhang
Xiao Bai
Jun Zhou
L. Gu
3DH
235
110
0
18 Jul 2023
SoK: Comparing Different Membership Inference Attacks with a Comprehensive Benchmark
Jun Niu
Xiaoyan Zhu
Moxuan Zeng
Ge Zhang
Qingyang Zhao
...
Peng Liu
Yulong Shen
Xiaohong Jiang
Jianfeng Ma
Yuqing Zhang
179
6
0
12 Jul 2023
Encoder-Decoder Networks for Self-Supervised Pretraining and Downstream Signal Bandwidth Regression on Digital Antenna Arrays
R. Bhattacharjea
Nathan E. West
SSL
54
1
0
06 Jul 2023
Align With Purpose: Optimize Desired Properties in CTC Models with a General Plug-and-Play Framework
International Conference on Learning Representations (ICLR), 2023
Eliya Segev
Maya Alroy
Ronen Katsir
Noam Wies
Ayana Shenhav
...
D. Zar
Oren Tadmor
Jacob Bitterman
Amnon Shashua
Tal Rosenwein
305
2
0
04 Jul 2023
Robust Proxy: Improving Adversarial Robustness by Robust Proxy Learning
IEEE Transactions on Information Forensics and Security (IEEE TIFS), 2023
Hong Joo Lee
Yonghyun Ro
AAML
164
4
0
27 Jun 2023
Scaling and Resizing Symmetry in Feedforward Networks
Carlos Cardona
153
2
0
26 Jun 2023
MobileASR: A resource-aware on-device learning framework for user voice personalization applications on mobile phones
International Conference on AI-ML-Systems (ICA), 2023
Zitha Sasindran
Harsha Yelchuri
Pooja S B. Rao
Prabhakar Venkata Tamma
187
1
0
15 Jun 2023
Learning Cross-lingual Mappings for Data Augmentation to Improve Low-Resource Speech Recognition
Interspeech, 2023
Muhammad Umar Farooq
Thomas Hain
109
4
0
14 Jun 2023
Get More for Less in Decentralized Learning Systems
IEEE International Conference on Distributed Computing Systems (ICDCS), 2023
Akash Dhasade
Anne-Marie Kermarrec
Rafael Pires
Rishi Sharma
Milos Vujasinovic
Jeffrey Wigger
215
9
0
07 Jun 2023
Text-only Domain Adaptation using Unified Speech-Text Representation in Transducer
Interspeech, 2023
Lu Huang
Yangqiu Song
Jun Zhang
Lu Lu
Zejun Ma
231
4
0
07 Jun 2023
Looking and Listening: Audio Guided Text Recognition
Wenwen Yu
Mingyu Liu
Biao Yang
Enming Zhang
Deqiang Jiang
Xing Sun
Yuliang Liu
Xiang Bai
DiffM
156
1
0
06 Jun 2023
Using Sequences of Life-events to Predict Human Lives
Nature Computational Science (Nat. Comput. Sci.), 2023
Germans Savcisens
Tina Eliassi-Rad
L. K. Hansen
L. Mortensen
Lau Lilleholt
Anna Rogers
Ingo Zettler
Sune Lehmann
AI4TS
231
72
0
05 Jun 2023
DVFO: Learning-Based DVFS for Energy-Efficient Edge-Cloud Collaborative Inference
IEEE Transactions on Mobile Computing (IEEE TMC), 2023
Ziyang Zhang
Yang Zhao
Huan Li
Changyao Lin
Jie Liu
286
36
0
02 Jun 2023
Encoder-decoder multimodal speaker change detection
Interspeech, 2023
Jee-weon Jung
Soonshin Seo
Hee-Soo Heo
Geon-min Kim
You Jin Kim
Youngki Kwon
Min-Ji Lee
Bong-Jin Lee
164
3
0
01 Jun 2023
Adaptation and Optimization of Automatic Speech Recognition (ASR) for the Maritime Domain in the Field of VHF Communication
Emin Cagatay Nakilcioglu
M. Reimann
O. John
74
6
0
01 Jun 2023
Trustworthy Sensor Fusion against Inaudible Command Attacks in Advanced Driver-Assistance System
IEEE Internet of Things Journal (IEEE IoT J.), 2023
Jiwei Guan
Lei Pan
Chen Wang
Shui Yu
Longxiang Gao
Xi Zheng
AAML
193
7
0
30 May 2023
RAND: Robustness Aware Norm Decay For Quantized Seq2seq Models
David Qiu
David Rim
Shaojin Ding
Oleg Rybakov
Yanzhang He
MQ
192
4
0
24 May 2023
Evaluating OpenAI's Whisper ASR for Punctuation Prediction and Topic Modeling of life histories of the Museum of the Person
L. Gris
R. Marcacini
Arnaldo Cândido Júnior
Edresson Casanova
A. S. Soares
S. Aluísio
237
12
0
23 May 2023
QFA2SR: Query-Free Adversarial Transfer Attacks to Speaker Recognition Systems
USENIX Security Symposium (USENIX Security), 2023
Guangke Chen
Yedi Zhang
Zhe Zhao
Fu Song
AAML
225
21
0
23 May 2023
Study of GANs for Noisy Speech Simulation from Clean Speech
L. Maben
Zixun Guo
Chen Chen
Utkarsh Chudiwal
Chng Eng Siong
113
0
0
21 May 2023
Decision-based iterative fragile watermarking for model integrity verification
Z. Yin
Heng Yin
Hang Su
Xinpeng Zhang
Zhenzhe Gao
AAML
261
6
0
13 May 2023
Masked Audio Text Encoders are Effective Multi-Modal Rescorers
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Jason (Jinglun) Cai
Monica Sunkara
Xilai Li
Anshu Bhatia
Xiao Pan
S. Bodapati
345
5
0
11 May 2023
Deep Learning and Geometric Deep Learning: an introduction for mathematicians and physicists
International Journal of Geometric Methods in Modern Physics (IJGMMP), 2023
R. Fioresi
F. Zanchetta
PINN
112
5
0
09 May 2023
Mask The Bias: Improving Domain-Adaptive Generalization of CTC-based ASR with Internal Language Model Estimation
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Nilaksh Das
Monica Sunkara
S. Bodapati
Jason (Jinglun) Cai
Devang Kulshreshtha
Jeffrey J. Farris
Katrin Kirchhoff
152
4
0
05 May 2023
GeneFace++: Generalized and Stable Real-Time Audio-Driven 3D Talking Face Generation
Zhenhui Ye
Jinzheng He
Ziyue Jiang
Rongjie Huang
Jia-Bin Huang
Jinglin Liu
Yixiang Ren
Xiang Yin
Zejun Ma
Zhou Zhao
CVBM
211
54
0
01 May 2023
Affective social anthropomorphic intelligent system
Multimedia Tools and Applications (MTA), 2023
Md. Adyelullahil Mamun
Hasnat Md. Abdullah
Md. Golam Rabiul Alam
Muhammad Mehedi Hassan
Md. Zia Uddin
118
3
0
19 Apr 2023
ASPEST: Bridging the Gap Between Active Learning and Selective Prediction
Jiefeng Chen
Chang Jo Kim
Sayna Ebrahimi
Sercan O. Arik
S. Jha
Tomas Pfister
368
5
0
07 Apr 2023
Robustmix: Improving Robustness by Regularizing the Frequency Bias of Deep Nets
Jonas Ngnawé
Marianne Abémgnigni Njifon
Jonathan Heek
Yann N. Dauphin
OOD
110
6
0
06 Apr 2023
Style Transfer for 2D Talking Head Animation
Trong-Thang Pham
Nhat Le
Tuong Khanh Long Do
Hung Nguyen
Erman Tjiputra
Quang-Dieu Tran
A. Nguyen
270
3
0
17 Mar 2023
Improving Few-Shot Learning for Talking Face System with TTS Data Augmentation
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Qi Chen
Ziyang Ma
Tao Liu
Xuejiao Tan
Qu Lu
Xie Chen
K. Yu
CVBM
154
6
0
09 Mar 2023
DINet: Deformation Inpainting Network for Realistic Face Visually Dubbing on High Resolution Video
AAAI Conference on Artificial Intelligence (AAAI), 2023
Zhimeng Zhang
Zhipeng Hu
W. Deng
Changjie Fan
Tangjie Lv
Yu-qiong Ding
3DH
CVBM
252
96
0
07 Mar 2023
End-to-End Speech Recognition: A Survey
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023
Rohit Prabhavalkar
Takaaki Hori
Tara N. Sainath
Ralf Schluter
Shinji Watanabe
VLM
288
245
0
03 Mar 2023
Variational EP with Probabilistic Backpropagation for Bayesian Neural Networks
Kehinde Olobatuyi
BDL
75
0
0
02 Mar 2023
A Comparison of Speech Data Augmentation Methods Using S3PRL Toolkit
Mina Huh
Ruchira Ray
Corey Karnei
145
6
0
27 Feb 2023
Explanations for Automatic Speech Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Xiao-lan Wu
P. Bell
A. Rajan
188
8
0
27 Feb 2023
Improving Medical Speech-to-Text Accuracy with Vision-Language Pre-training Model
IEEE Journal of Biomedical and Health Informatics (IEEE JBHI), 2023
Jaeyoung Huh
Sangjoon Park
Jeonghyeon Lee
Jong Chul Ye
LM&MA
188
15
0
27 Feb 2023
Pose-Controllable 3D Facial Animation Synthesis using Hierarchical Audio-Vertex Attention
Yinan Han
Xiaolin K. Wei
Bo Li
Junjie Cao
Yunyu Lai
CVBM
155
2
0
24 Feb 2023
Evaluating Automatic Speech Recognition in an Incremental Setting
Ryan Whetten
M. Imtiaz
C. Kennington
45
2
0
23 Feb 2023
Using Semantic Information for Defining and Detecting OOD Inputs
Ramneet Kaur
Xiayan Ji
Souradeep Dutta
Michele Caprio
Yahan Yang
E. Bernardis
O. Sokolsky
Insup Lee
OODD
223
10
0
21 Feb 2023
JEIT: Joint End-to-End Model and Internal Language Model Training for Speech Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Zhong Meng
Weiran Wang
Rohit Prabhavalkar
Tara N. Sainath
Tongzhou Chen
Ehsan Variani
Yu Zhang
Yue Liu
Andrew Rosenberg
Bhuvana Ramabhadran
AuLLM
VLM
209
13
0
16 Feb 2023
GeneFace: Generalized and High-Fidelity Audio-Driven 3D Talking Face Synthesis
International Conference on Learning Representations (ICLR), 2023
Zhenhui Ye
Ziyue Jiang
Yi Ren
Jinglin Liu
Jinzheng He
Zhou Zhao
CVBM
209
181
0
31 Jan 2023
Open Problems in Applied Deep Learning
M. Raissi
AI4CE
232
3
0
26 Jan 2023
A Data-Efficient Visual-Audio Representation with Intuitive Fine-tuning for Voice-Controlled Robots
Conference on Robot Learning (CoRL), 2023
Peixin Chang
Shuijing Liu
Tianchen Ji
Neeloy Chakraborty
Kaiwen Hong
Katherine Driggs-Campbell
188
5
0
23 Jan 2023
Neural Architecture Search: Insights from 1000 Papers
Colin White
Mahmoud Safari
R. Sukthanker
Binxin Ru
T. Elsken
Arber Zela
Debadeepta Dey
Katharina Eggensperger
3DV
AI4CE
409
192
0
20 Jan 2023
DiffTalk: Crafting Diffusion Models for Generalized Audio-Driven Portraits Animation
Computer Vision and Pattern Recognition (CVPR), 2023
Shuai Shen
Wenliang Zhao
Zibin Meng
Wanhua Li
Zhengbiao Zhu
Jie Zhou
Jiwen Lu
DiffM
VGen
277
155
0
10 Jan 2023
Audio-Visual Efficient Conformer for Robust Speech Recognition
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Maxime Burchi
Radu Timofte
VLM
213
49
0
04 Jan 2023