Deep Speech: Scaling up end-to-end speech recognition
17 December 2014
Awni Y. Hannun
Carl Case
Jared Casper
Bryan Catanzaro
G. Diamos
Erich Elsen
R. Prenger
S. Satheesh
Shubho Sengupta
Adam Coates
A. Ng
arXiv:1412.5567

Papers citing "Deep Speech: Scaling up end-to-end speech recognition"

50 / 768 papers shown
VAST: Vivify Your Talking Avatar via Zero-Shot Expressive Facial Style Transfer
Liyang Chen
Zhiyong Wu
Runnan Li
Weihong Bao
Jun Ling
Xuejiao Tan
Sheng Zhao
163
10
0
09 Aug 2023
Many-to-Many Spoken Language Translation via Unified Speech and Text Representation Learning with Unit-to-Unit Translation
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023
Minsu Kim
J. Choi
Dahun Kim
Y. Ro
195
10
0
03 Aug 2023
Integration of Frame- and Label-synchronous Beam Search for Streaming Encoder-decoder Speech Recognition
Interspeech, 2023
E. Tsunoo
Hayato Futami
Yosuke Kashiwagi
Siddhant Arora
Shinji Watanabe
188
4
0
24 Jul 2023
A Deep Dive into the Disparity of Word Error Rates Across Thousands of NPTEL MOOC Videos
International Conference on Web and Social Media (ICWSM), 2023
Anand Rai
Siddharth D. Jaiswal
Animesh Mukherjee
171
5
0
20 Jul 2023
Efficient Region-Aware Neural Radiance Fields for High-Fidelity Talking Portrait Synthesis
IEEE International Conference on Computer Vision (ICCV), 2023
Jiahe Li
Jiawei Zhang
Xiao Bai
Jun Zhou
L. Gu
3DH
235
110
0
18 Jul 2023
SoK: Comparing Different Membership Inference Attacks with a Comprehensive Benchmark
Jun Niu
Xiaoyan Zhu
Moxuan Zeng
Ge Zhang
Qingyang Zhao
...
Peng Liu
Yulong Shen
Xiaohong Jiang
Jianfeng Ma
Yuqing Zhang
179
6
0
12 Jul 2023
Encoder-Decoder Networks for Self-Supervised Pretraining and Downstream Signal Bandwidth Regression on Digital Antenna Arrays
R. Bhattacharjea
Nathan E. West
SSL
54
1
0
06 Jul 2023
Align With Purpose: Optimize Desired Properties in CTC Models with a General Plug-and-Play Framework
International Conference on Learning Representations (ICLR), 2023
Eliya Segev
Maya Alroy
Ronen Katsir
Noam Wies
Ayana Shenhav
...
D. Zar
Oren Tadmor
Jacob Bitterman
Amnon Shashua
Tal Rosenwein
305
2
0
04 Jul 2023
Robust Proxy: Improving Adversarial Robustness by Robust Proxy Learning
IEEE Transactions on Information Forensics and Security (IEEE TIFS), 2023
Hong Joo Lee
Yonghyun Ro
AAML
164
4
0
27 Jun 2023
Scaling and Resizing Symmetry in Feedforward Networks
Carlos Cardona
153
2
0
26 Jun 2023
MobileASR: A resource-aware on-device learning framework for user voice personalization applications on mobile phones
International Conference on AI-ML-Systems (ICA), 2023
Zitha Sasindran
Harsha Yelchuri
Pooja S B. Rao
Prabhakar Venkata Tamma
187
1
0
15 Jun 2023
Learning Cross-lingual Mappings for Data Augmentation to Improve Low-Resource Speech Recognition
Interspeech, 2023
Muhammad Umar Farooq
Thomas Hain
109
4
0
14 Jun 2023
Get More for Less in Decentralized Learning Systems
IEEE International Conference on Distributed Computing Systems (ICDCS), 2023
Akash Dhasade
Anne-Marie Kermarrec
Rafael Pires
Rishi Sharma
Milos Vujasinovic
Jeffrey Wigger
215
9
0
07 Jun 2023
Text-only Domain Adaptation using Unified Speech-Text Representation in Transducer
Interspeech, 2023
Lu Huang
Yangqiu Song
Jun Zhang
Lu Lu
Zejun Ma
231
4
0
07 Jun 2023
Looking and Listening: Audio Guided Text Recognition
Wenwen Yu
Mingyu Liu
Biao Yang
Enming Zhang
Deqiang Jiang
Xing Sun
Yuliang Liu
Xiang Bai
DiffM
156
1
0
06 Jun 2023
Using Sequences of Life-events to Predict Human Lives
Nature Computational Science (Nat. Comput. Sci.), 2023
Germans Savcisens
Tina Eliassi-Rad
L. K. Hansen
L. Mortensen
Lau Lilleholt
Anna Rogers
Ingo Zettler
Sune Lehmann
AI4TS
231
72
0
05 Jun 2023
DVFO: Learning-Based DVFS for Energy-Efficient Edge-Cloud Collaborative Inference
IEEE Transactions on Mobile Computing (IEEE TMC), 2023
Ziyang Zhang
Yang Zhao
Huan Li
Changyao Lin
Jie Liu
286
36
0
02 Jun 2023
Encoder-decoder multimodal speaker change detection
Interspeech, 2023
Jee-weon Jung
Soonshin Seo
Hee-Soo Heo
Geon-min Kim
You Jin Kim
Youngki Kwon
Min-Ji Lee
Bong-Jin Lee
164
3
0
01 Jun 2023
Adaptation and Optimization of Automatic Speech Recognition (ASR) for the Maritime Domain in the Field of VHF Communication
Emin Cagatay Nakilcioglu
M. Reimann
O. John
74
6
0
01 Jun 2023
Trustworthy Sensor Fusion against Inaudible Command Attacks in Advanced Driver-Assistance System
IEEE Internet of Things Journal (IEEE IoT J.), 2023
Jiwei Guan
Lei Pan
Chen Wang
Shui Yu
Longxiang Gao
Xi Zheng
AAML
193
7
0
30 May 2023
RAND: Robustness Aware Norm Decay For Quantized Seq2seq Models
David Qiu
David Rim
Shaojin Ding
Oleg Rybakov
Yanzhang He
MQ
192
4
0
24 May 2023
Evaluating OpenAI's Whisper ASR for Punctuation Prediction and Topic Modeling of life histories of the Museum of the Person
L. Gris
R. Marcacini
Arnaldo Cândido Júnior
Edresson Casanova
A. S. Soares
S. Aluísio
237
12
0
23 May 2023
QFA2SR: Query-Free Adversarial Transfer Attacks to Speaker Recognition Systems
USENIX Security Symposium (USENIX Security), 2023
Guangke Chen
Yedi Zhang
Zhe Zhao
Fu Song
AAML
225
21
0
23 May 2023
Study of GANs for Noisy Speech Simulation from Clean Speech
L. Maben
Zixun Guo
Chen Chen
Utkarsh Chudiwal
Chng Eng Siong
113
0
0
21 May 2023
Decision-based iterative fragile watermarking for model integrity verification
Z. Yin
Heng Yin
Hang Su
Xinpeng Zhang
Zhenzhe Gao
AAML
261
6
0
13 May 2023
Masked Audio Text Encoders are Effective Multi-Modal Rescorers
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Jason (Jinglun) Cai
Monica Sunkara
Xilai Li
Anshu Bhatia
Xiao Pan
S. Bodapati
345
5
0
11 May 2023
Deep Learning and Geometric Deep Learning: an introduction for mathematicians and physicists
International Journal of Geometric Methods in Modern Physics (IJGMMP), 2023
R. Fioresi
F. Zanchetta
PINN
112
5
0
09 May 2023
Mask The Bias: Improving Domain-Adaptive Generalization of CTC-based ASR with Internal Language Model Estimation
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Nilaksh Das
Monica Sunkara
S. Bodapati
Jason (Jinglun) Cai
Devang Kulshreshtha
Jeffrey J. Farris
Katrin Kirchhoff
152
4
0
05 May 2023
GeneFace++: Generalized and Stable Real-Time Audio-Driven 3D Talking Face Generation
Zhenhui Ye
Jinzheng He
Ziyue Jiang
Rongjie Huang
Jia-Bin Huang
Jinglin Liu
Yixiang Ren
Xiang Yin
Zejun Ma
Zhou Zhao
CVBM
211
54
0
01 May 2023
Affective social anthropomorphic intelligent system
Multimedia Tools and Applications (MTA), 2023
Md. Adyelullahil Mamun
Hasnat Md. Abdullah
Md. Golam Rabiul Alam
Muhammad Mehedi Hassan
Md. Zia Uddin
118
3
0
19 Apr 2023
ASPEST: Bridging the Gap Between Active Learning and Selective Prediction
Jiefeng Chen
Chang Jo Kim
Sayna Ebrahimi
Sercan O. Arik
S. Jha
Tomas Pfister
368
5
0
07 Apr 2023
Robustmix: Improving Robustness by Regularizing the Frequency Bias of Deep Nets
Jonas Ngnawé
Marianne Abémgnigni Njifon
Jonathan Heek
Yann N. Dauphin
OOD
110
6
0
06 Apr 2023
Style Transfer for 2D Talking Head Animation
Trong-Thang Pham
Nhat Le
Tuong Khanh Long Do
Hung Nguyen
Erman Tjiputra
Quang-Dieu Tran
A. Nguyen
270
3
0
17 Mar 2023
Improving Few-Shot Learning for Talking Face System with TTS Data Augmentation
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Qi Chen
Ziyang Ma
Tao Liu
Xuejiao Tan
Qu Lu
Xie Chen
K. Yu
CVBM
154
6
0
09 Mar 2023
DINet: Deformation Inpainting Network for Realistic Face Visually Dubbing on High Resolution Video
AAAI Conference on Artificial Intelligence (AAAI), 2023
Zhimeng Zhang
Zhipeng Hu
W. Deng
Changjie Fan
Tangjie Lv
Yu-qiong Ding
3DH
CVBM
252
96
0
07 Mar 2023
End-to-End Speech Recognition: A Survey
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023
Rohit Prabhavalkar
Takaaki Hori
Tara N. Sainath
Ralf Schluter
Shinji Watanabe
VLM
288
245
0
03 Mar 2023
Variational EP with Probabilistic Backpropagation for Bayesian Neural Networks
Kehinde Olobatuyi
BDL
75
0
0
02 Mar 2023
A Comparison of Speech Data Augmentation Methods Using S3PRL Toolkit
Mina Huh
Ruchira Ray
Corey Karnei
145
6
0
27 Feb 2023
Explanations for Automatic Speech Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Xiao-lan Wu
P. Bell
A. Rajan
188
8
0
27 Feb 2023
Improving Medical Speech-to-Text Accuracy with Vision-Language Pre-training Model
IEEE Journal of Biomedical and Health Informatics (IEEE JBHI), 2023
Jaeyoung Huh
Sangjoon Park
Jeonghyeon Lee
Jong Chul Ye
LM&MA
188
15
0
27 Feb 2023
Pose-Controllable 3D Facial Animation Synthesis using Hierarchical Audio-Vertex Attention
Yinan Han
Xiaolin K. Wei
Bo Li
Junjie Cao
Yunyu Lai
CVBM
155
2
0
24 Feb 2023
Evaluating Automatic Speech Recognition in an Incremental Setting
Ryan Whetten
M. Imtiaz
C. Kennington
45
2
0
23 Feb 2023
Using Semantic Information for Defining and Detecting OOD Inputs
Ramneet Kaur
Xiayan Ji
Souradeep Dutta
Michele Caprio
Yahan Yang
E. Bernardis
O. Sokolsky
Insup Lee
OODD
223
10
0
21 Feb 2023
JEIT: Joint End-to-End Model and Internal Language Model Training for Speech Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Zhong Meng
Weiran Wang
Rohit Prabhavalkar
Tara N. Sainath
Tongzhou Chen
Ehsan Variani
Yu Zhang
Yue Liu
Andrew Rosenberg
Bhuvana Ramabhadran
AuLLM
VLM
209
13
0
16 Feb 2023
GeneFace: Generalized and High-Fidelity Audio-Driven 3D Talking Face Synthesis
International Conference on Learning Representations (ICLR), 2023
Zhenhui Ye
Ziyue Jiang
Yi Ren
Jinglin Liu
Jinzheng He
Zhou Zhao
CVBM
209
181
0
31 Jan 2023
Open Problems in Applied Deep Learning
M. Raissi
AI4CE
232
3
0
26 Jan 2023
A Data-Efficient Visual-Audio Representation with Intuitive Fine-tuning for Voice-Controlled Robots
Conference on Robot Learning (CoRL), 2023
Peixin Chang
Shuijing Liu
Tianchen Ji
Neeloy Chakraborty
Kaiwen Hong
Katherine Driggs-Campbell
188
5
0
23 Jan 2023
Neural Architecture Search: Insights from 1000 Papers
Colin White
Mahmoud Safari
R. Sukthanker
Binxin Ru
T. Elsken
Arber Zela
Debadeepta Dey
Katharina Eggensperger
3DV
AI4CE
409
192
0
20 Jan 2023
DiffTalk: Crafting Diffusion Models for Generalized Audio-Driven Portraits Animation
Computer Vision and Pattern Recognition (CVPR), 2023
Shuai Shen
Wenliang Zhao
Zibin Meng
Wanhua Li
Zhengbiao Zhu
Jie Zhou
Jiwen Lu
DiffM
VGen
277
155
0
10 Jan 2023
Audio-Visual Efficient Conformer for Robust Speech Recognition
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Maxime Burchi
Radu Timofte
VLM
213
49
0
04 Jan 2023