ResearchTrend.AI
Deep Speech: Scaling up end-to-end speech recognition

17 December 2014
Awni Y. Hannun
Carl Case
Jared Casper
Bryan Catanzaro
G. Diamos
Erich Elsen
R. Prenger
S. Satheesh
Shubho Sengupta
Adam Coates
A. Ng

Papers citing "Deep Speech: Scaling up end-to-end speech recognition"

50 / 768 papers shown
Exploring State Space and Reasoning by Elimination in Tsetlin Machines
Ahmed K. Kadhim
Ole-Christoffer Granmo
Lei Jiao
Rishad Shafik
245
3
0
12 Jul 2024
Zero-Query Adversarial Attack on Black-box Automatic Speech Recognition Systems
Zheng Fang
Tao Wang
Lingchen Zhao
Shenyi Zhang
Bowen Li
Yunjie Ge
Cunliang Kong
Chao Shen
Qian Wang
114
17
0
27 Jun 2024
NLDF: Neural Light Dynamic Fields for Efficient 3D Talking Head Generation
Niu Guanchen
3DH
249
0
0
17 Jun 2024
Guiding Frame-Level CTC Alignments Using Self-knowledge Distillation
Eungbeom Kim
Hantae Kim
Kyogu Lee
166
2
0
12 Jun 2024
Embedded Distributed Inference of Deep Neural Networks: A Systematic Review
Federico Nicolás Peccia
Oliver Bringmann
238
2
0
06 May 2024
Deep Learning Models in Speech Recognition: Measuring GPU Energy Consumption, Impact of Noise and Model Quantization for Edge Deployment
Aditya Chakravarty
159
2
0
02 May 2024
TalkingGaussian: Structure-Persistent 3D Talking Head Synthesis via Gaussian Splatting
Jiahe Li
Jiawei Zhang
Xiao Bai
Jin Zheng
Xin Ning
Jun Zhou
Lin Gu
3DGS
345
53
0
23 Apr 2024
Towards Fast Setup and High Throughput of GPU Serverless Computing
Han Zhao
Weihao Cui
Quan Chen
Shulai Zhang
Zijun Li
Jingwen Leng
Chao Li
Deze Zeng
Minyi Guo
131
7
0
23 Apr 2024
Effective internal language model training and fusion for factorized transducer model
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Jinxi Guo
Niko Moritz
Yingyi Ma
Frank Seide
Chunyang Wu
Jay Mahadeokar
Ozlem Kalinli
Christian Fuegen
Michael Seltzer
189
4
0
02 Apr 2024
PID Control-Based Self-Healing to Improve the Robustness of Large Language Models
Zhuotong Chen
Zihu Wang
Yifan Yang
Qianxiao Li
Zheng Zhang
AAML
218
3
0
31 Mar 2024
FastPerson: Enhancing Video Learning through Effective Video Summarization that Preserves Linguistic and Visual Contexts
Kazuki Kawamura
Jun Rekimoto
112
7
0
26 Mar 2024
Not Just Change the Labels, Learn the Features: Watermarking Deep Neural Networks with Multi-View Data
Yuxuan Li
Sarthak Kumar Maharana
Yunhui Guo
AAML
267
1
0
15 Mar 2024
SpeechColab Leaderboard: An Open-Source Platform for Automatic Speech Recognition Evaluation
Computer Speech and Language (CSL), 2024
Jiayu Du
Jinpeng Li
Guoguo Chen
Wei-Qiang Zhang
ELM
162
4
0
13 Mar 2024
A Cross-Modal Approach to Silent Speech with LLM-Enhanced Recognition
Tyler Benster
G. Wilson
Reshef Elisha
Francis R. Willett
S. Druckmann
167
13
0
02 Mar 2024
Speaker-Independent Dysarthria Severity Classification using Self-Supervised Transformers and Multi-Task Learning
Lauren Stumpf
B. Kadirvelu
Sigourney Waibel
A. A. Faisal
140
4
0
29 Feb 2024
Representing Online Handwriting for Recognition in Large Vision-Language Models
Anastasiia Fadeeva
Philippe Schlattner
Andrii Maksai
Mark Collier
Efi Kokiopoulou
Jesse Berent
C. Musat
267
7
0
23 Feb 2024
The Balancing Act: Unmasking and Alleviating ASR Biases in Portuguese
Ajinkya Kulkarni
Anna Tokareva
Rameez Qureshi
Miguel Couceiro
90
8
0
12 Feb 2024
Arabic Synonym BERT-based Adversarial Examples for Text Classification
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2024
Norah M. Alshahrani
Saied Alshahrani
Esma Wali
Jeanna Neefe Matthews
AAML
149
11
0
05 Feb 2024
Phoneme-Based Proactive Anti-Eavesdropping with Controlled Recording Privilege
IEEE Transactions on Dependable and Secure Computing (IEEE TDSC), 2024
Peng Huang
Yao Wei
Jun Zhou
Zhongjie Ba
Liwang Lu
Feng Lin
Yang Wang
Kui Ren
170
1
0
28 Jan 2024
NeRF-AD: Neural Radiance Field with Attention-based Disentanglement for Talking Face Synthesis
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Chongke Bi
Xiaoxing Liu
Zhilei Liu
DiffM
CVBM
135
9
0
23 Jan 2024
A unified multichannel far-field speech recognition system: combining neural beamforming with attention based end-to-end model
Dongdi Zhao
Jianbo Ma
Lu Lu
Jinke Li
Xuan Ji
Lei Zhu
Fuming Fang
Ming-Yuan Liu
Feijun Jiang
98
1
0
05 Jan 2024
PhasePerturbation: Speech Data Augmentation via Phase Perturbation for Automatic Speech Recognition
ACM Multimedia Asia (MA), 2023
Chengxi Lei
Satwinder Singh
Feng Hou
Xiaoyun Jia
Ruili Wang
125
1
0
13 Dec 2023
USM-Lite: Quantization and Sparsity Aware Fine-tuning for Speech Recognition with Universal Speech Models
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Shaojin Ding
David Qiu
David Rim
Yanzhang He
Oleg Rybakov
...
Tara N. Sainath
Zhonglin Han
Jian Li
Amir Yazdanbakhsh
Shivani Agrawal
MQ
433
13
0
13 Dec 2023
Relational Deep Learning: Graph Representation Learning on Relational Databases
Matthias Fey
Weihua Hu
Kexin Huang
J. E. Lenssen
Rishabh Ranjan
Joshua Robinson
Rex Ying
Jiaxuan You
J. Leskovec
GNN
153
49
0
07 Dec 2023
MyPortrait: Morphable Prior-Guided Personalized Portrait Generation
Bo Ding
Zhenfeng Fan
Shuang Yang
Shihong Xia
155
3
0
05 Dec 2023
3DiFACE: Diffusion-based Speech-driven 3D Facial Animation and Editing
Balamurugan Thambiraja
S. Aliakbarian
Darren Cosker
Justus Thies
DiffM
VGen
264
17
0
01 Dec 2023
MemoryCompanion: A Smart Healthcare Solution to Empower Efficient Alzheimer's Care Via Unleashing Generative AI
Lifei Zheng
Yeonie Heo
Yi Fang
AI4MH
75
1
0
20 Nov 2023
CP-EB: Talking Face Generation with Controllable Pose and Eye Blinking Embedding
Jianzong Wang
Yimin Deng
Ziqi Liang
Xulong Zhang
Ning Cheng
Jing Xiao
CVBM
144
2
0
15 Nov 2023
Automatic Disfluency Detection from Untranscribed Speech
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023
Amrit Romana
K. Koishida
E. Provost
217
16
0
01 Nov 2023
Form follows Function: Text-to-Text Conditional Graph Generation based on Functional Requirements
Peter Zachares
Vahan Hovhannisyan
Alan Mosca
Yarin Gal
216
1
0
01 Nov 2023
Deep Audio Analyzer: a Framework to Industrialize the Research on Audio Forensics
Valerio Francesco Puglisi
O. Giudice
Sebastiano Battiato
168
1
0
29 Oct 2023
Personalized Speech-driven Expressive 3D Facial Animation Synthesis with Style Control
Elif Bozkurt
179
1
0
25 Oct 2023
LC-TTFS: Towards Lossless Network Conversion for Spiking Neural Networks with TTFS Coding
IEEE Transactions on Cognitive and Developmental Systems (IEEE TCDS), 2023
Qu Yang
Malu Zhang
Jibin Wu
Kay Chen Tan
Haizhou Li
193
15
0
23 Oct 2023
No Pitch Left Behind: Addressing Gender Unbalance in Automatic Speech Recognition through Pitch Manipulation
Automatic Speech Recognition & Understanding (ASRU), 2023
Dennis Fucci
Marco Gaido
Matteo Negri
Mauro Cettolo
L. Bentivogli
181
8
0
10 Oct 2023
DiffPoseTalk: Speech-Driven Stylistic 3D Facial Animation and Head Pose Generation via Diffusion Models
ACM Transactions on Graphics (TOG), 2023
Zhiyao Sun
Tian Lv
Sheng Ye
Matthieu Lin
Jenny Sheng
Yuhui Wen
Minjing Yu
Yong Liu
DiffM
348
86
0
30 Sep 2023
Emotional Listener Portrait: Neural Listener Head Generation with Emotion
IEEE International Conference on Computer Vision (ICCV), 2023
Luchuan Song
Guojun Yin
Zhenchao Jin
Xiaoyi Dong
Chenliang Xu
381
18
0
29 Sep 2023
Developing automatic verbatim transcripts for international multilingual meetings: an end-to-end solution
Machine Translation Summit (MT Summit), 2023
Akshat Dewan
Michal Ziemski
Henri Meylan
Lorenzo Concina
Bruno Pouliquen
129
1
0
27 Sep 2023
Privacy-preserving and Privacy-attacking Approaches for Speech and Audio -- A Survey
Yuchen Liu
Apu Kapadia
Donald Williamson
AAML
209
1
0
26 Sep 2023
Deepfake audio as a data augmentation technique for training automatic speech to text transcription models
Alexandre R. Ferreira
Cláudio E. C. Campelo
102
1
0
22 Sep 2023
A Multiscale Autoencoder (MSAE) Framework for End-to-End Neural Network Speech Enhancement
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023
Bengt J. Borgström
M. Brandstein
162
4
0
21 Sep 2023
AudioFool: Fast, Universal and synchronization-free Cross-Domain Attack on Speech Recognition
Mohamad Fakih
R. Kanj
Fadi J. Kurdahi
M. Fouda
AAML
133
0
0
20 Sep 2023
FaceDiffuser: Speech-Driven 3D Facial Animation Synthesis Using Diffusion
Motion in Games (MiG), 2023
Stefan Stan
Kazi Injamamul Haque
Zerrin Yumak
DiffM
304
87
0
20 Sep 2023
Uncertainty Estimation in Instance Segmentation with Star-convex Shapes
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Qasim M. K. Siddiqui
Sebastian Starke
Peter Steinbach
UQCV
155
2
0
19 Sep 2023
Decoder-only Architecture for Speech Recognition with CTC Prompts and Text Data Augmentation
E. Tsunoo
Hayato Futami
Yosuke Kashiwagi
Siddhant Arora
Shinji Watanabe
VLM
AuLLM
RALM
214
11
0
16 Sep 2023
Visual Speech Recognition for Languages with Limited Labeled Data using Automatic Labels from Whisper
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Jeong Hun Yeo
Minsu Kim
Shinji Watanabe
Y. Ro
VLM
221
16
0
15 Sep 2023
PhantomSound: Black-Box, Query-Efficient Audio Adversarial Attack via Split-Second Phoneme Injection
International Symposium on Recent Advances in Intrusion Detection (RAID), 2023
Hanqing Guo
Guangjing Wang
Yuanda Wang
Bocheng Chen
Qiben Yan
Li Xiao
AAML
187
13
0
13 Sep 2023
DAD++: Improved Data-free Test Time Adversarial Defense
Gaurav Kumar Nayak
Inder Khatri
Shubham Randive
Ruchit Rawal
Anirban Chakraborty
AAML
237
2
0
10 Sep 2023
Audio-Driven Dubbing for User Generated Contents via Style-Aware Semi-Parametric Synthesis
Linsen Song
Wayne Wu
Chaoyou Fu
Chen Change Loy
Xiao-Yu Zhang
222
15
0
31 Aug 2023
ASTER: Automatic Speech Recognition System Accessibility Testing for Stutterers
International Conference on Automated Software Engineering (ASE), 2023
Yi Liu
Yuekang Li
Gelei Deng
Felix Juefei Xu
Yao Du
Cen Zhang
Chengwei Liu
Yeting Li
Lei Ma
Yang Liu
131
6
0
30 Aug 2023
Compensating Removed Frequency Components: Thwarting Voice Spectrum Reduction Attacks
Network and Distributed System Security Symposium (NDSS), 2023
Shu Wang
Kun Sun
Qi Li
AAML
150
0
0
18 Aug 2023