Deep Speech: Scaling up end-to-end speech recognition

17 December 2014
Awni Y. Hannun
Carl Case
Jared Casper
Bryan Catanzaro
G. Diamos
Erich Elsen
R. Prenger
S. Satheesh
Shubho Sengupta
Adam Coates
A. Ng

Papers citing "Deep Speech: Scaling up end-to-end speech recognition"

50 / 768 papers shown
Reconstructing Unseen Sentences from Speech-related Biosignals for Open-vocabulary Neural Communication
IEEE Transactions on Neural Systems and Rehabilitation Engineering (TNSRE), 2025
Deok-Seon Kim
Seo-Hyun Lee
Kang Yin
Seong-Whan Lee
81
0
0
31 Oct 2025
Unified Implementations of Recurrent Neural Networks in Multiple Deep Learning Frameworks
Francesco Martinuzzi
AI4TS
128
0
0
24 Oct 2025
Tibetan Language and AI: A Comprehensive Survey of Resources, Methods and Challenges
Cheng Huang
Nyima Tashi
Fan Gao
Yutong Liu
J. Li
...
Guojie Tang
Xiangxiang Wang
Jia Zhang
Tsengdar J. Lee
Yongbin Yu
96
0
0
22 Oct 2025
StutterZero and StutterFormer: End-to-End Speech Conversion for Stuttering Transcription and Correction
Qianheng Xu
100
0
0
21 Oct 2025
Revisiting Direct Speech-to-Text Translation with Speech LLMs: Better Scaling than CoT Prompting?
Oriol Pareras
Gerard I. Gállego
Federico Costa
Cristina España-Bonet
Javier Hernando
LRM
88
0
0
03 Oct 2025
Linguistic and Audio Embedding-Based Machine Learning for Alzheimer's Dementia and Mild Cognitive Impairment Detection: Insights from the PROCESS Challenge
Adharsha Sam Edwin Sam Devahi
Sohail Singh Sangha
Prachee Priyadarshinee
Jithin Thilakan
Ivan Fu Xing Tan
Christopher Johann Clarke
Sou Ka Lon
Balamurali B T
Yow Wei Quin
Chen Jer-Ming
76
0
0
02 Oct 2025
Impact of Phonetics on Speaker Identity in Adversarial Voice Attack
Daniyal Kabir Dar
Qiben Yan
Li Xiao
Arun Ross
AAML
64
0
0
18 Sep 2025
Symplectic convolutional neural networks
Süleyman Yıldız
Konrad Janik
P. Benner
93
0
0
27 Aug 2025
The GINN framework: a stochastic QED correspondence for stability and chaos in deep neural networks
Rodrigo Carmo Terin
40
1
0
26 Aug 2025
AD-AVSR: Asymmetric Dual-stream Enhancement for Robust Audio-Visual Speech Recognition
Junxiao Xue
Xiaozhen Liu
Xuecheng Wu
Xinyi Yin
Danlei Huang
Fei Yu
103
0
0
11 Aug 2025
Confidence-Based Self-Training for EMG-to-Speech: Leveraging Synthetic EMG for Robust Modeling
Xiaodan Chen
Xiaoxue Gao
M. Quoy
Alexandre Pitti
Nancy F. Chen
149
0
0
13 Jun 2025
SUTA-LM: Bridging Test-Time Adaptation and Language Model Rescoring for Robust ASR
Wei-Ping Huang
Guan-Ting Lin
Hung-yi Lee
KELM
87
0
0
10 Jun 2025
ASRJam: Human-Friendly AI Speech Jamming to Prevent Automated Phone Scams
Freddie Grabovski
Gilad Gressel
Yisroel Mirsky
102
0
0
10 Jun 2025
SpikeSMOKE: Spiking Neural Networks for Monocular 3D Object Detection with Cross-Scale Gated Coding
Xuemei Chen
Huamin Wang
Hangchi Shen
Shukai Duan
S. Wen
Tingwen Huang
151
0
0
09 Jun 2025
Wav2Sem: Plug-and-Play Audio Semantic Decoupling for 3D Speech-Driven Facial Animation
Computer Vision and Pattern Recognition (CVPR), 2025
Hao Li
Ju Dai
Xin Zhao
Feng Zhou
Junjun Pan
Lei Li
120
1
0
29 May 2025
OT-Talk: Animating 3D Talking Head with Optimal Transportation
International Conference on Multimedia Retrieval (ICMR), 2025
Xinmu Wang
Xiang Gao
Xiyun Song
Heather Yu
Zongfang Lin
Liang Peng
Xianfeng Gu
318
2
0
03 May 2025
Audio-Driven Talking Face Video Generation with Joint Uncertainty Learning
International Conference on Multimedia Retrieval (ICMR), 2025
Yifan Xie
Fei Ma
Yi Bin
Ying He
Fei Richard Yu
199
0
0
26 Apr 2025
Poem Meter Classification of Recited Arabic Poetry: Integrating High-Resource Systems for a Low-Resource Task
Maged S. Al-Shaibani
Zaid Alyafeai
Irfan Ahmad
170
0
0
16 Apr 2025
Making Acoustic Side-Channel Attacks on Noisy Keyboards Viable with LLM-Assisted Spectrograms' "Typo" Correction
Workshop on Offensive Technologies (WOOT), 2025
Seyyed Ali Ayati
Jin Hyun Park
Yichen Cai
Marcus Botacin
108
0
0
15 Apr 2025
Whisper-LM: Improving ASR Models with Language Models for Low-Resource Languages
Xabier de Zuazo
Eva Navas
Ibon Saratxaga
Inma Hernáez Rioja
276
3
0
30 Mar 2025
SyncDiff: Diffusion-based Talking Head Synthesis with Bottlenecked Temporal Visual Prior for Improved Synchronization
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2025
Xulin Fan
Heting Gao
Ziyi Chen
Peng Chang
Mei Han
Mark Hasegawa-Johnson
DiffM
289
1
0
17 Mar 2025
Cafe-Talk: Generating 3D Talking Face Animation with Multimodal Coarse- and Fine-grained Control
International Conference on Learning Representations (ICLR), 2025
Hejia Chen
Haoxian Zhang
Shoulong Zhang
Xiaoqiang Liu
Sisi Zhuang
Yuan Zhang
Pengfei Wan
Di Zhang
Shuai Li
207
8
0
14 Mar 2025
Towards High-fidelity 3D Talking Avatar with Personalized Dynamic Texture
Computer Vision and Pattern Recognition (CVPR), 2025
X. Li
Jianyu Wang
Yuhao Cheng
Yikun Zeng
X. Ren
W. Zhu
Weiming Zhao
Manwen Liao
194
2
0
01 Mar 2025
InsTaG: Learning Personalized 3D Talking Head from Few-Second Video
Computer Vision and Pattern Recognition (CVPR), 2025
Jiahe Li
Jiawei Zhang
Xiao Bai
Jin Zheng
J. Zhou
L. Gu
343
7
0
27 Feb 2025
Logit Disagreement: OoD Detection with Bayesian Neural Networks
Kevin Raina
UQCV BDL UD PER
371
1
0
24 Feb 2025
A Differentiable Alignment Framework for Sequence-to-Sequence Modeling via Optimal Transport
Yacouba Kaloga
Shashi Kumar
P. Motlícek
Ina Kodrasi
OT
327
0
0
03 Feb 2025
Safeguarding Privacy in Edge Speech Understanding with Tiny Foundation Models
A. Benazir
Felix Xiaozhu Lin
265
2
0
29 Jan 2025
SyncAnimation: A Real-Time End-to-End Framework for Audio-Driven Human Pose and Talking Head Animation
International Joint Conference on Artificial Intelligence (IJCAI), 2025
Yujian Liu
Shidang Xu
Jing Guo
Dingbin Wang
Zairan Wang
Xianfeng Tan
Xiaoli Liu
96
3
0
24 Jan 2025
On Accelerating Deep Neural Network Mutation Analysis by Neuron and Mutant Clustering
International Conference on Information Control Systems & Technologies (ICICST), 2025
Lauren Lyons
Ali Ghanbari
192
1
0
22 Jan 2025
From Audio Deepfake Detection to AI-Generated Music Detection -- A Pathway and Overview
Yupei Li
M. Milling
Lucia Specia
Björn Schuller
332
11
0
30 Nov 2024
BanglaDialecto: An End-to-End AI-Powered Regional Speech Standardization
BigData Congress [Services Society] (BSS), 2024
Md. Nazmus Sadat Samin
Jawad Ibn Ahad
Tanjila Ahmed Medha
Fuad Rahman
M. R. Amin
Nabeel Mohammed
Shafin Rahman
204
1
0
16 Nov 2024
Exploring the Stability Gap in Continual Learning: The Role of the Classification Head
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024
Wojciech Łapacz
Daniel Marczak
Filip Szatkowski
Tomasz Trzciński
374
4
0
06 Nov 2024
RELATE: A Modern Processing Platform for Romanian Language
V. Pais
Radu Ion
Andrei-Marius Avram
Maria Mitrofan
D. Tufis
VLM
84
1
0
29 Oct 2024
Interventional Speech Noise Injection for ASR Generalizable Spoken Language Understanding
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Yeonjoon Jung
Jaeseong Lee
Seungtaek Choi
Dohyeon Lee
Minsoo Kim
S. Hwang
96
0
0
21 Oct 2024
UniGlyph: A Seven-Segment Script for Universal Language Representation
G. V. Bency Sherin
A. Abijesh Euphrine
A. Lenora Moreen
L. Arun Jose
169
0
0
11 Oct 2024
The First VoicePrivacy Attacker Challenge Evaluation Plan
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
N. Tomashenko
Xiaoxiao Miao
Emmanuel Vincent
Junichi Yamagishi
385
9
0
09 Oct 2024
A two-stage transliteration approach to improve performance of a multilingual ASR
Rohit Kumar
112
0
0
09 Oct 2024
WeHelp: A Shared Autonomy System for Wheelchair Users
Abulikemu Abuduweili
Alice Wu
Tianhao Wei
Weiye Zhao
130
0
0
18 Sep 2024
3DFacePolicy: Audio-Driven 3D Facial Animation Based on Action Control
Xuanmeng Sha
Liyun Zhang
Tomohiro Mashita
Yuki Uranishi
VGen
216
2
0
17 Sep 2024
Comparative Study on Noise-Augmented Training and its Effect on Adversarial Robustness in ASR Systems
Computer Speech and Language (CSL), 2024
Karla Pizzi
Matías P. Pizarro
Asja Fischer
271
1
0
03 Sep 2024
Contrastive Augmentation: An Unsupervised Learning Approach for Keyword Spotting in Speech Technology
Weinan Dai
Yifeng Jiang
Yuanjing Liu
Jinkun Chen
Xin Sun
Jinglei Tao
SSL
140
1
0
31 Aug 2024
Subgroup Analysis via Model-based Rule Forest
IEEE International Conference on Information Reuse and Integration (IRI), 2024
I-Ling Cheng
Chan Hsu
Chantung Ku
Pei-Ju Lee
Yihuang Kang
81
0
0
27 Aug 2024
The State of Commercial Automatic French Legal Speech Recognition Systems and their Impact on Court Reporters et al
Nicolas Garneau
Olivier Bolduc
ELM AILaw
136
1
0
21 Aug 2024
FourierKAN outperforms MLP on Text Classification Head Fine-tuning
Abdullah Al Imran
Md Farhan Ishmam
VLM
192
1
0
16 Aug 2024
Content and Style Aware Audio-Driven Facial Animation
British Machine Vision Conference (BMVC), 2024
Qingju Liu
Hyeongwoo Kim
Gaurav Bharaj
DiffM
271
2
0
13 Aug 2024
Audio Enhancement for Computer Audition -- An Iterative Training Paradigm Using Sample Importance
Journal of Computational Science and Technology (JCST), 2024
M. Milling
Shuo Liu
Andreas Triantafyllopoulos
Ilhan Aslan
Björn W. Schuller
248
4
0
12 Aug 2024
Style-Preserving Lip Sync via Audio-Aware Style Reference
Weizhi Zhong
Jichang Li
Yinqi Cai
Ming Li
Guanbin Li
Liang Lin
G. Li
253
6
0
10 Aug 2024
EmoFace: Audio-driven Emotional 3D Face Animation
Chang Liu
Qunfen Lin
Zijiao Zeng
Ye Pan
CVBM
164
7
0
17 Jul 2024
Leveraging LLM-Respondents for Item Evaluation: a Psychometric Analysis
Yunting Liu
Shreya Bhandari
Z. Pardos
159
23
0
15 Jul 2024
CUSIDE-T: Chunking, Simulating Future and Decoding for Transducer based Streaming ASR
Wenbo Zhao
Ziwei Li
Chuan Yu
Zhijian Ou
AI4TS
234
3
0
14 Jul 2024