arXiv:1512.02595
Deep Speech 2: End-to-End Speech Recognition in English and Mandarin
8 December 2015
Dario Amodei
Rishita Anubhai
Eric Battenberg
Carl Case
Jared Casper
Bryan Catanzaro
Jingdong Chen
Mike Chrzanowski
Adam Coates
G. Diamos
Erich Elsen
Jesse Engel
Linxi Fan
Christopher Fougner
T. Han
Awni Y. Hannun
Billy Jun
P. LeGresley
Libby Lin
Sharan Narang
A. Ng
Sherjil Ozair
R. Prenger
Jonathan Raiman
S. Satheesh
David Seetapun
Shubho Sengupta
Yi Wang
Zhiqian Wang
Chong-Jun Wang
Bo Xiao
Dani Yogatama
J. Zhan
Zhenyao Zhu
Papers citing "Deep Speech 2: End-to-End Speech Recognition in English and Mandarin" (50 of 1,096 shown)
Better audio representations are more brain-like: linking model-brain alignment with performance in downstream auditory tasks
L. Pepino
Pablo Riera
Juan Kamienkowski
Luciana Ferrer
84
0
0
20 Nov 2025
Omni-AVSR: Towards Unified Multimodal Speech Recognition with Large Language Models
Umberto Cappellazzo
Xubo Liu
Pingchuan Ma
Stavros Petridis
Maja Pantic
AuLLM
267
0
0
10 Nov 2025
audio2chart: End to End Audio Transcription into playable Guitar Hero charts
Riccardo Tripodi
68
0
0
05 Nov 2025
ATLAS: Adaptive Transfer Scaling Laws for Multilingual Pretraining, Finetuning, and Decoding the Curse of Multilinguality
Shayne Longpre
Sneha Kudugunta
Niklas Muennighoff
I-Hung Hsu
Isaac Caswell
Alex Pentland
Sercan O. Arik
Chen-Yu Lee
Sayna Ebrahimi
CLL
LRM
97
1
0
24 Oct 2025
Policy Transfer for Continuous-Time Reinforcement Learning: A (Rough) Differential Equation Approach
Xin Guo
Zijiu Lyu
OffRL
96
0
0
16 Oct 2025
Proprioceptive Image: An Image Representation of Proprioceptive Data from Quadruped Robots for Contact Estimation Learning
G. Abati
J. C. V. Soares
Giulio Turrisi
Victor Barasuol
Claudio Semini
100
0
0
16 Oct 2025
Multi-Agent Design Assistant for the Simulation of Inertial Fusion Energy
Meir H. Shachar
D. Sterbentz
Harshitha Menon
C. Jekel
M. Giselle Fernández-Godino
...
Kevin Korner
Robert Rieben
D. White
William J. Schill
Jonathan L. Belof
AI4CE
155
0
0
02 Oct 2025
Spiralformer: Low Latency Encoder for Streaming Speech Recognition with Circular Layer Skipping and Early Exiting
E. Tsunoo
Hayato Futami
Yosuke Kashiwagi
Siddhant Arora
Shinji Watanabe
84
0
0
01 Oct 2025
An overview of neural architectures for self-supervised audio representation learning from masked spectrograms
Sarthak Yadav
Sergios Theodoridis
Zheng-Hua Tan
Mamba
175
0
0
23 Sep 2025
Layer-wise Analysis for Quality of Multilingual Synthesized Speech
Erica Cooper
T. Okamoto
Yamato Ohtani
Tomoki Toda
Hisashi Kawai
92
0
0
05 Sep 2025
Towards Inclusive Communication: A Unified Framework for Generating Spoken Language from Sign, Lip, and Audio
Jeong Hun Yeo
Hyeongseop Rha
Sungjune Park
Junil Won
Y. Ro
117
0
0
28 Aug 2025
Speech-to-LaTeX: New Models and Datasets for Converting Spoken Equations and Sentences
Dmitrii Korzh
Dmitrii Tarasov
Artyom Iudin
Elvir Karimov
Matvey Skripkin
Nikita Kuzmin
Andrey Kuznetsov
Oleg Y. Rogov
Ivan Oseledets
126
0
0
05 Aug 2025
RCR-AF: Enhancing Model Generalization via Rademacher Complexity Reduction Activation Function
Yunrui Yu
Kafeng Wang
Hang Su
Jun-Jie Zhu
AAML
119
0
0
30 Jul 2025
Learning to See Inside Opaque Liquid Containers using Speckle Vibrometry
Matan Kichler
Shai Bagon
Mark Sheinin
67
0
0
28 Jul 2025
Improving Adversarial Robustness Through Adaptive Learning-Driven Multi-Teacher Knowledge Distillation
Hayat Ullah
Syed Muhammad Talha Zaidi
Arslan Munir
AAML
191
0
0
28 Jul 2025
M2DAO-Talker: Harmonizing Multi-granular Motion Decoupling and Alternating Optimization for Talking-head Generation
Kui Jiang
Shiyu Liu
Junjun Jiang
Hongxun Yao
Xiaopeng Fan
VGen
119
0
0
11 Jul 2025
Omni-Router: Sharing Routing Decisions in Sparse Mixture-of-Experts for Speech Recognition
Zijin Gu
Tatiana Likhomanenko
Navdeep Jaitly
MoE
196
2
0
08 Jul 2025
Early Attentive Sparsification Accelerates Neural Speech Transcription
Zifei Xu
Sayeh Sharify
Hesham Mostafa
T. Webb
W. Yazar
Xin Wang
133
0
0
18 Jun 2025
Speech Recognition on TV Series with Video-guided Post-ASR Correction
Haoyuan Yang
Yue Zhang
Liqiang Jing
John H.L. Hansen
124
0
0
08 Jun 2025
Efficient Neural and Numerical Methods for High-Quality Online Speech Spectrogram Inversion via Gradient Theorem
Andres Fernandez
Juan Azcarreta
Cagdas Bilen
Jesus Monge Alvarez
95
0
0
30 May 2025
Dub-S2ST: Textless Speech-to-Speech Translation for Seamless Dubbing
Jeongsoo Choi
Jaehun Kim
Joon Son Chung
146
0
0
27 May 2025
TabSTAR: A Tabular Foundation Model for Tabular Data with Text Fields
Alan Arazi
Eilam Shapira
Roi Reichart
LMTD
423
1
0
23 May 2025
Poem Meter Classification of Recited Arabic Poetry: Integrating High-Resource Systems for a Low-Resource Task
Maged S. Al-Shaibani
Zaid Alyafeai
Irfan Ahmad
178
0
0
16 Apr 2025
PASE: Phoneme-Aware Speech Encoder to Improve Lip Sync Accuracy for Talking Head Synthesis
Yihuan Huang
Jiajun Liu
Yanzhen Ren
Wuyang Liu
Juhua Tang
Zongkun Sun
230
0
0
08 Apr 2025
Whispering Under the Eaves: Protecting User Privacy Against Commercial and LLM-powered Automatic Speech Recognition Systems
Weifei Jin
Yuxin Cao
Junjie Su
Derui Wang
Yedi Zhang
Minhui Xue
Jie Hao
Jin Song Dong
Yixian Yang
AAML
191
3
0
01 Apr 2025
Whisper-LM: Improving ASR Models with Language Models for Low-Resource Languages
Xabier de Zuazo
Eva Navas
Ibon Saratxaga
Inma Hernáez Rioja
280
3
0
30 Mar 2025
Robust DNN Partitioning and Resource Allocation Under Uncertain Inference Time
IEEE Transactions on Mobile Computing (IEEE TMC), 2025
Zhaojun Nan
Yunchu Han
Sheng Zhou
Zhisheng Niu
239
2
0
27 Mar 2025
Dolphin: A Large-Scale Automatic Speech Recognition Model for Eastern Languages
Yangyang Meng
Jinpeng Li
Guodong Lin
Yu Pu
G. Wang
Hu Du
Zhiming Shao
Yukai Huang
Ke Li
Wei-Qiang Zhang
ObjD
282
7
0
26 Mar 2025
Improving Speech Recognition Accuracy Using Custom Language Models with the Vosk Toolkit
Aniket Abhishek Soni
117
1
0
26 Mar 2025
Deep Learning for Forensic Identification of Source
Cole Patten
Christopher Saunders
Michael Puthawala
168
0
0
26 Mar 2025
Boosting the Transferability of Audio Adversarial Examples with Acoustic Representation Optimization
Weifei Jin
Junjie Su
Hejia Wang
Yulin Ye
Jie Hao
AAML
149
1
0
25 Mar 2025
RAG-based User Profiling for Precision Planning in Mixed-precision Over-the-Air Federated Learning
Jinsheng Yuan
Yun Tang
Weisi Guo
84
0
0
19 Mar 2025
MMS-LLaMA: Efficient LLM-based Audio-Visual Speech Recognition with Minimal Multimodal Speech Tokens
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Jeong Hun Yeo
Hyeongseop Rha
Se Jin Park
Y. Ro
318
11
0
14 Mar 2025
Enhancing Aviation Communication Transcription: Fine-Tuning Distil-Whisper with LoRA
Shokoufeh Mirzaei
Jesse Arzate
Yukti Vijay
121
0
0
13 Mar 2025
ConjointNet: Enhancing Conjoint Analysis for Preference Prediction with Representation Learning
Yanxia Zhang
Francine Chen
Shabnam Hakimi
Totte Harinen
Alex Filipowicz
...
Nikos Aréchiga
Kalani Murakami
Kent Lyons
Charlene C. Wu
Matt Klenk
98
1
0
12 Mar 2025
Make Haste Slowly: A Theory of Emergent Structured Mixed Selectivity in Feature Learning ReLU Networks
International Conference on Learning Representations (ICLR), 2025
Devon Jarvis
Richard Klein
Benjamin Rosman
Andrew M. Saxe
MLT
340
2
0
08 Mar 2025
RouterEval: A Comprehensive Benchmark for Routing LLMs to Explore Model-level Scaling Up in LLMs
Zhongzhan Huang
Guoming Ling
Vincent S. Liang
Yupei Lin
Yandong Chen
Shanshan Zhong
Hefeng Wu
LRM
710
18
0
08 Mar 2025
(Mis)Fitting: A Survey of Scaling Laws
Margaret Li
Sneha Kudugunta
Luke Zettlemoyer
376
11
0
26 Feb 2025
When, Where and Why to Average Weights?
International Conference on Machine Learning (ICML), 2025
Niccolò Ajroldi
Antonio Orvieto
Jonas Geiping
MoMe
513
2
0
10 Feb 2025
Dimensions underlying the representational alignment of deep neural networks with humans
Nature Machine Intelligence (Nat. Mach. Intell.), 2024
F. Mahner
Lukas Muttenthaler
Umut Güçlü
M. Hebart
343
22
0
28 Jan 2025
HadamRNN: Binary and Sparse Ternary Orthogonal RNNs
International Conference on Learning Representations (ICLR), 2025
Armand Foucault
Franck Mamalet
François Malgouyres
MQ
779
1
0
28 Jan 2025
Adapting Whisper for Regional Dialects: Enhancing Public Services for Vulnerable Populations in the United Kingdom
Melissa Torgbi
Andrew Clayman
Jordan J. Speight
Harish Tayyar Madabushi
140
5
0
15 Jan 2025
Breaking Through the Spike: Spike Window Decoding for Accelerated and Precise Automatic Speech Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Wei Zhang
Tian-Hao Zhang
Chao Luo
Hui Zhou
Chao Yang
Xinyuan Qian
Xu-cheng Yin
98
0
0
08 Jan 2025
Enhancing Multilingual ASR for Unseen Languages via Language Embedding Modeling
Shao-Syuan Huang
Kuan Po Huang
Andy T. Liu
Hung-yi Lee
145
3
0
21 Dec 2024
From Audio Deepfake Detection to AI-Generated Music Detection -- A Pathway and Overview
Yupei Li
M. Milling
Lucia Specia
Björn Schuller
344
11
0
30 Nov 2024
RELATE: A Modern Processing Platform for Romanian Language
V. Pais
Radu Ion
Andrei-Marius Avram
Maria Mitrofan
D. Tufis
VLM
84
1
0
29 Oct 2024
TabSeq: A Framework for Deep Learning on Tabular Data via Sequential Ordering
International Conference on Pattern Recognition (ICPR), 2024
A. Habib
Kesheng Wang
Mary-Anne Hartley
Gianfranco Doretto
Donald Adjeroh
LMTD
241
1
0
17 Oct 2024
Building a Multivariate Time Series Benchmarking Datasets Inspired by Natural Language Processing (NLP)
Mohammad Asif Ibna Mustafa
Ferdinand Heinrich
AI4TS
242
0
0
14 Oct 2024
A two-stage transliteration approach to improve performance of a multilingual ASR
Rohit Kumar
112
0
0
09 Oct 2024
Convolutional Variational Autoencoders for Spectrogram Compression in Automatic Speech Recognition
International Conference on Theory and Practice of Natural Computing (TPNC), 2024
Olga Iakovenko
Ivan Bondarenko
116
0
0
03 Oct 2024