Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1904.08779
Cited By
SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition
18 April 2019
Daniel S. Park
William Chan
Yu Zhang
Chung-Cheng Chiu
Barret Zoph
E. D. Cubuk
Quoc V. Le
VLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition"
50 / 750 papers shown
Title
Large scale weakly and semi-supervised learning for low-resource video ASR
Kritika Singh
Vimal Manohar
Alex Xiao
Sergey Edunov
Ross B. Girshick
Vitaliy Liptchinsky
Christian Fuegen
Yatharth Saraf
Geoffrey Zweig
Abdel-rahman Mohamed
31
9
0
16 May 2020
You Do Not Need More Data: Improving End-To-End Speech Recognition by Text-To-Speech Data Augmentation
A. Laptev
Roman Korostik
A. Svischev
A. Andrusenko
Ivan Medennikov
S. Rybin
16
61
0
14 May 2020
Streaming keyword spotting on mobile devices
Oleg Rybakov
Natasha Kononenko
Niranjan A. Subrahmanya
Mirkó Visontai
Stella Laurenzo
AI4TS
25
109
0
14 May 2020
ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context
Wei Han
Zhengdong Zhang
Yu Zhang
Jiahui Yu
Chung-Cheng Chiu
James Qin
Anmol Gulati
Ruoming Pang
Yonghui Wu
42
259
0
07 May 2020
MultiQT: Multimodal Learning for Real-Time Question Tracking in Speech
Jakob Drachmann Havtorn
Jan Latko
Joakim Edin
Lasse Borgholt
Lars Maaløe
Lorenzo Belgrano
Nicolai Frost Jakobsen
R. Sdun
Zeljko Agic
21
3
0
02 May 2020
Logic-Guided Data Augmentation and Regularization for Consistent Question Answering
Akari Asai
Hannaneh Hajishirzi
NAI
21
111
0
21 Apr 2020
Curriculum Pre-training for End-to-End Speech Translation
Chengyi Wang
Yu Wu
Shujie Liu
Ming Zhou
Zhenglu Yang
29
108
0
21 Apr 2020
Serialized Output Training for End-to-End Overlapped Speech Recognition
Naoyuki Kanda
Yashesh Gaur
Xiaofei Wang
Zhong Meng
Takuya Yoshioka
19
113
0
28 Mar 2020
Stochastic Frequency Masking to Improve Super-Resolution and Denoising Networks
Majed El Helou
Ruofan Zhou
Sabine Süsstrunk
24
45
0
16 Mar 2020
AutoML-Zero: Evolving Machine Learning Algorithms From Scratch
Esteban Real
Chen Liang
David R. So
Quoc V. Le
44
220
0
06 Mar 2020
Time Series Data Augmentation for Deep Learning: A Survey
Qingsong Wen
Liang Sun
Fan Yang
Xiaomin Song
Jing Gao
Xue Wang
Huan Xu
AI4TS
37
636
0
27 Feb 2020
SkinAugment: Auto-Encoding Speaker Conversions for Automatic Speech Translation
Arya D. McCarthy
Liezl Puzon
J. Pino
49
24
0
27 Feb 2020
Imputer: Sequence Modelling via Imputation and Dynamic Programming
William Chan
Chitwan Saharia
Geoffrey E. Hinton
Mohammad Norouzi
Navdeep Jaitly
BDL
AI4TS
21
114
0
20 Feb 2020
Small energy masking for improved neural network training for end-to-end speech recognition
Chanwoo Kim
Kwangyoun Kim
S. Indurthi
24
8
0
15 Feb 2020
Accelerating RNN Transducer Inference via One-Step Constrained Beam Search
Juntae Kim
Yoonhan Lee
20
22
0
10 Feb 2020
Generating diverse and natural text-to-speech samples using a quantized fine-grained VAE and auto-regressive prosody prior
Guangzhi Sun
Yu Zhang
Ron J. Weiss
Yuan Cao
Heiga Zen
Andrew Rosenberg
Bhuvana Ramabhadran
Yonghui Wu
DiffM
36
92
0
06 Feb 2020
CoVoST: A Diverse Multilingual Speech-To-Text Translation Corpus
Changhan Wang
J. Pino
Anne Wu
Jiatao Gu
SLR
40
82
0
04 Feb 2020
Multi-task self-supervised learning for Robust Speech Recognition
Mirco Ravanelli
Jianyuan Zhong
Santiago Pascual
P. Swietojanski
João Monteiro
J. Trmal
Yoshua Bengio
SSL
189
288
0
25 Jan 2020
FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence
Kihyuk Sohn
David Berthelot
Chun-Liang Li
Zizhao Zhang
Nicholas Carlini
E. D. Cubuk
Alexey Kurakin
Han Zhang
Colin Raffel
AAML
104
3,479
0
21 Jan 2020
Single headed attention based sequence-to-sequence model for state-of-the-art results on Switchboard
Zoltán Tüske
G. Saon
Kartik Audhkhasi
Brian Kingsbury
BDL
28
68
0
20 Jan 2020
Learning Speaker Embedding with Momentum Contrast
Ke Ding
Xuanji He
Guanglu Wan
SSL
28
10
0
07 Jan 2020
Generating Synthetic Audio Data for Attention-Based Speech Recognition Systems
Nick Rossenbach
Albert Zeyer
Ralf Schluter
Hermann Ney
18
83
0
19 Dec 2019
Data Augmentation for Deep Learning-based Radio Modulation Classification
Liang Huang
Weijian Pan
You Zhang
L. Qian
Nan Gao
Yuan Wu
36
133
0
06 Dec 2019
Distance-Based Learning from Errors for Confidence Calibration
Chen Xing
Sercan O. Arik
Zizhao Zhang
Tomas Pfister
FedML
23
39
0
03 Dec 2019
Augmentation Methods on Monophonic Audio for Instrument Classification in Polyphonic Music
Agelos Kratimenos
Kleanthis Avramidis
C. Garoufis
Athanasia Zlatintsi
Petros Maragos
32
19
0
28 Nov 2019
End-to-end ASR: from Supervised to Semi-Supervised Learning with Modern Architectures
Gabriel Synnaeve
Qiantong Xu
Jacob Kahn
Tatiana Likhomanenko
Edouard Grave
Vineel Pratap
Anuroop Sriram
Vitaliy Liptchinsky
R. Collobert
SSL
AI4TS
36
246
0
19 Nov 2019
Effectiveness of self-supervised pre-training for speech recognition
Alexei Baevski
Michael Auli
Abdel-rahman Mohamed
SSL
27
147
0
10 Nov 2019
SHARP: An Adaptable, Energy-Efficient Accelerator for Recurrent Neural Network
R. Yazdani
Olatunji Ruwase
Minjia Zhang
Yuxiong He
J. Arnau
Antonio González
38
4
0
04 Nov 2019
Improving sequence-to-sequence speech recognition training with on-the-fly data augmentation
T. Nguyen
S. Stueker
Jan Niehues
A. Waibel
24
98
0
29 Oct 2019
Transformer-Transducer: End-to-End Speech Recognition with Self-Attention
Ching-Feng Yeh
Jay Mahadeokar
Kaustubh Kalgaonkar
Yongqiang Wang
Duc Le
Mahaveer Jain
Kjell Schubert
Christian Fuegen
M. Seltzer
27
148
0
28 Oct 2019
Learning Data Manipulation for Augmentation and Weighting
Zhiting Hu
Bowen Tan
Ruslan Salakhutdinov
Tom Michael Mitchell
Eric Xing
29
116
0
28 Oct 2019
Recognizing long-form speech using streaming end-to-end models
A. Narayanan
Rohit Prabhavalkar
Chung-Cheng Chiu
David Rybach
Tara N. Sainath
Trevor Strohman
29
129
0
24 Oct 2019
Correction of Automatic Speech Recognition with Transformer Sequence-to-sequence Model
Oleksii Hrinchuk
Mariya Popova
Boris Ginsburg
VLM
20
87
0
23 Oct 2019
A practical two-stage training strategy for multi-stream end-to-end speech recognition
Ruizhi Li
Gregory Sell
Xiaofei Wang
Shinji Watanabe
H. Hermansky
24
7
0
23 Oct 2019
Deja-vu: Double Feature Presentation and Iterated Loss in Deep Transformer Networks
Andros Tjandra
Chunxi Liu
Frank Zhang
Xiaohui Zhang
Yongqiang Wang
Gabriel Synnaeve
Satoshi Nakamura
Geoffrey Zweig
ViT
25
44
0
23 Oct 2019
Transformer-based Acoustic Modeling for Hybrid Speech Recognition
Yongqiang Wang
Abdel-rahman Mohamed
Duc Le
Chunxi Liu
Alex Xiao
...
Xiaohui Zhang
Frank Zhang
Christian Fuegen
Geoffrey Zweig
M. Seltzer
16
248
0
22 Oct 2019
vq-wav2vec: Self-Supervised Learning of Discrete Speech Representations
Alexei Baevski
Steffen Schneider
Michael Auli
SSL
28
661
0
12 Oct 2019
State-of-the-Art Speech Recognition Using Multi-Stream Self-Attention With Dilated 1D Convolutions
Kyu Jeong Han
R. Prieto
Kaixing(Kai) Wu
T. Ma
18
69
0
01 Oct 2019
RandAugment: Practical automated data augmentation with a reduced search space
E. D. Cubuk
Barret Zoph
Jonathon Shlens
Quoc V. Le
MQ
147
3,425
0
30 Sep 2019
GraphMix: Improved Training of GNNs for Semi-Supervised Learning
Vikas Verma
Meng Qu
Kenji Kawaguchi
Alex Lamb
Yoshua Bengio
Arno Solin
Jian Tang
33
62
0
25 Sep 2019
Large-scale representation learning from visually grounded untranscribed speech
Gabriel Ilharco
Yuan Zhang
Jason Baldridge
SSL
27
60
0
19 Sep 2019
Espresso: A Fast End-to-end Neural Speech Recognition Toolkit
Yiming Wang
Tongfei Chen
Hainan Xu
Shuoyang Ding
Hang Lv
Yiwen Shao
Nanyun Peng
Lei Xie
Shinji Watanabe
Sanjeev Khudanpur
VLM
33
73
0
18 Sep 2019
Bridging the Gap between Pre-Training and Fine-Tuning for End-to-End Speech Translation
Chengyi Wang
Yu-Huan Wu
Shujie Liu
Zhenglu Yang
M. Zhou
30
83
0
17 Sep 2019
Multilingual Graphemic Hybrid ASR with Massive Data Augmentation
Chunxi Liu
Qiaochu Zhang
Xiaohui Zhang
Kritika Singh
Yatharth Saraf
Geoffrey Zweig
29
27
0
14 Sep 2019
A Comparative Study on Transformer vs RNN in Speech Applications
Shigeki Karita
Nanxin Chen
Tomoki Hayashi
Takaaki Hori
Hirofumi Inaguma
...
Ryuichi Yamamoto
Xiao-fei Wang
Shinji Watanabe
Takenori Yoshimura
Wangyou Zhang
37
716
0
13 Sep 2019
BERTphone: Phonetically-Aware Encoder Representations for Utterance-Level Speaker and Language Recognition
Shaoshi Ling
Julian Salazar
Yuzong Liu
Katrin Kirchhoff
SSL
33
28
0
30 Jun 2019
CIF: Continuous Integrate-and-Fire for End-to-End Speech Recognition
Linhao Dong
Bo Xu
27
125
0
27 May 2019
Language Modeling with Deep Transformers
Kazuki Irie
Albert Zeyer
Ralf Schluter
Hermann Ney
KELM
46
171
0
10 May 2019
RWTH ASR Systems for LibriSpeech: Hybrid vs Attention -- w/o Data Augmentation
Christoph Luscher
Eugen Beck
Kazuki Irie
M. Kitza
Wilfried Michel
Albert Zeyer
Ralf Schluter
Hermann Ney
VLM
13
234
0
08 May 2019
Towards Efficient Model Compression via Learned Global Ranking
Ting-Wu Chin
Ruizhou Ding
Cha Zhang
Diana Marculescu
16
170
0
28 Apr 2019
Previous
1
2
3
...
13
14
15