ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1904.08779
  4. Cited By
SpecAugment: A Simple Data Augmentation Method for Automatic Speech
  Recognition

SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

18 April 2019
Daniel S. Park
William Chan
Yu Zhang
Chung-Cheng Chiu
Barret Zoph
E. D. Cubuk
Quoc V. Le
    VLM
ArXivPDFHTML

Papers citing "SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition"

50 / 750 papers shown
Title
Large scale weakly and semi-supervised learning for low-resource video
  ASR
Large scale weakly and semi-supervised learning for low-resource video ASR
Kritika Singh
Vimal Manohar
Alex Xiao
Sergey Edunov
Ross B. Girshick
Vitaliy Liptchinsky
Christian Fuegen
Yatharth Saraf
Geoffrey Zweig
Abdel-rahman Mohamed
31
9
0
16 May 2020
You Do Not Need More Data: Improving End-To-End Speech Recognition by
  Text-To-Speech Data Augmentation
You Do Not Need More Data: Improving End-To-End Speech Recognition by Text-To-Speech Data Augmentation
A. Laptev
Roman Korostik
A. Svischev
A. Andrusenko
Ivan Medennikov
S. Rybin
16
61
0
14 May 2020
Streaming keyword spotting on mobile devices
Streaming keyword spotting on mobile devices
Oleg Rybakov
Natasha Kononenko
Niranjan A. Subrahmanya
Mirkó Visontai
Stella Laurenzo
AI4TS
25
109
0
14 May 2020
ContextNet: Improving Convolutional Neural Networks for Automatic Speech
  Recognition with Global Context
ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context
Wei Han
Zhengdong Zhang
Yu Zhang
Jiahui Yu
Chung-Cheng Chiu
James Qin
Anmol Gulati
Ruoming Pang
Yonghui Wu
42
259
0
07 May 2020
MultiQT: Multimodal Learning for Real-Time Question Tracking in Speech
MultiQT: Multimodal Learning for Real-Time Question Tracking in Speech
Jakob Drachmann Havtorn
Jan Latko
Joakim Edin
Lasse Borgholt
Lars Maaløe
Lorenzo Belgrano
Nicolai Frost Jakobsen
R. Sdun
Zeljko Agic
21
3
0
02 May 2020
Logic-Guided Data Augmentation and Regularization for Consistent
  Question Answering
Logic-Guided Data Augmentation and Regularization for Consistent Question Answering
Akari Asai
Hannaneh Hajishirzi
NAI
21
111
0
21 Apr 2020
Curriculum Pre-training for End-to-End Speech Translation
Curriculum Pre-training for End-to-End Speech Translation
Chengyi Wang
Yu Wu
Shujie Liu
Ming Zhou
Zhenglu Yang
29
108
0
21 Apr 2020
Serialized Output Training for End-to-End Overlapped Speech Recognition
Serialized Output Training for End-to-End Overlapped Speech Recognition
Naoyuki Kanda
Yashesh Gaur
Xiaofei Wang
Zhong Meng
Takuya Yoshioka
19
113
0
28 Mar 2020
Stochastic Frequency Masking to Improve Super-Resolution and Denoising
  Networks
Stochastic Frequency Masking to Improve Super-Resolution and Denoising Networks
Majed El Helou
Ruofan Zhou
Sabine Süsstrunk
24
45
0
16 Mar 2020
AutoML-Zero: Evolving Machine Learning Algorithms From Scratch
AutoML-Zero: Evolving Machine Learning Algorithms From Scratch
Esteban Real
Chen Liang
David R. So
Quoc V. Le
44
220
0
06 Mar 2020
Time Series Data Augmentation for Deep Learning: A Survey
Time Series Data Augmentation for Deep Learning: A Survey
Qingsong Wen
Liang Sun
Fan Yang
Xiaomin Song
Jing Gao
Xue Wang
Huan Xu
AI4TS
37
636
0
27 Feb 2020
SkinAugment: Auto-Encoding Speaker Conversions for Automatic Speech
  Translation
SkinAugment: Auto-Encoding Speaker Conversions for Automatic Speech Translation
Arya D. McCarthy
Liezl Puzon
J. Pino
49
24
0
27 Feb 2020
Imputer: Sequence Modelling via Imputation and Dynamic Programming
Imputer: Sequence Modelling via Imputation and Dynamic Programming
William Chan
Chitwan Saharia
Geoffrey E. Hinton
Mohammad Norouzi
Navdeep Jaitly
BDL
AI4TS
21
114
0
20 Feb 2020
Small energy masking for improved neural network training for end-to-end
  speech recognition
Small energy masking for improved neural network training for end-to-end speech recognition
Chanwoo Kim
Kwangyoun Kim
S. Indurthi
24
8
0
15 Feb 2020
Accelerating RNN Transducer Inference via One-Step Constrained Beam
  Search
Accelerating RNN Transducer Inference via One-Step Constrained Beam Search
Juntae Kim
Yoonhan Lee
20
22
0
10 Feb 2020
Generating diverse and natural text-to-speech samples using a quantized
  fine-grained VAE and auto-regressive prosody prior
Generating diverse and natural text-to-speech samples using a quantized fine-grained VAE and auto-regressive prosody prior
Guangzhi Sun
Yu Zhang
Ron J. Weiss
Yuan Cao
Heiga Zen
Andrew Rosenberg
Bhuvana Ramabhadran
Yonghui Wu
DiffM
36
92
0
06 Feb 2020
CoVoST: A Diverse Multilingual Speech-To-Text Translation Corpus
CoVoST: A Diverse Multilingual Speech-To-Text Translation Corpus
Changhan Wang
J. Pino
Anne Wu
Jiatao Gu
SLR
40
82
0
04 Feb 2020
Multi-task self-supervised learning for Robust Speech Recognition
Multi-task self-supervised learning for Robust Speech Recognition
Mirco Ravanelli
Jianyuan Zhong
Santiago Pascual
P. Swietojanski
João Monteiro
J. Trmal
Yoshua Bengio
SSL
189
288
0
25 Jan 2020
FixMatch: Simplifying Semi-Supervised Learning with Consistency and
  Confidence
FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence
Kihyuk Sohn
David Berthelot
Chun-Liang Li
Zizhao Zhang
Nicholas Carlini
E. D. Cubuk
Alexey Kurakin
Han Zhang
Colin Raffel
AAML
104
3,479
0
21 Jan 2020
Single headed attention based sequence-to-sequence model for
  state-of-the-art results on Switchboard
Single headed attention based sequence-to-sequence model for state-of-the-art results on Switchboard
Zoltán Tüske
G. Saon
Kartik Audhkhasi
Brian Kingsbury
BDL
28
68
0
20 Jan 2020
Learning Speaker Embedding with Momentum Contrast
Learning Speaker Embedding with Momentum Contrast
Ke Ding
Xuanji He
Guanglu Wan
SSL
28
10
0
07 Jan 2020
Generating Synthetic Audio Data for Attention-Based Speech Recognition
  Systems
Generating Synthetic Audio Data for Attention-Based Speech Recognition Systems
Nick Rossenbach
Albert Zeyer
Ralf Schluter
Hermann Ney
18
83
0
19 Dec 2019
Data Augmentation for Deep Learning-based Radio Modulation
  Classification
Data Augmentation for Deep Learning-based Radio Modulation Classification
Liang Huang
Weijian Pan
You Zhang
L. Qian
Nan Gao
Yuan Wu
36
133
0
06 Dec 2019
Distance-Based Learning from Errors for Confidence Calibration
Distance-Based Learning from Errors for Confidence Calibration
Chen Xing
Sercan O. Arik
Zizhao Zhang
Tomas Pfister
FedML
23
39
0
03 Dec 2019
Augmentation Methods on Monophonic Audio for Instrument Classification
  in Polyphonic Music
Augmentation Methods on Monophonic Audio for Instrument Classification in Polyphonic Music
Agelos Kratimenos
Kleanthis Avramidis
C. Garoufis
Athanasia Zlatintsi
Petros Maragos
32
19
0
28 Nov 2019
End-to-end ASR: from Supervised to Semi-Supervised Learning with Modern
  Architectures
End-to-end ASR: from Supervised to Semi-Supervised Learning with Modern Architectures
Gabriel Synnaeve
Qiantong Xu
Jacob Kahn
Tatiana Likhomanenko
Edouard Grave
Vineel Pratap
Anuroop Sriram
Vitaliy Liptchinsky
R. Collobert
SSL
AI4TS
36
246
0
19 Nov 2019
Effectiveness of self-supervised pre-training for speech recognition
Effectiveness of self-supervised pre-training for speech recognition
Alexei Baevski
Michael Auli
Abdel-rahman Mohamed
SSL
27
147
0
10 Nov 2019
SHARP: An Adaptable, Energy-Efficient Accelerator for Recurrent Neural
  Network
SHARP: An Adaptable, Energy-Efficient Accelerator for Recurrent Neural Network
R. Yazdani
Olatunji Ruwase
Minjia Zhang
Yuxiong He
J. Arnau
Antonio González
38
4
0
04 Nov 2019
Improving sequence-to-sequence speech recognition training with
  on-the-fly data augmentation
Improving sequence-to-sequence speech recognition training with on-the-fly data augmentation
T. Nguyen
S. Stueker
Jan Niehues
A. Waibel
24
98
0
29 Oct 2019
Transformer-Transducer: End-to-End Speech Recognition with
  Self-Attention
Transformer-Transducer: End-to-End Speech Recognition with Self-Attention
Ching-Feng Yeh
Jay Mahadeokar
Kaustubh Kalgaonkar
Yongqiang Wang
Duc Le
Mahaveer Jain
Kjell Schubert
Christian Fuegen
M. Seltzer
27
148
0
28 Oct 2019
Learning Data Manipulation for Augmentation and Weighting
Learning Data Manipulation for Augmentation and Weighting
Zhiting Hu
Bowen Tan
Ruslan Salakhutdinov
Tom Michael Mitchell
Eric Xing
29
116
0
28 Oct 2019
Recognizing long-form speech using streaming end-to-end models
Recognizing long-form speech using streaming end-to-end models
A. Narayanan
Rohit Prabhavalkar
Chung-Cheng Chiu
David Rybach
Tara N. Sainath
Trevor Strohman
29
129
0
24 Oct 2019
Correction of Automatic Speech Recognition with Transformer
  Sequence-to-sequence Model
Correction of Automatic Speech Recognition with Transformer Sequence-to-sequence Model
Oleksii Hrinchuk
Mariya Popova
Boris Ginsburg
VLM
20
87
0
23 Oct 2019
A practical two-stage training strategy for multi-stream end-to-end
  speech recognition
A practical two-stage training strategy for multi-stream end-to-end speech recognition
Ruizhi Li
Gregory Sell
Xiaofei Wang
Shinji Watanabe
H. Hermansky
24
7
0
23 Oct 2019
Deja-vu: Double Feature Presentation and Iterated Loss in Deep
  Transformer Networks
Deja-vu: Double Feature Presentation and Iterated Loss in Deep Transformer Networks
Andros Tjandra
Chunxi Liu
Frank Zhang
Xiaohui Zhang
Yongqiang Wang
Gabriel Synnaeve
Satoshi Nakamura
Geoffrey Zweig
ViT
25
44
0
23 Oct 2019
Transformer-based Acoustic Modeling for Hybrid Speech Recognition
Transformer-based Acoustic Modeling for Hybrid Speech Recognition
Yongqiang Wang
Abdel-rahman Mohamed
Duc Le
Chunxi Liu
Alex Xiao
...
Xiaohui Zhang
Frank Zhang
Christian Fuegen
Geoffrey Zweig
M. Seltzer
16
248
0
22 Oct 2019
vq-wav2vec: Self-Supervised Learning of Discrete Speech Representations
vq-wav2vec: Self-Supervised Learning of Discrete Speech Representations
Alexei Baevski
Steffen Schneider
Michael Auli
SSL
28
661
0
12 Oct 2019
State-of-the-Art Speech Recognition Using Multi-Stream Self-Attention
  With Dilated 1D Convolutions
State-of-the-Art Speech Recognition Using Multi-Stream Self-Attention With Dilated 1D Convolutions
Kyu Jeong Han
R. Prieto
Kaixing(Kai) Wu
T. Ma
18
69
0
01 Oct 2019
RandAugment: Practical automated data augmentation with a reduced search
  space
RandAugment: Practical automated data augmentation with a reduced search space
E. D. Cubuk
Barret Zoph
Jonathon Shlens
Quoc V. Le
MQ
147
3,425
0
30 Sep 2019
GraphMix: Improved Training of GNNs for Semi-Supervised Learning
GraphMix: Improved Training of GNNs for Semi-Supervised Learning
Vikas Verma
Meng Qu
Kenji Kawaguchi
Alex Lamb
Yoshua Bengio
Arno Solin
Jian Tang
33
62
0
25 Sep 2019
Large-scale representation learning from visually grounded untranscribed
  speech
Large-scale representation learning from visually grounded untranscribed speech
Gabriel Ilharco
Yuan Zhang
Jason Baldridge
SSL
27
60
0
19 Sep 2019
Espresso: A Fast End-to-end Neural Speech Recognition Toolkit
Espresso: A Fast End-to-end Neural Speech Recognition Toolkit
Yiming Wang
Tongfei Chen
Hainan Xu
Shuoyang Ding
Hang Lv
Yiwen Shao
Nanyun Peng
Lei Xie
Shinji Watanabe
Sanjeev Khudanpur
VLM
33
73
0
18 Sep 2019
Bridging the Gap between Pre-Training and Fine-Tuning for End-to-End
  Speech Translation
Bridging the Gap between Pre-Training and Fine-Tuning for End-to-End Speech Translation
Chengyi Wang
Yu-Huan Wu
Shujie Liu
Zhenglu Yang
M. Zhou
30
83
0
17 Sep 2019
Multilingual Graphemic Hybrid ASR with Massive Data Augmentation
Multilingual Graphemic Hybrid ASR with Massive Data Augmentation
Chunxi Liu
Qiaochu Zhang
Xiaohui Zhang
Kritika Singh
Yatharth Saraf
Geoffrey Zweig
29
27
0
14 Sep 2019
A Comparative Study on Transformer vs RNN in Speech Applications
A Comparative Study on Transformer vs RNN in Speech Applications
Shigeki Karita
Nanxin Chen
Tomoki Hayashi
Takaaki Hori
Hirofumi Inaguma
...
Ryuichi Yamamoto
Xiao-fei Wang
Shinji Watanabe
Takenori Yoshimura
Wangyou Zhang
37
716
0
13 Sep 2019
BERTphone: Phonetically-Aware Encoder Representations for
  Utterance-Level Speaker and Language Recognition
BERTphone: Phonetically-Aware Encoder Representations for Utterance-Level Speaker and Language Recognition
Shaoshi Ling
Julian Salazar
Yuzong Liu
Katrin Kirchhoff
SSL
33
28
0
30 Jun 2019
CIF: Continuous Integrate-and-Fire for End-to-End Speech Recognition
CIF: Continuous Integrate-and-Fire for End-to-End Speech Recognition
Linhao Dong
Bo Xu
27
125
0
27 May 2019
Language Modeling with Deep Transformers
Language Modeling with Deep Transformers
Kazuki Irie
Albert Zeyer
Ralf Schluter
Hermann Ney
KELM
46
171
0
10 May 2019
RWTH ASR Systems for LibriSpeech: Hybrid vs Attention -- w/o Data
  Augmentation
RWTH ASR Systems for LibriSpeech: Hybrid vs Attention -- w/o Data Augmentation
Christoph Luscher
Eugen Beck
Kazuki Irie
M. Kitza
Wilfried Michel
Albert Zeyer
Ralf Schluter
Hermann Ney
VLM
13
234
0
08 May 2019
Towards Efficient Model Compression via Learned Global Ranking
Towards Efficient Model Compression via Learned Global Ranking
Ting-Wu Chin
Ruizhou Ding
Cha Zhang
Diana Marculescu
16
170
0
28 Apr 2019
Previous
123...131415