Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1904.08779
Cited By
SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition
18 April 2019
Daniel S. Park
William Chan
Yu Zhang
Chung-Cheng Chiu
Barret Zoph
E. D. Cubuk
Quoc V. Le
VLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition"
50 / 750 papers shown
Title
LEAF: A Learnable Frontend for Audio Classification
Neil Zeghidour
O. Teboul
Félix de Chaumont Quitry
Marco Tagliasacchi
VLM
AAML
85
144
0
21 Jan 2021
Arabic Speech Recognition by End-to-End, Modular Systems and Human
A. Hussein
Shinji Watanabe
Ahmed M. Ali
VLM
24
47
0
21 Jan 2021
Tiny Transducer: A Highly-efficient Speech Recognition Model on Edge Devices
Yuekai Zhang
Sining Sun
Long Ma
37
28
0
18 Jan 2021
An evaluation of word-level confidence estimation for end-to-end automatic speech recognition
Dan Oneaţă
Alexandru Caranica
Adriana Stan
H. Cucu
UQCV
37
25
0
14 Jan 2021
A Four-Stage Data Augmentation Approach to ResNet-Conformer Based Acoustic Modeling for Sound Event Localization and Detection
Qing Wang
Jun Du
Hua-Xin Wu
Jia Pan
Feng Ma
Chin-Hui Lee
15
79
0
08 Jan 2021
Environment Transfer for Distributed Systems
Chunheng Jiang
Jae-wook Ahn
N. Desai
28
1
0
06 Jan 2021
AutoDropout: Learning Dropout Patterns to Regularize Deep Networks
Hieu H. Pham
Quoc V. Le
76
56
0
05 Jan 2021
NeurST: Neural Speech Translation Toolkit
Chengqi Zhao
Mingxuan Wang
Qianqian Dong
Rong Ye
Lei Li
30
32
0
18 Dec 2020
CIF-based Collaborative Decoding for End-to-end Contextual Speech Recognition
Minglun Han
Linhao Dong
Shiyu Zhou
Bo Xu
21
21
0
17 Dec 2020
A review of on-device fully neural end-to-end automatic speech recognition algorithms
Chanwoo Kim
Dhananjaya N. Gowda
Dongsoo Lee
Jiyeon Kim
Ankur Kumar
Sungsoo Kim
Abhinav Garg
C. Han
27
27
0
14 Dec 2020
Bayesian Learning for Deep Neural Network Adaptation
Xurong Xie
Xunying Liu
Tan Lee
Lan Wang
BDL
30
20
0
14 Dec 2020
Self-supervised Text-independent Speaker Verification using Prototypical Momentum Contrastive Learning
Wei Xia
Chunlei Zhang
Chao Weng
Meng Yu
Dong Yu
SSL
25
78
0
13 Dec 2020
Less Is More: Improved RNN-T Decoding Using Limited Label Context and Path Merging
Rohit Prabhavalkar
Yanzhang He
David Rybach
S. Campbell
A. Narayanan
Trevor Strohman
Tara N. Sainath
52
35
0
12 Dec 2020
Improving accuracy of rare words for RNN-Transducer through unigram shallow fusion
Vijay Ravi
Yile Gu
Ankur Gandhe
Ariya Rastrow
Linda Liu
Denis Filimonov
Scott Novotney
I. Bulyko
27
9
0
30 Nov 2020
Unsupervised Domain Adaptation for Speech Recognition via Uncertainty Driven Self-Training
Sameer Khurana
Niko Moritz
Takaaki Hori
Jonathan Le Roux
24
54
0
26 Nov 2020
Deep Discriminative Feature Learning for Accent Recognition
Wei Wang
Chao Zhang
Xiao-pei Wu
34
2
0
25 Nov 2020
Synth2Aug: Cross-domain speaker recognition with TTS synthesized speech
Yiling Huang
Yutian Chen
Jason W. Pelecanos
Quan Wang
33
11
0
24 Nov 2020
Improving RNN-T ASR Accuracy Using Context Audio
A. Schwarz
Ilya Sklyar
Simon Wiesler
32
9
0
20 Nov 2020
Predicting Rigid Body Dynamics using Dual Quaternion Recurrent Neural Networks with Quaternion Attention
Johannes Pöppelbaum
Andreas Schwung
16
13
0
17 Nov 2020
Unsupervised Contrastive Learning of Sound Event Representations
Eduardo Fonseca
Diego Ortego
Kevin McGuinness
Noel E. O'Connor
Xavier Serra
SSL
27
65
0
15 Nov 2020
Low-resource expressive text-to-speech using data augmentation
Goeric Huybrechts
Thomas Merritt
Giulia Comini
Bartek Perz
Raahil Shah
Jaime Lorenzo-Trueba
26
50
0
11 Nov 2020
Towards Semi-Supervised Semantics Understanding from Speech
Cheng-I Jeff Lai
Jin Cao
S. Bodapati
Shang-Wen Li
SSL
22
7
0
11 Nov 2020
Data Augmentation For Children's Speech Recognition -- The "Ethiopian" System For The SLT 2021 Children Speech Recognition Challenge
Guoguo Chen
Xingyu Na
Yongqing Wang
Zhiyong Yan
Junbo Zhang
Sifan Ma
Yujun Wang
35
19
0
09 Nov 2020
Dual Application of Speech Enhancement for Automatic Speech Recognition
Ashutosh Pandey
Chunxi Liu
Yun Wang
Yatharth Saraf
46
37
0
07 Nov 2020
Improving RNN Transducer Based ASR with Auxiliary Tasks
Chunxi Liu
Frank Zhang
Duc Le
Suyoun Kim
Yatharth Saraf
Geoffrey Zweig
31
49
0
05 Nov 2020
A Two-Stage Approach to Device-Robust Acoustic Scene Classification
Hu Hu
Chao-Han Huck Yang
Xianjun Xia
Xue Bai
Xin Tang
...
Yuanjun Zhao
Sabato Marco Siniscalchi
Yannan Wang
Jun Du
Chin-Hui Lee
22
30
0
03 Nov 2020
SapAugment: Learning A Sample Adaptive Policy for Data Augmentation
Ting-Yao Hu
A. Shrivastava
Jen-Hao Rick Chang
H. Koppula
Stefan Braun
Kyuyeon Hwang
Ozlem Kalinli
Oncel Tuzel
19
16
0
02 Nov 2020
Dual-decoder Transformer for Joint Automatic Speech Recognition and Multilingual Speech Translation
Hang Le
J. Pino
Changhan Wang
Jiatao Gu
D. Schwab
Laurent Besacier
43
82
0
02 Nov 2020
ACCDOA: Activity-Coupled Cartesian Direction of Arrival Representation for Sound Event Localization and Detection
Kazuki Shimada
Yuichiro Koyama
Naoya Takahashi
Shusuke Takahashi
Yuki Mitsufuji
23
86
0
29 Oct 2020
Cascaded encoders for unifying streaming and non-streaming ASR
A. Narayanan
Tara N. Sainath
Ruoming Pang
Jiahui Yu
Chung-Cheng Chiu
Rohit Prabhavalkar
Ehsan Variani
Trevor Strohman
AuLLM
8
85
0
27 Oct 2020
Speech SIMCLR: Combining Contrastive and Reconstruction Objective for Self-supervised Speech Representation Learning
Dongwei Jiang
Wubo Li
Miao Cao
Wei Zou
Xiangang Li
SSL
27
65
0
27 Oct 2020
Recent Developments on ESPnet Toolkit Boosted by Conformer
Pengcheng Guo
Florian Boyer
Xuankai Chang
Tomoki Hayashi
Yosuke Higuchi
...
Jing Shi
Shinji Watanabe
Kun Wei
Wangyou Zhang
Yuekai Zhang
45
262
0
26 Oct 2020
MarbleNet: Deep 1D Time-Channel Separable Convolutional Neural Network for Voice Activity Detection
Fei Jia
Somshubra Majumdar
Boris Ginsburg
21
48
0
26 Oct 2020
Improved Neural Language Model Fusion for Streaming Recurrent Neural Network Transducer
Suyoun Kim
Shangguan Yuan
Jay Mahadeokar
A. Bruguier
Christian Fuegen
M. Seltzer
Duc Le
23
28
0
26 Oct 2020
Semi-Supervised Spoken Language Understanding via Self-Supervised Speech and Language Model Pretraining
Cheng-I Jeff Lai
Yung-Sung Chuang
Hung-yi Lee
Shang-Wen Li
James R. Glass
VLM
SSL
27
58
0
26 Oct 2020
Two-stage Textual Knowledge Distillation for End-to-End Spoken Language Understanding
Seongbin Kim
Gyuwan Kim
Seongjin Shin
Sangmin Lee
VLM
18
19
0
25 Oct 2020
An Improved Event-Independent Network for Polyphonic Sound Event Localization and Detection
Yin Cao
Turab Iqbal
Qiuqiang Kong
Y. Zhong
Wenwu Wang
Mark D. Plumbley
16
75
0
25 Oct 2020
Align-Refine: Non-Autoregressive Speech Recognition via Iterative Realignment
Ethan A. Chi
Julian Salazar
Katrin Kirchhoff
AI4TS
25
51
0
24 Oct 2020
Transformer-based End-to-End Speech Recognition with Local Dense Synthesizer Attention
Menglong Xu
Shengqiang Li
Xiao-Lei Zhang
29
31
0
23 Oct 2020
Neural Audio Fingerprint for High-specific Audio Retrieval based on Contrastive Learning
Sungkyun Chang
Donmoon Lee
Jeongsoon Park
Hyungui Lim
Kyogu Lee
Karam Ko
Yoonchang Han
25
34
0
22 Oct 2020
Urban Sound Classification : striving towards a fair comparison
Augustin Arnault
Baptiste Hanssens
Nicolas Riche
26
8
0
22 Oct 2020
Rethinking Evaluation in ASR: Are Our Models Robust Enough?
Tatiana Likhomanenko
Qiantong Xu
Vineel Pratap
Paden Tomasello
Jacob Kahn
Gilad Avidov
R. Collobert
Gabriel Synnaeve
39
98
0
22 Oct 2020
Similarity Analysis of Self-Supervised Speech Representations
Yu-An Chung
Yonatan Belinkov
James R. Glass
SSL
41
36
0
22 Oct 2020
Confidence Estimation for Attention-based Sequence-to-sequence Models for Speech Recognition
Qiujia Li
David Qiu
Yu Zhang
Yue Liu
Yanzhang He
P. Woodland
Liangliang Cao
Trevor Strohman
12
46
0
22 Oct 2020
A General Multi-Task Learning Framework to Leverage Text Data for Speech to Text Tasks
Yun Tang
J. Pino
Changhan Wang
Xutai Ma
Dmitriy Genzel
26
73
0
21 Oct 2020
Towards End-to-End Training of Automatic Speech Recognition for Nigerian Pidgin
Daniel A. Ajisafe
O. Adegboro
Esther Oduntan
T. Arulogun
29
4
0
21 Oct 2020
Emformer: Efficient Memory Transformer Based Acoustic Model For Low Latency Streaming Speech Recognition
Yangyang Shi
Yongqiang Wang
Chunyang Wu
Ching-Feng Yeh
Julian Chan
Frank Zhang
Duc Le
M. Seltzer
58
168
0
21 Oct 2020
CLAR: Contrastive Learning of Auditory Representations
Haider Al-Tahan
Y. Mohsenzadeh
SSL
125
56
0
19 Oct 2020
Lightweight End-to-End Speech Recognition from Raw Audio Data Using Sinc-Convolutions
Ludwig Kurzinger
Nicolas Lindae
Palle Klewitz
Gerhard Rigoll
32
5
0
15 Oct 2020
Viewmaker Networks: Learning Views for Unsupervised Representation Learning
Alex Tamkin
Mike Wu
Noah D. Goodman
SSL
35
64
0
14 Oct 2020
Previous
1
2
3
...
12
13
14
15
Next