ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1904.08779
  4. Cited By
SpecAugment: A Simple Data Augmentation Method for Automatic Speech
  Recognition

SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

18 April 2019
Daniel S. Park
William Chan
Yu Zhang
Chung-Cheng Chiu
Barret Zoph
E. D. Cubuk
Quoc V. Le
    VLM
ArXivPDFHTML

Papers citing "SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition"

50 / 750 papers shown
Title
Exploiting Spectral Augmentation for Code-Switched Spoken Language
  Identification
Exploiting Spectral Augmentation for Code-Switched Spoken Language Identification
P. Rangan
Sundeep Teki
Hemant Misra
11
21
0
14 Oct 2020
Towards Data-efficient Modeling for Wake Word Spotting
Towards Data-efficient Modeling for Wake Word Spotting
Yixin Gao
Yuriy Mishchenko
Anish Shah
Spyros Matsoukas
S. Vitaladevuni
52
30
0
13 Oct 2020
Dual-mode ASR: Unify and Improve Streaming ASR with Full-context
  Modeling
Dual-mode ASR: Unify and Improve Streaming ASR with Full-context Modeling
Jiahui Yu
Wei Han
Anmol Gulati
Chung-Cheng Chiu
Yue Liu
Tara N. Sainath
Yonghui Wu
Ruoming Pang
30
18
0
12 Oct 2020
Improving Low Resource Code-switched ASR using Augmented Code-switched
  TTS
Improving Low Resource Code-switched ASR using Augmented Code-switched TTS
Yash Sharma
Basil Abraham
Karan Taneja
Preethi Jyothi
22
20
0
12 Oct 2020
Contrastive Representation Learning: A Framework and Review
Contrastive Representation Learning: A Framework and Review
Phúc H. Lê Khắc
Graham Healy
Alan F. Smeaton
SSL
AI4TS
189
687
0
10 Oct 2020
Leveraging Unpaired Text Data for Training End-to-End Speech-to-Intent
  Systems
Leveraging Unpaired Text Data for Training End-to-End Speech-to-Intent Systems
Yinghui Huang
H. Kuo
Samuel Thomas
Zvi Kons
Kartik Audhkhasi
Brian Kingsbury
R. Hoory
M. Picheny
VLM
19
63
0
08 Oct 2020
Representation Learning for Sequence Data with Deep Autoencoding
  Predictive Components
Representation Learning for Sequence Data with Deep Autoencoding Predictive Components
Junwen Bai
Weiran Wang
Yingbo Zhou
Caiming Xiong
SSL
AI4TS
27
12
0
07 Oct 2020
Fine-Grained Grounding for Multimodal Speech Recognition
Fine-Grained Grounding for Multimodal Speech Recognition
Tejas Srinivasan
Ramon Sanabria
Florian Metze
Desmond Elliott
25
11
0
05 Oct 2020
Differentiable Weighted Finite-State Transducers
Differentiable Weighted Finite-State Transducers
Awni Y. Hannun
Vineel Pratap
Jacob Kahn
Wei-Ning Hsu
36
29
0
02 Oct 2020
A Crowdsourced Open-Source Kazakh Speech Corpus and Initial Speech
  Recognition Baseline
A Crowdsourced Open-Source Kazakh Speech Corpus and Initial Speech Recognition Baseline
Yerbolat Khassanov
Saida Mussakhojayeva
A. Mirzakhmetov
A. Adiyev
Mukhamet Nurpeiissov
H. A. Varol
22
30
0
22 Sep 2020
Cough Against COVID: Evidence of COVID-19 Signature in Cough Sounds
Cough Against COVID: Evidence of COVID-19 Signature in Cough Sounds
Piyush Bagad
Aman Dalmia
Jigar Doshi
Arsha Nagrani
Parag Bhamare
A. Mahale
S. Rane
N. Agarwal
R. Panicker
39
112
0
17 Sep 2020
On Multitask Loss Function for Audio Event Detection and Localization
On Multitask Loss Function for Audio Event Detection and Localization
Huy P Phan
L. D. Pham
P. Koch
Ngoc Q. K. Duong
Ian Mcloughlin
Alfred Mertins
29
14
0
11 Sep 2020
On Target Segmentation for Direct Speech Translation
On Target Segmentation for Direct Speech Translation
Mattia Antonino Di Gangi
Marco Gaido
Matteo Negri
Marco Turchi
37
14
0
10 Sep 2020
VoiceFilter-Lite: Streaming Targeted Voice Separation for On-Device
  Speech Recognition
VoiceFilter-Lite: Streaming Targeted Voice Separation for On-Device Speech Recognition
Quan Wang
Ignacio López Moreno
Mert Saglam
K. Wilson
Alan Chiao
...
Yanzhang He
Wei Li
Jason W. Pelecanos
M. Nika
A. Gruenstein
VLM
39
82
0
09 Sep 2020
Overview and Evaluation of Sound Event Localization and Detection in
  DCASE 2019
Overview and Evaluation of Sound Event Localization and Detection in DCASE 2019
Archontis Politis
A. Mesaros
Sharath Adavanne
Toni Heittola
Tuomas Virtanen
27
126
0
06 Sep 2020
CRNNs for Urban Sound Tagging with spatiotemporal context
CRNNs for Urban Sound Tagging with spatiotemporal context
Augustin Arnault
Nicolas Riche
25
7
0
24 Aug 2020
Speech To Semantics: Improve ASR and NLU Jointly via All-Neural
  Interfaces
Speech To Semantics: Improve ASR and NLU Jointly via All-Neural Interfaces
Milind Rao
A. Raju
Pranav Dheram
Bach Bui
Ariya Rastrow
21
43
0
14 Aug 2020
Distilling the Knowledge of BERT for Sequence-to-Sequence ASR
Distilling the Knowledge of BERT for Sequence-to-Sequence ASR
Hayato Futami
Hirofumi Inaguma
Sei Ueno
Masato Mimura
S. Sakai
Tatsuya Kawahara
24
50
0
09 Aug 2020
LRSpeech: Extremely Low-Resource Speech Synthesis and Recognition
LRSpeech: Extremely Low-Resource Speech Synthesis and Recognition
Jin Xu
Xu Tan
Yi Ren
Tao Qin
Jian Li
Sheng Zhao
Tie-Yan Liu
VLM
18
90
0
09 Aug 2020
Contextualized Translation of Automatically Segmented Speech
Contextualized Translation of Automatically Segmented Speech
Marco Gaido
Mattia Antonino Di Gangi
Matteo Negri
Mauro Cettolo
Marco Turchi
25
18
0
05 Aug 2020
Land Cover Classification from Remote Sensing Images Based on
  Multi-Scale Fully Convolutional Network
Land Cover Classification from Remote Sensing Images Based on Multi-Scale Fully Convolutional Network
Rui Li
Shunyi Zheng
Chenxi Duan
Ce Zhang
26
99
0
01 Aug 2020
Semi-Supervised Learning with Data Augmentation for End-to-End ASR
Semi-Supervised Learning with Data Augmentation for End-to-End ASR
F. Weninger
F. Mana
R. Gemello
Jesús Andrés-Ferrer
P. Zhan
27
30
0
27 Jul 2020
Efficient minimum word error rate training of RNN-Transducer for
  end-to-end speech recognition
Efficient minimum word error rate training of RNN-Transducer for end-to-end speech recognition
Jinxi Guo
Gautam Tiwari
J. Droppo
Maarten Van Segbroeck
Che-Wei Huang
A. Stolcke
Roland Maas
21
55
0
27 Jul 2020
CoVoST 2 and Massively Multilingual Speech-to-Text Translation
CoVoST 2 and Massively Multilingual Speech-to-Text Translation
Changhan Wang
Anne Wu
J. Pino
SLR
31
72
0
20 Jul 2020
Cross-Lingual Speaker Verification with Domain-Balanced Hard Prototype
  Mining and Language-Dependent Score Normalization
Cross-Lingual Speaker Verification with Domain-Balanced Hard Prototype Mining and Language-Dependent Score Normalization
Jenthe Thienpondt
Brecht Desplanques
Kris Demuynck
20
24
0
15 Jul 2020
TERA: Self-Supervised Learning of Transformer Encoder Representation for
  Speech
TERA: Self-Supervised Learning of Transformer Encoder Representation for Speech
Andy T. Liu
Shang-Wen Li
Hung-yi Lee
SSL
67
356
0
12 Jul 2020
Class LM and word mapping for contextual biasing in End-to-End ASR
Class LM and word mapping for contextual biasing in End-to-End ASR
Rongqing Huang
Ossama Abdel-Hamid
Xinwei Li
G. Evermann
31
47
0
10 Jul 2020
Data Augmenting Contrastive Learning of Speech Representations in the
  Time Domain
Data Augmenting Contrastive Learning of Speech Representations in the Time Domain
Eugene Kharitonov
M. Rivière
Gabriel Synnaeve
Lior Wolf
Pierre-Emmanuel Mazaré
Matthijs Douze
Emmanuel Dupoux
31
117
0
02 Jul 2020
Self-Supervised MultiModal Versatile Networks
Self-Supervised MultiModal Versatile Networks
Jean-Baptiste Alayrac
Adrià Recasens
R. Schneider
Relja Arandjelović
Jason Ramapuram
J. Fauw
Lucas Smaira
Sander Dieleman
Andrew Zisserman
SSL
40
372
0
29 Jun 2020
Streaming Transformer ASR with Blockwise Synchronous Beam Search
Streaming Transformer ASR with Blockwise Synchronous Beam Search
E. Tsunoo
Yosuke Kashiwagi
Shinji Watanabe
22
11
0
25 Jun 2020
Self-Supervised Representations Improve End-to-End Speech Translation
Self-Supervised Representations Improve End-to-End Speech Translation
Anne Wu
Changhan Wang
J. Pino
Jiatao Gu
SSL
29
40
0
22 Jun 2020
Sound Event Localization and Detection Using Activity-Coupled Cartesian
  DOA Vector and RD3net
Sound Event Localization and Detection Using Activity-Coupled Cartesian DOA Vector and RD3net
Kazuki Shimada
Naoya Takahashi
Shusuke Takahashi
Yuki Mitsufuji
16
19
0
22 Jun 2020
MaxVA: Fast Adaptation of Step Sizes by Maximizing Observed Variance of
  Gradients
MaxVA: Fast Adaptation of Step Sizes by Maximizing Observed Variance of Gradients
Chenfei Zhu
Yu Cheng
Zhe Gan
Furong Huang
Jingjing Liu
Tom Goldstein
ODL
35
2
0
21 Jun 2020
Boosting Active Learning for Speech Recognition with Noisy
  Pseudo-labeled Samples
Boosting Active Learning for Speech Recognition with Noisy Pseudo-labeled Samples
Jihwan Bang
Heesu Kim
Y. Yoo
Jung-Woo Ha
11
2
0
19 Jun 2020
Are you wearing a mask? Improving mask detection from speech using
  augmentation by cycle-consistent GANs
Are you wearing a mask? Improving mask detection from speech using augmentation by cycle-consistent GANs
Nicolae-Cuatualin Ristea
Radu Tudor Ionescu
CVBM
8
41
0
17 Jun 2020
The JHU Multi-Microphone Multi-Speaker ASR System for the CHiME-6
  Challenge
The JHU Multi-Microphone Multi-Speaker ASR System for the CHiME-6 Challenge
Ashish Arora
Desh Raj
Aswin Shanmugam Subramanian
Ke Li
Bar Ben Yair
Matthew Maciejewski
Piotr Żelasko
Leibny Paola García-Perera
Shinji Watanabe
Sanjeev Khudanpur
39
9
0
14 Jun 2020
End-to-End Speech-Translation with Knowledge Distillation: FBK@IWSLT2020
End-to-End Speech-Translation with Knowledge Distillation: FBK@IWSLT2020
Marco Gaido
Mattia Antonino Di Gangi
Matteo Negri
Marco Turchi
21
53
0
04 Jun 2020
High-Fidelity Audio Generation and Representation Learning with Guided
  Adversarial Autoencoder
High-Fidelity Audio Generation and Representation Learning with Guided Adversarial Autoencoder
Kazi Nazmul Haque
R. Rana
Björn W Schuller
DRL
31
12
0
01 Jun 2020
CLOCS: Contrastive Learning of Cardiac Signals Across Space, Time, and
  Patients
CLOCS: Contrastive Learning of Cardiac Signals Across Space, Time, and Patients
Dani Kiyasseh
T. Zhu
David Clifton
33
186
0
27 May 2020
Multistream CNN for Robust Acoustic Modeling
Multistream CNN for Robust Acoustic Modeling
Kyu Jeong Han
Jing Pan
Venkata Krishna Naveen Tadala
T. Ma
Daniel Povey
19
34
0
21 May 2020
Simplified Self-Attention for Transformer-based End-to-End Speech
  Recognition
Simplified Self-Attention for Transformer-based End-to-End Speech Recognition
Haoneng Luo
Shiliang Zhang
Ming Lei
Lei Xie
40
33
0
21 May 2020
A Comparison of Label-Synchronous and Frame-Synchronous End-to-End
  Models for Speech Recognition
A Comparison of Label-Synchronous and Frame-Synchronous End-to-End Models for Speech Recognition
Linhao Dong
Cheng Yi
Jianzong Wang
Shiyu Zhou
Shuang Xu
X. Jia
Bo Xu
36
17
0
20 May 2020
Early Stage LM Integration Using Local and Global Log-Linear Combination
Early Stage LM Integration Using Local and Global Log-Linear Combination
Wilfried Michel
Ralf Schluter
Hermann Ney
19
11
0
20 May 2020
BiQGEMM: Matrix Multiplication with Lookup Table For Binary-Coding-based
  Quantized DNNs
BiQGEMM: Matrix Multiplication with Lookup Table For Binary-Coding-based Quantized DNNs
Yongkweon Jeon
Baeseong Park
S. Kwon
Byeongwook Kim
Jeongin Yun
Dongsoo Lee
MQ
35
30
0
20 May 2020
Mask CTC: Non-Autoregressive End-to-End ASR with CTC and Mask Predict
Mask CTC: Non-Autoregressive End-to-End ASR with CTC and Mask Predict
Yosuke Higuchi
Shinji Watanabe
Nanxin Chen
Tetsuji Ogawa
Tetsunori Kobayashi
19
137
0
18 May 2020
Attention-based Transducer for Online Speech Recognition
Attention-based Transducer for Online Speech Recognition
Bin Wang
Yan Yin
Hui-Ching Lin
23
4
0
18 May 2020
Conformer: Convolution-augmented Transformer for Speech Recognition
Conformer: Convolution-augmented Transformer for Speech Recognition
Anmol Gulati
James Qin
Chung-Cheng Chiu
Niki Parmar
Yu Zhang
...
Wei Han
Shibo Wang
Zhengdong Zhang
Yonghui Wu
Ruoming Pang
140
3,044
0
16 May 2020
Streaming Transformer-based Acoustic Models Using Self-attention with
  Augmented Memory
Streaming Transformer-based Acoustic Models Using Self-attention with Augmented Memory
Chunyang Wu
Yongqiang Wang
Yangyang Shi
Ching-Feng Yeh
Frank Zhang
RALM
31
60
0
16 May 2020
AccentDB: A Database of Non-Native English Accents to Assist Neural
  Speech Recognition
AccentDB: A Database of Non-Native English Accents to Assist Neural Speech Recognition
Afroz Ahamad
Ankit Anand
Pranesh Bhargava
19
22
0
16 May 2020
Spike-Triggered Non-Autoregressive Transformer for End-to-End Speech
  Recognition
Spike-Triggered Non-Autoregressive Transformer for End-to-End Speech Recognition
Zhengkun Tian
Jiangyan Yi
J. Tao
Ye Bai
Shuai Zhang
Zhengqi Wen
16
54
0
16 May 2020
Previous
123...131415
Next