SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

18 April 2019

Papers citing "SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition"

50 / 750 papers shown

Exploiting Spectral Augmentation for Code-Switched Spoken Language Identification P. Rangan Sundeep Teki Hemant Misra 11 21 0 14 Oct 2020
Towards Data-efficient Modeling for Wake Word Spotting Yixin Gao Yuriy Mishchenko Anish Shah Spyros Matsoukas S. Vitaladevuni 52 30 0 13 Oct 2020
Dual-mode ASR: Unify and Improve Streaming ASR with Full-context Modeling Jiahui Yu Wei Han Anmol Gulati Chung-Cheng Chiu Yue Liu Tara N. Sainath Yonghui Wu Ruoming Pang 30 18 0 12 Oct 2020
Improving Low Resource Code-switched ASR using Augmented Code-switched TTS Yash Sharma Basil Abraham Karan Taneja Preethi Jyothi 22 20 0 12 Oct 2020
Contrastive Representation Learning: A Framework and Review Phúc H. Lê Khắc Graham Healy Alan F. Smeaton SSL AI4TS 189 687 0 10 Oct 2020
Leveraging Unpaired Text Data for Training End-to-End Speech-to-Intent Systems Yinghui Huang H. Kuo Samuel Thomas Zvi Kons Kartik Audhkhasi Brian Kingsbury R. Hoory M. Picheny VLM 19 63 0 08 Oct 2020
Representation Learning for Sequence Data with Deep Autoencoding Predictive Components Junwen Bai Weiran Wang Yingbo Zhou Caiming Xiong SSL AI4TS 27 12 0 07 Oct 2020
Fine-Grained Grounding for Multimodal Speech Recognition Tejas Srinivasan Ramon Sanabria Florian Metze Desmond Elliott 25 11 0 05 Oct 2020
Differentiable Weighted Finite-State Transducers Awni Y. Hannun Vineel Pratap Jacob Kahn Wei-Ning Hsu 36 29 0 02 Oct 2020
A Crowdsourced Open-Source Kazakh Speech Corpus and Initial Speech Recognition Baseline Yerbolat Khassanov Saida Mussakhojayeva A. Mirzakhmetov A. Adiyev Mukhamet Nurpeiissov H. A. Varol 22 30 0 22 Sep 2020
Cough Against COVID: Evidence of COVID-19 Signature in Cough Sounds Piyush Bagad Aman Dalmia Jigar Doshi Arsha Nagrani Parag Bhamare A. Mahale S. Rane N. Agarwal R. Panicker 39 112 0 17 Sep 2020
On Multitask Loss Function for Audio Event Detection and Localization Huy P Phan L. D. Pham P. Koch Ngoc Q. K. Duong Ian Mcloughlin Alfred Mertins 29 14 0 11 Sep 2020
On Target Segmentation for Direct Speech Translation Mattia Antonino Di Gangi Marco Gaido Matteo Negri Marco Turchi 37 14 0 10 Sep 2020
VoiceFilter-Lite: Streaming Targeted Voice Separation for On-Device Speech Recognition Quan Wang Ignacio López Moreno Mert Saglam K. Wilson Alan Chiao ... Yanzhang He Wei Li Jason W. Pelecanos M. Nika A. Gruenstein VLM 39 82 0 09 Sep 2020
Overview and Evaluation of Sound Event Localization and Detection in DCASE 2019 Archontis Politis A. Mesaros Sharath Adavanne Toni Heittola Tuomas Virtanen 27 126 0 06 Sep 2020
CRNNs for Urban Sound Tagging with spatiotemporal context Augustin Arnault Nicolas Riche 25 7 0 24 Aug 2020
Speech To Semantics: Improve ASR and NLU Jointly via All-Neural Interfaces Milind Rao A. Raju Pranav Dheram Bach Bui Ariya Rastrow 21 43 0 14 Aug 2020
Distilling the Knowledge of BERT for Sequence-to-Sequence ASR Hayato Futami Hirofumi Inaguma Sei Ueno Masato Mimura S. Sakai Tatsuya Kawahara 24 50 0 09 Aug 2020
LRSpeech: Extremely Low-Resource Speech Synthesis and Recognition Jin Xu Xu Tan Yi Ren Tao Qin Jian Li Sheng Zhao Tie-Yan Liu VLM 18 90 0 09 Aug 2020
Contextualized Translation of Automatically Segmented Speech Marco Gaido Mattia Antonino Di Gangi Matteo Negri Mauro Cettolo Marco Turchi 25 18 0 05 Aug 2020
Land Cover Classification from Remote Sensing Images Based on Multi-Scale Fully Convolutional Network Rui Li Shunyi Zheng Chenxi Duan Ce Zhang 26 99 0 01 Aug 2020
Semi-Supervised Learning with Data Augmentation for End-to-End ASR F. Weninger F. Mana R. Gemello Jesús Andrés-Ferrer P. Zhan 27 30 0 27 Jul 2020
Efficient minimum word error rate training of RNN-Transducer for end-to-end speech recognition Jinxi Guo Gautam Tiwari J. Droppo Maarten Van Segbroeck Che-Wei Huang A. Stolcke Roland Maas 21 55 0 27 Jul 2020
CoVoST 2 and Massively Multilingual Speech-to-Text Translation Changhan Wang Anne Wu J. Pino SLR 31 72 0 20 Jul 2020
Cross-Lingual Speaker Verification with Domain-Balanced Hard Prototype Mining and Language-Dependent Score Normalization Jenthe Thienpondt Brecht Desplanques Kris Demuynck 20 24 0 15 Jul 2020
TERA: Self-Supervised Learning of Transformer Encoder Representation for Speech Andy T. Liu Shang-Wen Li Hung-yi Lee SSL 67 356 0 12 Jul 2020
Class LM and word mapping for contextual biasing in End-to-End ASR Rongqing Huang Ossama Abdel-Hamid Xinwei Li G. Evermann 31 47 0 10 Jul 2020
Data Augmenting Contrastive Learning of Speech Representations in the Time Domain Eugene Kharitonov M. Rivière Gabriel Synnaeve Lior Wolf Pierre-Emmanuel Mazaré Matthijs Douze Emmanuel Dupoux 31 117 0 02 Jul 2020
Self-Supervised MultiModal Versatile Networks Jean-Baptiste Alayrac Adrià Recasens R. Schneider Relja Arandjelović Jason Ramapuram J. Fauw Lucas Smaira Sander Dieleman Andrew Zisserman SSL 40 372 0 29 Jun 2020
Streaming Transformer ASR with Blockwise Synchronous Beam Search E. Tsunoo Yosuke Kashiwagi Shinji Watanabe 22 11 0 25 Jun 2020
Self-Supervised Representations Improve End-to-End Speech Translation Anne Wu Changhan Wang J. Pino Jiatao Gu SSL 29 40 0 22 Jun 2020
Sound Event Localization and Detection Using Activity-Coupled Cartesian DOA Vector and RD3net Kazuki Shimada Naoya Takahashi Shusuke Takahashi Yuki Mitsufuji 16 19 0 22 Jun 2020
MaxVA: Fast Adaptation of Step Sizes by Maximizing Observed Variance of Gradients Chenfei Zhu Yu Cheng Zhe Gan Furong Huang Jingjing Liu Tom Goldstein ODL 35 2 0 21 Jun 2020
Boosting Active Learning for Speech Recognition with Noisy Pseudo-labeled Samples Jihwan Bang Heesu Kim Y. Yoo Jung-Woo Ha 11 2 0 19 Jun 2020
Are you wearing a mask? Improving mask detection from speech using augmentation by cycle-consistent GANs Nicolae-Cătălin Ristea Radu Tudor Ionescu CVBM 8 41 0 17 Jun 2020
The JHU Multi-Microphone Multi-Speaker ASR System for the CHiME-6 Challenge Ashish Arora Desh Raj Aswin Shanmugam Subramanian Ke Li Bar Ben Yair Matthew Maciejewski Piotr Żelasko Leibny Paola García-Perera Shinji Watanabe Sanjeev Khudanpur 39 9 0 14 Jun 2020
End-to-End Speech-Translation with Knowledge Distillation: FBK@IWSLT2020 Marco Gaido Mattia Antonino Di Gangi Matteo Negri Marco Turchi 21 53 0 04 Jun 2020
High-Fidelity Audio Generation and Representation Learning with Guided Adversarial Autoencoder Kazi Nazmul Haque R. Rana Björn W Schuller DRL 31 12 0 01 Jun 2020
CLOCS: Contrastive Learning of Cardiac Signals Across Space, Time, and Patients Dani Kiyasseh T. Zhu David Clifton 33 186 0 27 May 2020
Multistream CNN for Robust Acoustic Modeling Kyu Jeong Han Jing Pan Venkata Krishna Naveen Tadala T. Ma Daniel Povey 19 34 0 21 May 2020
Simplified Self-Attention for Transformer-based End-to-End Speech Recognition Haoneng Luo Shiliang Zhang Ming Lei Lei Xie 40 33 0 21 May 2020
A Comparison of Label-Synchronous and Frame-Synchronous End-to-End Models for Speech Recognition Linhao Dong Cheng Yi Jianzong Wang Shiyu Zhou Shuang Xu X. Jia Bo Xu 36 17 0 20 May 2020
Early Stage LM Integration Using Local and Global Log-Linear Combination Wilfried Michel Ralf Schlüter Hermann Ney 19 11 0 20 May 2020
BiQGEMM: Matrix Multiplication with Lookup Table For Binary-Coding-based Quantized DNNs Yongkweon Jeon Baeseong Park S. Kwon Byeongwook Kim Jeongin Yun Dongsoo Lee MQ 35 30 0 20 May 2020
Mask CTC: Non-Autoregressive End-to-End ASR with CTC and Mask Predict Yosuke Higuchi Shinji Watanabe Nanxin Chen Tetsuji Ogawa Tetsunori Kobayashi 19 137 0 18 May 2020
Attention-based Transducer for Online Speech Recognition Bin Wang Yan Yin Hui-Ching Lin 23 4 0 18 May 2020
Conformer: Convolution-augmented Transformer for Speech Recognition Anmol Gulati James Qin Chung-Cheng Chiu Niki Parmar Yu Zhang ... Wei Han Shibo Wang Zhengdong Zhang Yonghui Wu Ruoming Pang 140 3,044 0 16 May 2020
Streaming Transformer-based Acoustic Models Using Self-attention with Augmented Memory Chunyang Wu Yongqiang Wang Yangyang Shi Ching-Feng Yeh Frank Zhang RALM 31 60 0 16 May 2020
AccentDB: A Database of Non-Native English Accents to Assist Neural Speech Recognition Afroz Ahamad Ankit Anand Pranesh Bhargava 19 22 0 16 May 2020
Spike-Triggered Non-Autoregressive Transformer for End-to-End Speech Recognition Zhengkun Tian Jiangyan Yi J. Tao Ye Bai Shuai Zhang Zhengqi Wen 16 54 0 16 May 2020