ResearchTrend.AI

SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

18 April 2019
Daniel S. Park
William Chan
Yu Zhang
Chung-Cheng Chiu
Barret Zoph
E. D. Cubuk
Quoc V. Le
    VLM

Papers citing "SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition"

50 / 753 papers shown
ERANNs: Efficient Residual Audio Neural Networks for Audio Pattern Recognition
S. Verbitskiy
Vladimir Berikov
Viacheslav Vyshegorodtsev
24
73
0
03 Jun 2021
Should We Always Separate?: Switching Between Enhanced and Observed Signals for Overlapping Speech Recognition
Hiroshi Sato
Tsubasa Ochiai
Marc Delcroix
K. Kinoshita
Takafumi Moriya
Naoyuki Kamo
33
23
0
02 Jun 2021
Improving the Adversarial Robustness for Speaker Verification by Self-Supervised Learning
Haibin Wu
Xu Li
Andy T. Liu
Zhiyong Wu
Helen Meng
Hung-yi Lee
AAML
SSL
55
29
0
01 Jun 2021
The Imaginative Generative Adversarial Network: Automatic Data Augmentation for Dynamic Skeleton-Based Hand Gesture and Human Action Recognition
Junxiao Shen
John J. Dudley
Per Ola Kristensson
SLR
GAN
38
23
0
27 May 2021
Stacked Acoustic-and-Textual Encoding: Integrating the Pre-trained Models into Speech Translation Encoders
Chen Xu
Bojie Hu
Yanyang Li
Yuhao Zhang
Shen Huang
Qi Ju
Tong Xiao
Jingbo Zhu
28
76
0
12 May 2021
Voice activity detection in the wild: A data-driven approach using teacher-student training
Heinrich Dinkel
Shuai Wang
Xuenan Xu
Mengyue Wu
K. Yu
VLM
19
32
0
10 May 2021
FastCorrect: Fast Error Correction with Edit Alignment for Automatic Speech Recognition
Yichong Leng
Xu Tan
Linchen Zhu
Jin Xu
Renqian Luo
Linquan Liu
Tao Qin
Xiang-Yang Li
Ed Lin
Tie-Yan Liu
KELM
29
63
0
09 May 2021
Efficient Weight factorization for Multilingual Speech Recognition
Ngoc-Quan Pham
Tuan-Nam Nguyen
S. Stueker
A. Waibel
48
19
0
07 May 2021
Self-Supervised Learning from Automatically Separated Sound Scenes
Eduardo Fonseca
A. Jansen
D. Ellis
Scott Wisdom
Marco Tagliasacchi
J. Hershey
Manoj Plakal
Shawn Hershey
R. C. Moore
Xavier Serra
SSL
44
13
0
05 May 2021
Accent Recognition with Hybrid Phonetic Features
Zhan Zhang
Xi Chen
Yuehai Wang
Jianyi Yang
24
18
0
05 May 2021
SUPERB: Speech processing Universal PERformance Benchmark
Shu-Wen Yang
Po-Han Chi
Yung-Sung Chuang
Cheng-I Jeff Lai
Kushal Lakhotia
...
Shuyan Dong
Shang-Wen Li
Shinji Watanabe
Abdel-rahman Mohamed
Hung-yi Lee
SSL
59
899
0
03 May 2021
On the limit of English conversational speech recognition
Zoltán Tüske
G. Saon
Brian Kingsbury
27
50
0
03 May 2021
Searchable Hidden Intermediates for End-to-End Models of Decomposable Sequence Tasks
Siddharth Dalmia
Brian Yan
Vikas Raunak
Florian Metze
Shinji Watanabe
47
30
0
02 May 2021
Scaling End-to-End Models for Large-Scale Multilingual ASR
Yue Liu
Ruoming Pang
Tara N. Sainath
Anmol Gulati
Yu Zhang
James Qin
Parisa Haghani
Wenjie Huang
Min Ma
Junwen Bai
CLL
34
76
0
30 Apr 2021
Shot Contrastive Self-Supervised Learning for Scene Boundary Detection
Shixing Chen
Xiaohan Nie
David D. Fan
Dongqing Zhang
Vimal Bhat
Raffay Hamid
SSL
31
62
0
28 Apr 2021
Joint Representation Learning and Novel Category Discovery on Single- and Multi-modal Data
Xu Jia
Kai Han
Yukun Zhu
Bradley Green
159
57
0
26 Apr 2021
Beyond Voice Activity Detection: Hybrid Audio Segmentation for Direct Speech Translation
Marco Gaido
Matteo Negri
Mauro Cettolo
Marco Turchi
VLM
58
25
0
23 Apr 2021
Label-Synchronous Speech-to-Text Alignment for ASR Using Forward and Backward Transformers
Yusuke Kida
Tatsuya Komatsu
M. Togami
21
1
0
21 Apr 2021
Fusing information streams in end-to-end audio-visual speech recognition
Wentao Yu
Steffen Zeiler
D. Kolossa
81
12
0
19 Apr 2021
Advanced Long-context End-to-end Speech Recognition Using Context-expanded Transformers
Takaaki Hori
Niko Moritz
Chiori Hori
Jonathan Le Roux
30
34
0
19 Apr 2021
Large-Scale Self- and Semi-Supervised Learning for Speech Translation
Changhan Wang
Anne Wu
J. Pino
Alexei Baevski
Michael Auli
Alexis Conneau
SSL
37
44
0
14 Apr 2021
Comparing the Benefit of Synthetic Training Data for Various Automatic Speech Recognition Architectures
Nick Rossenbach
Mohammad Zeineldeen
Benedikt Hilmes
Ralf Schluter
Hermann Ney
36
12
0
12 Apr 2021
Generalized Spoofing Detection Inspired from Audio Generation Artifacts
Yang Gao
Tyler Vuong
Mahsa Elyasi
Gaurav Bharaj
Rita Singh
26
20
0
08 Apr 2021
Graph Attention Networks for Anti-Spoofing
Hemlata Tak
Jee-weon Jung
J. Patino
Massimiliano Todisco
Nicholas W. D. Evans
49
66
0
08 Apr 2021
Contextualized Streaming End-to-End Speech Recognition with Trie-Based Deep Biasing and Shallow Fusion
Duc Le
Mahaveer Jain
Gil Keren
Suyoun Kim
Yangyang Shi
...
Yuan Shangguan
Christian Fuegen
Ozlem Kalinli
Yatharth Saraf
M. Seltzer
32
90
0
05 Apr 2021
Semantic Distance: A New Metric for ASR Performance Analysis Towards Spoken Language Understanding
Suyoun Kim
Abhinav Arora
Duc Le
Ching-Feng Yeh
Christian Fuegen
Ozlem Kalinli
M. Seltzer
38
25
0
05 Apr 2021
AST: Audio Spectrogram Transformer
Yuan Gong
Yu-An Chung
James R. Glass
ViT
58
839
0
05 Apr 2021
Towards Lifelong Learning of End-to-end ASR
Heng-Jui Chang
Hung-yi Lee
Lin-Shan Lee
KELM
CLL
35
34
0
04 Apr 2021
HMM-Free Encoder Pre-Training for Streaming RNN Transducer
Lu Huang
J. Sun
Yu Tang
Junfeng Hou
Jinkun Chen
Jun Zhang
Zejun Ma
25
3
0
02 Apr 2021
Keyword Transformer: A Self-Attention Model for Keyword Spotting
Axel Berg
Mark O'Connor
M. T. Cruz
32
133
0
01 Apr 2021
A study of latent monotonic attention variants
Albert Zeyer
Ralf Schluter
Hermann Ney
24
5
0
30 Mar 2021
Scaling sparsemax based channel selection for speech recognition with ad-hoc microphone arrays
Junqi Chen
Xiao-Lei Zhang
15
10
0
29 Mar 2021
Residual Energy-Based Models for End-to-End Speech Recognition
Qiujia Li
Yu Zhang
Yue Liu
Liangliang Cao
P. Woodland
33
14
0
25 Mar 2021
Space-Time Crop & Attend: Improving Cross-modal Video Representation Learning
Mandela Patrick
Yuki M. Asano
Bernie Huang
Ishan Misra
Florian Metze
Joao Henriques
Andrea Vedaldi
AI4TS
31
33
0
18 Mar 2021
Advancing RNN Transducer Technology for Speech Recognition
G. Saon
Zoltan Tueske
Daniel Bolaños
Brian Kingsbury
43
86
0
17 Mar 2021
Transformer-based ASR Incorporating Time-reduction Layer and Fine-tuning with Self-Knowledge Distillation
Md. Akmal Haidar
Chao Xing
Mehdi Rezagholizadeh
27
7
0
17 Mar 2021
Reweighting Augmented Samples by Minimizing the Maximal Expected Loss
Mingyang Yi
Lu Hou
Lifeng Shang
Xin Jiang
Qun Liu
Zhi-Ming Ma
17
19
0
16 Mar 2021
Dynamic Acoustic Unit Augmentation With BPE-Dropout for Low-Resource End-to-End Speech Recognition
A. Laptev
A. Andrusenko
Ivan Podluzhny
Anton Mitrofanov
Ivan Medennikov
Yuri N. Matveev
VLM
26
14
0
12 Mar 2021
Wav2vec-C: A Self-supervised Model for Speech Representation Learning
Samik Sadhu
Di He
Che-Wei Huang
Sri Harish Reddy Mallidi
Minhua Wu
Ariya Rastrow
A. Stolcke
J. Droppo
Roland Maas
SSL
20
48
0
09 Mar 2021
Contrastive Semi-supervised Learning for ASR
Alex Xiao
Christian Fuegen
Abdel-rahman Mohamed
26
20
0
09 Mar 2021
Slow-Fast Auditory Streams For Audio Recognition
Evangelos Kazakos
Arsha Nagrani
Andrew Zisserman
Dima Damen
28
66
0
05 Mar 2021
An Empirical Study of End-to-end Simultaneous Speech Translation Decoding Strategies
H. Nguyen
Yannick Esteve
Laurent Besacier
35
19
0
04 Mar 2021
Perceiver: General Perception with Iterative Attention
Andrew Jaegle
Felix Gimeno
Andrew Brock
Andrew Zisserman
Oriol Vinyals
João Carreira
VLM
ViT
MDE
91
978
0
04 Mar 2021
The NPU System for the 2020 Personalized Voice Trigger Challenge
Jingyong Hou
Li Lyna Zhang
Yihui Fu
Qing Wang
Zhanheng Yang
Qijie Shao
Lei Xie
29
7
0
26 Feb 2021
MixSpeech: Data Augmentation for Low-resource Automatic Speech Recognition
Linghui Meng
Jin Xu
Xu Tan
Jindong Wang
Tao Qin
Bo Xu
VLM
66
77
0
25 Feb 2021
The Accented English Speech Recognition Challenge 2020: Open Datasets, Tracks, Baselines, Results and Methods
Xian Shi
Fan Yu
Yizhou Lu
Yuhao Liang
Qiangze Feng
Daliang Wang
Y. Qian
Lei Xie
26
66
0
20 Feb 2021
End-to-End Neural Systems for Automatic Children Speech Recognition: An Empirical Study
Prashanth Gurunath Shivakumar
Shrikanth Narayanan
30
48
0
19 Feb 2021
End-to-End Automatic Speech Recognition with Deep Mutual Learning
Ryo Masumura
Mana Ihori
Akihiko Takashima
Tomohiro Tanaka
Takanori Ashihara
27
5
0
16 Feb 2021
Hierarchical Transformer-based Large-Context End-to-end ASR with Large-Context Knowledge Distillation
Ryo Masumura
Naoki Makishima
Mana Ihori
Akihiko Takashima
Tomohiro Tanaka
Shota Orihashi
33
29
0
16 Feb 2021
Adversarial defense for automatic speaker verification by cascaded self-supervised learning models
Haibin Wu
Xu Li
Andy T. Liu
Zhiyong Wu
Helen Meng
Hung-yi Lee
AAML
37
40
0
14 Feb 2021