Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2005.08100
Cited By
Conformer: Convolution-augmented Transformer for Speech Recognition
16 May 2020
Anmol Gulati
James Qin
Chung-Cheng Chiu
Niki Parmar
Yu Zhang
Jiahui Yu
Wei Han
Shibo Wang
Zhengdong Zhang
Yonghui Wu
Ruoming Pang
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Conformer: Convolution-augmented Transformer for Speech Recognition"
50 / 1,744 papers shown
Title
Layer Reduction: Accelerating Conformer-Based Self-Supervised Model via Layer Consistency
Jinchuan Tian
Rongzhi Gu
Helin Wang
Yuexian Zou
21
0
0
08 Apr 2021
WNARS: WFST based Non-autoregressive Streaming End-to-End Speech Recognition
Zhichao Wang
Wenwen Yang
Pan Zhou
Wei-Neng Chen
RALM
24
17
0
08 Apr 2021
Emotion Recognition from Speech Using Wav2vec 2.0 Embeddings
L. Pepino
Pablo Riera
Luciana Ferrer
11
349
0
08 Apr 2021
Pushing the Limits of Non-Autoregressive Speech Recognition
Edwin G. Ng
Chung-Cheng Chiu
Yu Zhang
William Chan
VLM
6
27
0
07 Apr 2021
Librispeech Transducer Model with Internal Language Model Prior Correction
Albert Zeyer
André Merboldt
Wilfried Michel
Ralf Schluter
Hermann Ney
11
28
0
07 Apr 2021
S2VC: A Framework for Any-to-Any Voice Conversion with Self-Supervised Pretrained Representations
Jheng-hao Lin
Yist Y. Lin
C. Chien
Hung-yi Lee
20
56
0
07 Apr 2021
Darts-Conformer: Towards Efficient Gradient-Based Neural Architecture Search For End-to-End ASR
Xian Shi
Pan Zhou
Wei-Neng Chen
Lei Xie
11
17
0
07 Apr 2021
Interpreting A Pre-trained Model Is A Key For Model Architecture Optimization: A Case Study On Wav2Vec 2.0
Liu Chen
Meysam Asgari
19
1
0
07 Apr 2021
Exploring Targeted Universal Adversarial Perturbations to End-to-end ASR Models
Zhiyun Lu
Wei Han
Yu Zhang
Liangliang Cao
AAML
52
16
0
06 Apr 2021
Relaxing the Conditional Independence Assumption of CTC-based ASR by Conditioning on Intermediate Predictions
Jumon Nozaki
Tatsuya Komatsu
10
71
0
06 Apr 2021
Non-autoregressive Mandarin-English Code-switching Speech Recognition
Shun-Po Chuang
Heng-Jui Chang
Sung-Feng Huang
Hung-yi Lee
11
15
0
06 Apr 2021
Dissecting User-Perceived Latency of On-Device E2E Speech Recognition
Yuan Shangguan
Rohit Prabhavalkar
Hang Su
Jay Mahadeokar
Yangyang Shi
...
Chunyang Wu
Duc Le
Ozlem Kalinli
Christian Fuegen
M. Seltzer
12
27
0
06 Apr 2021
Contextualized Streaming End-to-End Speech Recognition with Trie-Based Deep Biasing and Shallow Fusion
Duc Le
Mahaveer Jain
Gil Keren
Suyoun Kim
Yangyang Shi
...
Yuan Shangguan
Christian Fuegen
Ozlem Kalinli
Yatharth Saraf
M. Seltzer
25
90
0
05 Apr 2021
Dynamic Encoder Transducer: A Flexible Solution For Trading Off Accuracy For Latency
Yangyang Shi
Varun K. Nagaraja
Chunyang Wu
Jay Mahadeokar
Duc Le
...
Ching-Feng Yeh
Julian Chan
Christian Fuegen
Ozlem Kalinli
M. Seltzer
17
15
0
05 Apr 2021
SpeechStew: Simply Mix All Available Speech Recognition Data to Train One Large Neural Network
William Chan
Daniel S. Park
Chris A. Lee
Yu Zhang
Quoc V. Le
Mohammad Norouzi
AI4TS
24
135
0
05 Apr 2021
End-to-End Speaker-Attributed ASR with Transformer
Naoyuki Kanda
Guoli Ye
Yashesh Gaur
Xiaofei Wang
Zhong Meng
Zhuo Chen
Takuya Yoshioka
11
46
0
05 Apr 2021
SPGISpeech: 5,000 hours of transcribed financial audio for fully formatted end-to-end speech recognition
Patrick K. O’Neill
Vitaly Lavrukhin
Somshubra Majumdar
Vahid Noroozi
Yuekai Zhang
...
Keenan Freyberg
Michael D. Shulman
Boris Ginsburg
Shinji Watanabe
Georg Kucsko
AI4TS
18
59
0
05 Apr 2021
AST: Audio Spectrogram Transformer
Yuan Gong
Yu-An Chung
James R. Glass
ViT
12
829
0
05 Apr 2021
Towards Lifelong Learning of End-to-end ASR
Heng-Jui Chang
Hung-yi Lee
Lin-Shan Lee
KELM
CLL
35
34
0
04 Apr 2021
Robust wav2vec 2.0: Analyzing Domain Shift in Self-Supervised Pre-Training
Wei-Ning Hsu
Anuroop Sriram
Alexei Baevski
Tatiana Likhomanenko
Qiantong Xu
...
Jacob Kahn
Ann Lee
R. Collobert
Gabriel Synnaeve
Michael Auli
SSL
9
235
0
02 Apr 2021
Keyword Transformer: A Self-Attention Model for Keyword Spotting
Axel Berg
Mark O'Connor
M. T. Cruz
16
129
0
01 Apr 2021
Evaluating Neural Word Embeddings for Sanskrit
Kevin Qinghong Lin
Om Adideva
Digumarthi Komal
Laxmidhar Behera
Pawan Goyal
29
12
0
01 Apr 2021
Compressing 1D Time-Channel Separable Convolutions using Sparse Random Ternary Matrices
Gonçalo Mordido
Matthijs Van Keirsbilck
A. Keller
20
6
0
31 Mar 2021
Integer-only Zero-shot Quantization for Efficient Speech Recognition
Sehoon Kim
A. Gholami
Z. Yao
Nicholas Lee
Patrick Wang
Aniruddha Nrusimha
Bohan Zhai
Tianren Gao
Michael W. Mahoney
Kurt Keutzer
MQ
10
23
0
31 Mar 2021
Large-Scale Pre-Training of End-to-End Multi-Talker ASR for Meeting Transcription with Single Distant Microphone
Naoyuki Kanda
Guoli Ye
Yu-Huan Wu
Yashesh Gaur
Xiaofei Wang
Zhong Meng
Zhuo Chen
Takuya Yoshioka
23
40
0
31 Mar 2021
CvT: Introducing Convolutions to Vision Transformers
Haiping Wu
Bin Xiao
Noel Codella
Mengchen Liu
Xiyang Dai
Lu Yuan
Lei Zhang
ViT
17
1,866
0
29 Mar 2021
Scaling sparsemax based channel selection for speech recognition with ad-hoc microphone arrays
Junqi Chen
Xiao-Lei Zhang
9
10
0
29 Mar 2021
A Practical Survey on Faster and Lighter Transformers
Quentin Fournier
G. Caron
Daniel Aloise
6
93
0
26 Mar 2021
Transformer-based ASR Incorporating Time-reduction Layer and Fine-tuning with Self-Knowledge Distillation
Md. Akmal Haidar
Chao Xing
Mehdi Rezagholizadeh
11
7
0
17 Mar 2021
Dynamic Acoustic Unit Augmentation With BPE-Dropout for Low-Resource End-to-End Speech Recognition
A. Laptev
A. Andrusenko
Ivan Podluzhny
Anton Mitrofanov
Ivan Medennikov
Yuri N. Matveev
VLM
16
14
0
12 Mar 2021
Learning Word-Level Confidence For Subword End-to-End ASR
David Qiu
Qiujia Li
Yanzhang He
Yu Zhang
Bo-wen Li
...
Deepti Bhatia
Wei Li
Ke Hu
Tara N. Sainath
Ian McGraw
16
32
0
11 Mar 2021
Alignment Knowledge Distillation for Online Streaming Attention-based Speech Recognition
H. Inaguma
Tatsuya Kawahara
14
13
0
28 Feb 2021
When Attention Meets Fast Recurrence: Training Language Models with Reduced Compute
Tao Lei
RALM
VLM
45
47
0
24 Feb 2021
Conditional Positional Encodings for Vision Transformers
Xiangxiang Chu
Zhi Tian
Bo-Wen Zhang
Xinlong Wang
Chunhua Shen
ViT
9
600
0
22 Feb 2021
The Accented English Speech Recognition Challenge 2020: Open Datasets, Tracks, Baselines, Results and Methods
Xian Shi
Fan Yu
Yizhou Lu
Yuhao Liang
Qiangze Feng
Daliang Wang
Y. Qian
Lei Xie
24
66
0
20 Feb 2021
Echo State Speech Recognition
H. Shrivastava
Ankush Garg
Yuan Cao
Yu Zhang
Tara N. Sainath
37
22
0
18 Feb 2021
End-to-end Audio-visual Speech Recognition with Conformers
Pingchuan Ma
Stavros Petridis
M. Pantic
79
224
0
12 Feb 2021
Intermediate Loss Regularization for CTC-based Speech Recognition
Jaesong Lee
Shinji Watanabe
111
135
0
05 Feb 2021
Mind the Gap: Assessing Temporal Generalization in Neural Language Models
Angeliki Lazaridou
A. Kuncoro
E. Gribovskaya
Devang Agrawal
Adam Liska
...
Sebastian Ruder
Dani Yogatama
Kris Cao
Susannah Young
Phil Blunsom
VLM
22
207
0
03 Feb 2021
WeNet: Production oriented Streaming and Non-streaming End-to-End Speech Recognition Toolkit
Zhuoyuan Yao
Di Wu
Xiong Wang
Binbin Zhang
Fan Yu
Chao Yang
Zhendong Peng
Xiaoyu Chen
Lei Xie
X. Lei
11
260
0
02 Feb 2021
The Hitachi-JHU DIHARD III System: Competitive End-to-End Neural Diarization and X-Vector Clustering Systems Combined by DOVER-Lap
Shota Horiguchi
Nelson Yalta
Leibny Paola García-Perera
Yuki Takashima
Yawen Xue
Desh Raj
Zili Huang
Yusuke Fujita
Shinji Watanabe
Sanjeev Khudanpur
BDL
16
36
0
02 Feb 2021
Leveraging End-to-End ASR for Endangered Language Documentation: An Empirical Study on Yoloxóchitl Mixtec
Jiatong Shi
Jiatong Shi. Jonathan D. Amith
Rey Castillo García
Esteban Guadalupe Sierra
Kevin Duh
Shinji Watanabe
17
46
0
26 Jan 2021
Tiny Transducer: A Highly-efficient Speech Recognition Model on Edge Devices
Yuekai Zhang
Sining Sun
Long Ma
19
28
0
18 Jan 2021
Fast offline Transformer-based end-to-end automatic speech recognition for real-world applications
Y. Oh
Kiyoung Park
Jeongue Park
OffRL
11
5
0
14 Jan 2021
A Four-Stage Data Augmentation Approach to ResNet-Conformer Based Acoustic Modeling for Sound Event Localization and Detection
Qing Wang
Jun Du
Hua-Xin Wu
Jia Pan
Feng Ma
Chin-Hui Lee
11
78
0
08 Jan 2021
The 2020 ESPnet update: new features, broadened applications, performance improvements, and future plans
Shinji Watanabe
Florian Boyer
Xuankai Chang
Pengcheng Guo
Tomoki Hayashi
...
Shigeki Karita
Chenda Li
Jing Shi
Aswin Shanmugam Subramanian
Wangyou Zhang
VLM
31
38
0
23 Dec 2020
A review of on-device fully neural end-to-end automatic speech recognition algorithms
Chanwoo Kim
Dhananjaya N. Gowda
Dongsoo Lee
Jiyeon Kim
Ankur Kumar
Sungsoo Kim
Abhinav Garg
C. Han
11
27
0
14 Dec 2020
VoxSRC 2020: The Second VoxCeleb Speaker Recognition Challenge
Arsha Nagrani
Joon Son Chung
Jaesung Huh
Andrew Brown
Ernesto Coto
Weidi Xie
Mitchell McLaren
D. Reynolds
Andrew Zisserman
13
74
0
12 Dec 2020
Less Is More: Improved RNN-T Decoding Using Limited Label Context and Path Merging
Rohit Prabhavalkar
Yanzhang He
David Rybach
S. Campbell
A. Narayanan
Trevor Strohman
Tara N. Sainath
41
35
0
12 Dec 2020
Unified Streaming and Non-streaming Two-pass End-to-end Model for Speech Recognition
Binbin Zhang
Di Wu
Zhuoyuan Yao
Xiong Wang
F. Yu
Chao Yang
Liyong Guo
Yaguang Hu
Lei Xie
X. Lei
19
77
0
10 Dec 2020
Previous
1
2
3
...
33
34
35
Next