Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2005.08100
Cited By
Conformer: Convolution-augmented Transformer for Speech Recognition
16 May 2020
Anmol Gulati
James Qin
Chung-Cheng Chiu
Niki Parmar
Yu Zhang
Jiahui Yu
Wei Han
Shibo Wang
Zhengdong Zhang
Yonghui Wu
Ruoming Pang
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Conformer: Convolution-augmented Transformer for Speech Recognition"
50 / 1,745 papers shown
Title
JTubeSpeech: corpus of Japanese speech collected from YouTube for speech recognition and speaker verification
Shinnosuke Takamichi
Ludwig Kurzinger
Takaaki Saeki
Sayaka Shiota
Shinji Watanabe
11
22
0
17 Dec 2021
Self-Supervised Learning for speech recognition with Intermediate layer supervision
Chengyi Wang
Yu-Huan Wu
Sanyuan Chen
Shujie Liu
Jinyu Li
Yao Qian
Zhenglu Yang
SSL
16
28
0
16 Dec 2021
Progressive Graph Convolution Network for EEG Emotion Recognition
Yijing Zhou
Fu Li
Yang Li
Youshuo Ji
Guangming Shi
Wenming Zheng
Lijian Zhang
Yuanfang Chen
Rui Cheng
22
37
0
14 Dec 2021
PM-MMUT: Boosted Phone-Mask Data Augmentation using Multi-Modeling Unit Training for Phonetic-Reduction-Robust E2E Speech Recognition
Guodong Ma
Pengfei Hu
Nurmemet Yolwas
Shen Huang
Hao-Ming Huang
19
4
0
13 Dec 2021
ASCEND: A Spontaneous Chinese-English Dataset for Code-switching in Multi-turn Conversation
Holy Lovenia
Samuel Cahyawijaya
Genta Indra Winata
Peng-Tao Xu
Xu Yan
...
Elham J. Barezi
Qifeng Chen
Xiaojuan Ma
Bertram E. Shi
Pascale Fung
28
32
0
12 Dec 2021
Perceptual Loss with Recognition Model for Single-Channel Enhancement and Robust ASR
Peter William VanHarn Plantinga
Deblin Bagchi
Eric Fosler-Lussier
46
10
0
11 Dec 2021
Are E2E ASR models ready for an industrial usage?
Valentin Vielzeuf
G. Antipov
18
8
0
09 Dec 2021
Audio-Visual Synchronisation in the wild
Honglie Chen
Weidi Xie
Triantafyllos Afouras
Arsha Nagrani
Andrea Vedaldi
Andrew Zisserman
18
37
0
08 Dec 2021
A Transferable Approach for Partitioning Machine Learning Models on Multi-Chip-Modules
Xinfeng Xie
Prakash Prabhu
Ulysse Beaugnon
P. Phothilimthana
Sudip Roy
Azalia Mirhoseini
E. Brevdo
James Laudon
Yanqi Zhou
25
5
0
07 Dec 2021
BBS-KWS:The Mandarin Keyword Spotting System Won the Video Keyword Wakeup Challenge
Yuting Yang
Binbin Du
Yingxin Zhang
Wenxuan Wang
Yuke Li
16
0
0
03 Dec 2021
Deliberation of Streaming RNN-Transducer by Non-autoregressive Decoding
Weiran Wang
Ke Hu
Tara N. Sainath
27
21
0
01 Dec 2021
Joint Modeling of Code-Switched and Monolingual ASR via Conditional Factorization
Brian Yan
Chunlei Zhang
Meng Yu
Shi-Xiong Zhang
Siddharth Dalmia
Dan Berrebbi
Chao Weng
Shinji Watanabe
Dong Yu
17
22
0
29 Nov 2021
ESPnet-SLU: Advancing Spoken Language Understanding through ESPnet
Siddhant Arora
Siddharth Dalmia
Pavel Denisov
Xuankai Chang
Yushi Ueda
...
Karthik Ganesan
Brian Yan
Ngoc Thang Vu
A. Black
Shinji Watanabe
VLM
23
74
0
29 Nov 2021
Global Interaction Modelling in Vision Transformer via Super Tokens
Ammarah Farooq
Muhammad Awais
S. Ahmed
J. Kittler
ViT
30
6
0
25 Nov 2021
SimpleTRON: Simple Transformer with O(N) Complexity
Uladzislau Yorsh
Alexander Kovalenko
Vojtvech Vanvcura
Daniel Vavsata
Pavel Kordík
Tomávs Mikolov
28
1
0
23 Nov 2021
Guided-TTS: A Diffusion Model for Text-to-Speech via Classifier Guidance
Heeseung Kim
Sungwon Kim
Sungroh Yoon
DiffM
BDL
19
107
0
23 Nov 2021
Semi-Supervised Vision Transformers
Zejia Weng
Xitong Yang
Ang Li
Zuxuan Wu
Yu-Gang Jiang
ViT
9
40
0
22 Nov 2021
Multi-Channel Multi-Speaker ASR Using 3D Spatial Feature
Yiwen Shao
Shi-Xiong Zhang
Dong Yu
18
15
0
22 Nov 2021
Towards Measuring Fairness in Speech Recognition: Casual Conversations Dataset Transcriptions
Chunxi Liu
M. Picheny
Leda Sari
Pooja Chitkara
Alex Xiao
Xiaohui Zhang
Mark Chou
Andres Alvarado
C. Hazirbas
Yatharth Saraf
23
41
0
18 Nov 2021
A Conformer-based ASR Frontend for Joint Acoustic Echo Cancellation, Speech Enhancement and Speech Separation
Tom O'Malley
A. Narayanan
Quan Wang
Alex Park
James Walker
N. Howard
22
27
0
18 Nov 2021
Joint Unsupervised and Supervised Training for Multilingual ASR
Junwen Bai
Bo-wen Li
Yu Zhang
Ankur Bapna
Nikhil Siddhartha
K. Sim
Tara N. Sainath
18
58
0
15 Nov 2021
Attention based end to end Speech Recognition for Voice Search in Hindi and English
Raviraj Joshi
Venkateshan Kannan
18
6
0
15 Nov 2021
Soft-Sensing ConFormer: A Curriculum Learning-based Convolutional Transformer
Jaswanth K. Yella
Chao Zhang
Sergei Petrov
Yu Huang
Xiaoye Qian
A. Minai
Sthitie Bom
25
7
0
12 Nov 2021
Transformer-based Image Compression
Ming-Tse Lu
Peiyao Guo
Huiqing Shi
Chuntong Cao
Zhan Ma
ViT
59
103
0
12 Nov 2021
Uformer: A Unet based dilated complex & real dual-path conformer network for simultaneous speech enhancement and dereverberation
Yihui Fu
Yun Liu
Jingdong Li
Dawei Luo
Shubo Lv
Yukai Jv
Lei Xie
19
48
0
11 Nov 2021
Oracle Teacher: Leveraging Target Information for Better Knowledge Distillation of CTC Models
J. Yoon
H. Kim
Hyeon Seung Lee
Sunghwan Ahn
N. Kim
28
1
0
05 Nov 2021
Conformer-based Hybrid ASR System for Switchboard Dataset
Mohammad Zeineldeen
Jingjing Xu
Christoph Luscher
Wilfried Michel
Alexander Gerstenberger
Ralf Schluter
Hermann Ney
22
24
0
05 Nov 2021
MT3: Multi-Task Multitrack Music Transcription
Josh Gardner
Ian Simon
Ethan Manilow
Curtis Hawthorne
Jesse Engel
29
94
0
04 Nov 2021
WaveFake: A Data Set to Facilitate Audio Deepfake Detection
Joel Frank
Lea Schonherr
DiffM
129
123
0
04 Nov 2021
Recent Advances in End-to-End Automatic Speech Recognition
Jinyu Li
VLM
24
362
0
02 Nov 2021
Cross-lingual Transfer for Speech Processing using Acoustic Language Similarity
Peter Wu
Jiatong Shi
Yifan Zhong
Shinji Watanabe
A. Black
16
8
0
02 Nov 2021
Sequence Transduction with Graph-based Supervision
Niko Moritz
Takaaki Hori
Shinji Watanabe
Jonathan Le Roux
16
6
0
01 Nov 2021
Exploring Non-Autoregressive End-To-End Neural Modeling For English Mispronunciation Detection And Diagnosis
Hsin-Wei Wang
Bi-Cheng Yan
Hsuan-Sheng Chiu
Yung-Chang Hsu
Berlin Chen
16
7
0
01 Nov 2021
SNRi Target Training for Joint Speech Enhancement and Recognition
Yuma Koizumi
Shigeki Karita
A. Narayanan
S. Panchapagesan
M. Bacchiani
25
14
0
01 Nov 2021
Cross-attention conformer for context modeling in speech enhancement for ASR
A. Narayanan
Chung-Cheng Chiu
Tom O'Malley
Quan Wang
Yanzhang He
24
14
0
30 Oct 2021
Visual Keyword Spotting with Attention
Prajwal K R
Liliane Momeni
Triantafyllos Afouras
Andrew Zisserman
11
13
0
29 Oct 2021
Combining Unsupervised and Text Augmented Semi-Supervised Learning for Low Resourced Autoregressive Speech Recognition
Chak-Fai Li
Francis Keith
William Hartmann
M. Snover
SSL
19
2
0
29 Oct 2021
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing
Sanyuan Chen
Chengyi Wang
Zhengyang Chen
Yu-Huan Wu
Shujie Liu
...
Yao Qian
Jian Wu
Micheal Zeng
Xiangzhan Yu
Furu Wei
SSL
80
1,700
0
26 Oct 2021
DelightfulTTS: The Microsoft Speech Synthesis System for Blizzard Challenge 2021
Yanqing Liu
Rui Shao
G. Wang
Kuan Chen
Bohan Li
P. Yuen
Jinzhu Li
Lei He
Sheng Zhao
32
55
0
25 Oct 2021
Synt++: Utilizing Imperfect Synthetic Data to Improve Speech Recognition
Ting-Yao Hu
Mohammadreza Armandpour
A. Shrivastava
Jen-Hao Rick Chang
H. Koppula
Oncel Tuzel
SyDa
52
42
0
21 Oct 2021
SLAM: A Unified Encoder for Speech and Language Modeling via Speech-Text Joint Pre-Training
Ankur Bapna
Yu-An Chung
Na Wu
Anmol Gulati
Ye Jia
J. Clark
Melvin Johnson
Jason Riesa
Alexis Conneau
Yu Zhang
VLM
51
94
0
20 Oct 2021
Personalized Speech Enhancement: New Models and Comprehensive Evaluation
Sefik Emre Eskimez
Takuya Yoshioka
Huaming Wang
Xiaofei Wang
Zhuo Chen
Xuedong Huang
22
62
0
18 Oct 2021
VRM-Phase I VKW system description of long-short video customizable keyword wakeup challenge
Yougen Yuan
Zhiqiang Lv
Shen Huang
Pengfei Hu
9
0
0
18 Oct 2021
A Unified Speaker Adaptation Approach for ASR
Yingzhu Zhao
Chongjia Ni
C. Leung
Shafiq R. Joty
Chng Eng Siong
B. Ma
CLL
92
9
0
16 Oct 2021
StreaMulT: Streaming Multimodal Transformer for Heterogeneous and Arbitrary Long Sequential Data
Victor Pellegrain
Myriam Tami
M. Batteux
C´eline Hudelot
AI4TS
20
2
0
15 Oct 2021
ESPnet2-TTS: Extending the Edge of TTS Research
Tomoki Hayashi
Ryuichi Yamamoto
Takenori Yoshimura
Peter Wu
Jiatong Shi
Takaaki Saeki
Yooncheol Ju
Yusuke Yasuda
Shinnosuke Takamichi
Shinji Watanabe
VLM
47
60
0
15 Oct 2021
Attention-Free Keyword Spotting
Mashrur M. Morshed
Ahmad Omar Ahsan
30
9
0
14 Oct 2021
Sub-word Level Lip Reading With Visual Attention
Prajwal K R
Triantafyllos Afouras
Andrew Zisserman
12
92
0
14 Oct 2021
M2MeT: The ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Challenge
Fan Yu
Shiliang Zhang
Yihui Fu
Lei Xie
Siqi Zheng
...
Pengcheng Guo
Zhijie Yan
B. Ma
Xin Xu
Hui Bu
8
104
0
14 Oct 2021
Conformer-Based Self-Supervised Learning for Non-Speech Audio Tasks
Sangeeta Srivastava
Yun Wang
Andros Tjandra
Anurag Kumar
Chunxi Liu
Kritika Singh
Yatharth Saraf
SSL
30
24
0
14 Oct 2021
Previous
1
2
3
...
29
30
31
...
33
34
35
Next