ResearchTrend.AI
Conformer: Convolution-augmented Transformer for Speech Recognition

16 May 2020
Anmol Gulati
James Qin
Chung-Cheng Chiu
Niki Parmar
Yu Zhang
Jiahui Yu
Wei Han
Shibo Wang
Zhengdong Zhang
Yonghui Wu
Ruoming Pang

Papers citing "Conformer: Convolution-augmented Transformer for Speech Recognition"

50 / 1,744 papers shown
Title
Study of positional encoding approaches for Audio Spectrogram Transformers
L. Pepino
Pablo Riera
Luciana Ferrer
ViT
12
6
0
13 Oct 2021
On Language Model Integration for RNN Transducer based Speech Recognition
Wei Zhou
Zuoyun Zheng
Ralf Schlüter
Hermann Ney
24
22
0
13 Oct 2021
A Melody-Unsupervision Model for Singing Voice Synthesis
Soonbeom Choi
Juhan Nam
21
13
0
13 Oct 2021
All-neural beamformer for continuous speech separation
Zhuohuang Zhang
Takuya Yoshioka
Naoyuki Kanda
Zhuo Chen
Xiaofei Wang
Dongmei Wang
Sefik Emre Eskimez
25
15
0
13 Oct 2021
Speech Summarization using Restricted Self-Attention
Roshan S. Sharma
Shruti Palaskar
A. Black
Florian Metze
17
33
0
12 Oct 2021
Multi-Modal Pre-Training for Automated Speech Recognition
David M. Chan
Shalini Ghosh
D. Chakrabarty
Björn Hoffmeister
SSL
22
16
0
12 Oct 2021
VarArray: Array-Geometry-Agnostic Continuous Speech Separation
Takuya Yoshioka
Xiaofei Wang
Dongmei Wang
M. Tang
Zirun Zhu
Zhuo Chen
Naoyuki Kanda
13
37
0
12 Oct 2021
LightSeq2: Accelerated Training for Transformer-based Models on GPUs
Xiaohui Wang
Yang Wei
Ying Xiong
Guyue Huang
Xian Qian
Yufei Ding
Mingxuan Wang
Lei Li
VLM
6
29
0
12 Oct 2021
Partial Variable Training for Efficient On-Device Federated Learning
Tien-Ju Yang
Dhruv Guliani
F. Beaufays
Giovanni Motta
FedML
11
25
0
11 Oct 2021
SRU++: Pioneering Fast Recurrence with Attention for Speech Recognition
Jing Pan
Tao Lei
Kwangyoun Kim
Kyu Jeong Han
Shinji Watanabe
VLM
26
9
0
11 Oct 2021
Interactive Feature Fusion for End-to-End Noise-Robust Speech Recognition
Yuchen Hu
Nana Hou
Chen Chen
Chng Eng Siong
11
39
0
11 Oct 2021
A Comparative Study on Non-Autoregressive Modelings for Speech-to-Text Generation
Yosuke Higuchi
Nanxin Chen
Yuya Fujita
H. Inaguma
Tatsuya Komatsu
Jaesong Lee
Jumon Nozaki
Tianzi Wang
Shinji Watanabe
22
41
0
11 Oct 2021
Advancing Momentum Pseudo-Labeling with Conformer and Initialization Strategy
Yosuke Higuchi
Niko Moritz
Jonathan Le Roux
Takaaki Hori
11
11
0
11 Oct 2021
Have best of both worlds: two-pass hybrid and E2E cascading framework for speech recognition
Guoli Ye
V. Mazalov
Jinyu Li
Y. Gong
12
9
0
10 Oct 2021
Universal Paralinguistic Speech Representations Using Self-Supervised Conformers
Joel Shor
A. Jansen
Wei Han
Daniel S. Park
Yu Zhang
SSL
AI4TS
33
54
0
09 Oct 2021
An Exploration of Self-Supervised Pretrained Representations for End-to-End Speech Recognition
Xuankai Chang
Takashi Maekaku
Pengcheng Guo
Jing Shi
Yen-Ju Lu
...
Tianzi Wang
Shu-Wen Yang
Yu Tsao
Hung-yi Lee
Shinji Watanabe
SSL
AI4TS
16
81
0
09 Oct 2021
Data Augmentation with Locally-time Reversed Speech for Automatic Speech Recognition
Si-Ioi Ng
Tan Lee
17
2
0
09 Oct 2021
TitaNet: Neural Model for speaker representation with 1D Depth-wise separable convolutions and global context
Nithin Rao Koluguri
Taejin Park
Boris Ginsburg
ViT
20
92
0
08 Oct 2021
Hybrid Random Features
K. Choromanski
Haoxian Chen
Han Lin
Yuanzhe Ma
Arijit Sehanobish
...
Andy Zeng
Valerii Likhosherstov
Dmitry Kalashnikov
Vikas Sindhwani
Adrian Weller
12
21
0
08 Oct 2021
Exploring Heterogeneous Characteristics of Layers in ASR Models for More Efficient Training
Lillian Zhou
Dhruv Guliani
Andreas Kabel
Giovanni Motta
F. Beaufays
10
1
0
08 Oct 2021
Hierarchical Conditional End-to-End ASR with CTC and Multi-Granular Subword Units
Yosuke Higuchi
Keita Karube
Tetsuji Ogawa
Tetsunori Kobayashi
13
22
0
08 Oct 2021
Improving Pseudo-label Training For End-to-end Speech Recognition Using Gradient Mask
Shaoshi Ling
Chen Shen
Meng Cai
Zejun Ma
VLM
SSL
22
8
0
08 Oct 2021
Input Length Matters: Improving RNN-T and MWER Training for Long-form Telephony Speech Recognition
Zhiyun Lu
Yanwei Pan
Thibault Doutre
Parisa Haghani
Liangliang Cao
Rohit Prabhavalkar
C. Zhang
Trevor Strohman
AuLLM
72
14
0
08 Oct 2021
Streaming Transformer Transducer Based Speech Recognition Using Non-Causal Convolution
Yangyang Shi
Chunyang Wu
Dilin Wang
Alex Xiao
Jay Mahadeokar
...
Ke Li
Yuan Shangguan
Varun K. Nagaraja
Ozlem Kalinli
M. Seltzer
20
15
0
07 Oct 2021
Predictive Maintenance for General Aviation Using Convolutional Transformers
Hong Yang
Aidan P. LaBella
Travis J. Desell
AI4TS
23
5
0
07 Oct 2021
Enabling On-Device Training of Speech Recognition Models with Federated Dropout
Dhruv Guliani
Lillian Zhou
Changwan Ryu
Tien-Ju Yang
Harry Zhang
Yong Xiao
F. Beaufays
Giovanni Motta
FedML
17
16
0
07 Oct 2021
WenetSpeech: A 10000+ Hours Multi-domain Mandarin Corpus for Speech Recognition
Binbin Zhang
Hang Lv
Pengcheng Guo
Qijie Shao
Chao Yang
...
Hui Bu
Xiaoyu Chen
Chenchen Zeng
Di Wu
Zhendong Peng
17
217
0
07 Oct 2021
Cloning one's voice using very limited data in the wild
Dongyang Dai
Yuan-Jui Chen
Li Chen
Ming Tu
Lu Liu
Rui Xia
Qiao Tian
Yuping Wang
Yuxuan Wang
SyDa
17
9
0
07 Oct 2021
Back from the future: bidirectional CTC decoding using future information in speech recognition
Namkyu Jung
Geon-min Kim
Han-Gyu Kim
23
3
0
07 Oct 2021
Transcribe-to-Diarize: Neural Speaker Diarization for Unlimited Number of Speakers using End-to-End Speaker-Attributed ASR
Naoyuki Kanda
Xiong Xiao
Yashesh Gaur
Xiaofei Wang
Zhong Meng
Zhuo Chen
Takuya Yoshioka
19
34
0
07 Oct 2021
CTC Variations Through New WFST Topologies
A. Laptev
Somshubra Majumdar
Boris Ginsburg
27
20
0
06 Oct 2021
Spell my name: keyword boosted speech recognition
Namkyu Jung
Geon-min Kim
Joon Son Chung
38
13
0
06 Oct 2021
Language Modeling using LMUs: 10x Better Data Efficiency or Improved Scaling Compared to Transformers
Narsimha Chilkuri
Eric Hunsberger
Aaron R. Voelker
G. Malik
C. Eliasmith
30
7
0
05 Oct 2021
S2 Reducer: High-Performance Sparse Communication to Accelerate Distributed Deep Learning
Ke-shi Ge
Yongquan Fu
Zhiquan Lai
Xiaoge Deng
Dongsheng Li
15
2
0
05 Oct 2021
Sound Event Detection Transformer: An Event-based End-to-End Model for Sound Event Detection
Zhi-qin Ye
Xiangdong Wang
Hong Liu
Yueliang Qian
Ruijie Tao
Long Yan
Kazushige Ouchi
ViT
27
15
0
05 Oct 2021
ASR Rescoring and Confidence Estimation with ELECTRA
Hayato Futami
H. Inaguma
Masato Mimura
S. Sakai
Tatsuya Kawahara
KELM
54
20
0
05 Oct 2021
Fast Contextual Adaptation with Neural Associative Memory for On-Device Personalized Speech Recognition
Tsendsuren Munkhdalai
K. Sim
Angad Chandorkar
Fan Gao
Mason Chua
Trevor Strohman
F. Beaufays
29
34
0
05 Oct 2021
Towards efficient end-to-end speech recognition with biologically-inspired neural networks
Thomas Bohnstingl
Ayush Garg
Stanisław Woźniak
G. Saon
E. Eleftheriou
A. Pantazi
16
5
0
04 Oct 2021
Speech Technology for Everyone: Automatic Speech Recognition for Non-Native English with Transfer Learning
Toshiko Shibano
Xinyi Zhang
Miao Li
Haejin Cho
Peter Sullivan
Muhammad Abdul-Mageed
VLM
36
17
0
01 Oct 2021
Large-scale ASR Domain Adaptation using Self- and Semi-supervised Learning
DongSeon Hwang
Ananya Misra
Zhouyuan Huo
Nikhil Siddhartha
Shefali Garg
David Qiu
K. Sim
Trevor Strohman
F. Beaufays
Yanzhang He
55
34
0
01 Oct 2021
Incremental Layer-wise Self-Supervised Learning for Efficient Speech Domain Adaptation On Device
Zhouyuan Huo
Dong-Gyo Hwang
K. Sim
Shefali Garg
Ananya Misra
Nikhil Siddhartha
Trevor Strohman
Françoise Beaufays
46
7
0
01 Oct 2021
SpliceOut: A Simple and Efficient Audio Augmentation Method
Arjit Jain
Pranay Reddy Samala
Deepak Mittal
P. Jyothi
M. Singh
20
10
0
30 Sep 2021
Multi Scale Graph Wavenet for Wind Speed Forecasting
Neetesh Rathore
Pradeep Rathore
Arghya Basak
S. Nistala
Venkataramana Runkana
AI4TS
69
18
0
30 Sep 2021
FastCorrect 2: Fast Error Correction on Multiple Candidates for Automatic Speech Recognition
Yichong Leng
Xu Tan
Rui Wang
Linchen Zhu
Jin Xu
...
Linquan Liu
Tao Qin
Xiang-Yang Li
Ed Lin
Tie-Yan Liu
27
40
0
29 Sep 2021
BigSSL: Exploring the Frontier of Large-Scale Semi-Supervised Learning for Automatic Speech Recognition
Yu Zhang
Daniel S. Park
Wei Han
James Qin
Anmol Gulati
...
Zhifeng Chen
Quoc V. Le
Chung-Cheng Chiu
Ruoming Pang
Yonghui Wu
SSL
19
175
0
27 Sep 2021
Fast-MD: Fast Multi-Decoder End-to-End Speech Translation with Non-Autoregressive Hidden Intermediates
H. Inaguma
Siddharth Dalmia
Brian Yan
Shinji Watanabe
57
11
0
27 Sep 2021
ChannelAugment: Improving generalization of multi-channel ASR by training with input channel randomization
M. Gaudesi
F. Weninger
D. Sharma
P. Zhan
AAML
19
1
0
23 Sep 2021
Audiomer: A Convolutional Transformer For Keyword Spotting
Surya Kant Sahu
Sai Mitheran
Juhi Kamdar
Meet Gandhi
32
8
0
21 Sep 2021
Audio-Visual Speech Recognition is Worth 32×32×8 Voxels
Dmitriy Serdyuk
Otavio Braga
Olivier Siohan
ViT
21
7
0
20 Sep 2021
Influence of ASR and Language Model on Alzheimer's Disease Detection
Joan Codina-Filbà
Guillermo Cámbara
Jordi Luque
Mireia Farrús
11
2
0
20 Sep 2021