2005.08100
Conformer: Convolution-augmented Transformer for Speech Recognition
16 May 2020
Anmol Gulati
James Qin
Chung-Cheng Chiu
Niki Parmar
Yu Zhang
Jiahui Yu
Wei Han
Shibo Wang
Zhengdong Zhang
Yonghui Wu
Ruoming Pang
Papers citing
"Conformer: Convolution-augmented Transformer for Speech Recognition"
50 / 1,744 papers shown
Title
Sortformer: Seamless Integration of Speaker Diarization and ASR by Bridging Timestamps and Tokens
Taejin Park
Ivan Medennikov
Kunal Dhawan
Weiqing Wang
He Huang
Nithin Rao Koluguri
Krishna C. Puvvada
Jagadeesh Balam
Boris Ginsburg
35
2
0
10 Sep 2024
An Effective Context-Balanced Adaptation Approach for Long-Tailed Speech Recognition
Yi-Cheng Wang
Li-Ting Pai
Bi-Cheng Yan
Hsin-Wei Wang
Chi-Han Lin
Berlin Chen
15
1
0
10 Sep 2024
SpeechTaxi: On Multilingual Semantic Speech Classification
Lennart Keller
Goran Glavaš
26
0
0
10 Sep 2024
RobustSVC: HuBERT-based Melody Extractor and Adversarial Learning for Robust Singing Voice Conversion
Wei Chen
Xintao Zhao
Jun Chen
Binzhu Sha
Zhiwei Lin
Zhiyong Wu
37
0
0
10 Sep 2024
Advancing Topic Segmentation of Broadcasted Speech with Multilingual Semantic Embeddings
Sakshi Deo Shukla
Pavel Denisov
Tuğtekin Turan
18
0
0
10 Sep 2024
PDAF: A Phonetic Debiasing Attention Framework For Speaker Verification
Massa Baali
Abdulhamid Aldoobi
Hira Dhamyal
Rita Singh
Bhiksha Raj
26
0
0
09 Sep 2024
Longer is (Not Necessarily) Stronger: Punctuated Long-Sequence Training for Enhanced Speech Recognition and Translation
Nithin Rao Koluguri
Travis M. Bartley
Hainan Xu
Oleksii Hrinchuk
Jagadeesh Balam
Boris Ginsburg
Georg Kucsko
32
2
0
09 Sep 2024
Findings of the 2024 Mandarin Stuttering Event Detection and Automatic Speech Recognition Challenge
Hongfei Xue
Rong Gong
Mingchen Shao
Xin Xu
L. xilinx Wang
...
Yong Qin
Jun Du
Ming Li
Binbin Zhang
Bin Jia
21
1
0
09 Sep 2024
Flow-TSVAD: Target-Speaker Voice Activity Detection via Latent Flow Matching
Zhengyang Chen
Bing Han
Shuai Wang
Yidi Jiang
Yanmin Qian
43
0
0
07 Sep 2024
Mel-RoFormer for Vocal Separation and Vocal Melody Transcription
Ju-Chiang Wang
Wei-Tsung Lu
Jitong Chen
16
1
0
07 Sep 2024
SLiCK: Exploiting Subsequences for Length-Constrained Keyword Spotting
K. Nishu
Minsik Cho
Devang Naik
11
0
0
06 Sep 2024
Lightweight Transducer Based on Frame-Level Criterion
Genshun Wan
Mengzhi Wang
Tingzhi Mao
Hang Chen
Z. Ye
36
1
0
05 Sep 2024
USEF-TSE: Universal Speaker Embedding Free Target Speaker Extraction
Bang Zeng
Ming Li
29
2
0
04 Sep 2024
An Analysis of Linear Complexity Attention Substitutes with BEST-RQ
Ryan Whetten
Titouan Parcollet
Adel Moumen
Marco Dinarelli
Yannick Esteve
22
0
0
04 Sep 2024
Enhancing Code-Switching Speech Recognition with LID-Based Collaborative Mixture of Experts Model
Hukai Huang
Jiayan Lin
K. Wang
Yishuang Li
Wenhao Guan
Lin Li
Q. Hong
MoE
29
0
0
03 Sep 2024
The USTC-NERCSLIP Systems for the CHiME-8 NOTSOFAR-1 Challenge
Shutong Niu
Ruoyu Wang
Jun Du
Gaobin Yang
Yanhui Tu
...
Tian Gao
Genshun Wan
Feng Ma
Jia Pan
Jianqing Gao
34
4
0
03 Sep 2024
Pureformer-VC: Non-parallel One-Shot Voice Conversion with Pure Transformer Blocks and Triplet Discriminative Training
Wenhan Yao
Zedong Xing
Xiarun Chen
Jia Liu
Yongqiang He
Weiping Wen
21
0
0
03 Sep 2024
Resource-Efficient Adaptation of Speech Foundation Models for Multi-Speaker ASR
Weiqing Wang
Kunal Dhawan
Taejin Park
Krishna C. Puvvada
Ivan Medennikov
Somshubra Majumdar
He Huang
Jagadeesh Balam
Boris Ginsburg
28
1
0
02 Sep 2024
LibriheavyMix: A 20,000-Hour Dataset for Single-Channel Reverberant Multi-Talker Speech Separation, ASR and Speaker Diarization
Zengrui Jin
Yifan Yang
Mohan Shi
Wei Kang
Xiaoyu Yang
...
Lingwei Meng
Long Lin
Yong Xu
Shi-Xiong Zhang
Daniel Povey
28
1
0
01 Sep 2024
Serialized Speech Information Guidance with Overlapped Encoding Separation for Multi-Speaker Automatic Speech Recognition
Hao Shi
Yuan Gao
Zhaoheng Ni
Tatsuya Kawahara
30
1
0
01 Sep 2024
Comparing Discrete and Continuous Space LLMs for Speech Recognition
Yaoxun Xu
Shi-Xiong Zhang
Jianwei Yu
Zhiyong Wu
Dong Yu
AuLLM
17
3
0
01 Sep 2024
DCIM-AVSR: Efficient Audio-Visual Speech Recognition via Dual Conformer Interaction Module
Xinyu Wang
Qian Wang
Haolin Huang
Yu Fang
Mengjie Xu
Qian Wang
26
0
0
31 Aug 2024
ProGRes: Prompted Generative Rescoring on ASR n-Best
Ada Defne Tur
Adel Moumen
Mirco Ravanelli
26
1
0
30 Aug 2024
Advancing Multi-talker ASR Performance with Large Language Models
Mohan Shi
Zengrui Jin
Yaoxun Xu
Yong Xu
Shi-Xiong Zhang
Kun Wei
Yiwen Shao
Chunlei Zhang
Dong Yu
29
0
0
30 Aug 2024
Speaker Tagging Correction With Non-Autoregressive Language Models
Grigor Kirakosyan
Davit Karamyan
3DV
26
0
0
30 Aug 2024
Revisit Micro-batch Clipping: Adaptive Data Pruning via Gradient Manipulation
Lun Wang
24
0
0
29 Aug 2024
BELT-2: Bootstrapping EEG-to-Language representation alignment for multi-task brain decoding
Jinzhao Zhou
Yiqun Duan
Fred Chang
T. Do
Yu-Kai Wang
Chin-Teng Lin
22
2
0
28 Aug 2024
Squid: Long Context as a New Modality for Energy-Efficient On-Device Language Models
Wei Chen
Zhiyuan Li
Shuo Xin
Yihao Wang
29
4
0
28 Aug 2024
YOLO-Stutter: End-to-end Region-Wise Speech Dysfluency Detection
Xuanru Zhou
Anshul Kashyap
Steve Li
Ayati Sharma
Brittany Morin
...
Z. Ezzes
Zachary Miller
M. G. Tempini
Jiachen Lian
Gopala Krishna Anumanchipalli
16
6
0
27 Aug 2024
The VoxCeleb Speaker Recognition Challenge: A Retrospective
Jaesung Huh
Joon Son Chung
Arsha Nagrani
A. Brown
Jee-weon Jung
Daniel Garcia-Romero
Andrew Zisserman
36
3
0
27 Aug 2024
SONICS: Synthetic Or Not -- Identifying Counterfeit Songs
Md Awsafur Rahman
Zaber Ibn Abdul Hakim
Najibul Haque Sarker
Bishmoy Paul
S. Fattah
38
7
0
26 Aug 2024
NEST: Self-supervised Fast Conformer as All-purpose Seasoning to Speech Processing Tasks
He Huang
Taejin Park
Kunal Dhawan
Ivan Medennikov
Krishna C. Puvvada
Nithin Rao Koluguri
Weiqing Wang
Jagadeesh Balam
Boris Ginsburg
SSL
AI4TS
28
1
0
23 Aug 2024
Energy Estimation of Last Mile Electric Vehicle Routes
André Snoeck
Aniruddha Bhargava
Daniel Merchan
Josiah Davis
Julian Pachon
26
0
0
21 Aug 2024
XCB: an effective contextual biasing approach to bias cross-lingual phrases in speech recognition
Xucheng Wan
Naijun Zheng
Kai Liu
Huan Zhou
22
0
0
20 Aug 2024
Federated Learning of Large ASR Models in the Real World
Yonghui Xiao
Yuxin Ding
Changwan Ryu
P. Zadrazil
Francoise Beaufays
AI4CE
31
0
0
19 Aug 2024
Parameter-Efficient Transfer Learning under Federated Learning for Automatic Speech Recognition
Xuan Kan
Yonghui Xiao
Tien-Ju Yang
Nanxin Chen
Rajiv Mathews
FedML
21
0
0
19 Aug 2024
A Transcription Prompt-based Efficient Audio Large Language Model for Robust Speech Recognition
Yangze Li
Xiong Wang
Songjun Cao
Yike Zhang
Long Ma
Lei Xie
AuLLM
56
0
0
18 Aug 2024
MAT-SED: A Masked Audio Transformer with Masked-Reconstruction Based Pre-training for Sound Event Detection
Pengfei Cai
Yan Song
Kang Li
Haoyu Song
Ian McLoughlin
28
5
0
16 Aug 2024
Cross-Modal Denoising: A Novel Training Paradigm for Enhancing Speech-Image Retrieval
Lifeng Zhou
Yuke Li
Rui Deng
Yuting Yang
Haoqi Zhu
21
0
0
15 Aug 2024
Heterogeneous Space Fusion and Dual-Dimension Attention: A New Paradigm for Speech Enhancement
Tao Zheng
Liejun Wang
Yinfeng Yu
26
1
0
13 Aug 2024
FLEURS-R: A Restored Multilingual Speech Corpus for Generation Tasks
Min Ma
Yuma Koizumi
Shigeki Karita
Heiga Zen
Jason Riesa
Haruko Ishikawa
M. Bacchiani
VLM
27
4
0
12 Aug 2024
Sampling Foundational Transformer: A Theoretical Perspective
Viet Anh Nguyen
Minh Lenhat
Khoa Nguyen
Duong Duc Hieu
Dao Huu Hung
Truong Son-Hy
42
0
0
11 Aug 2024
LI-TTA: Language Informed Test-Time Adaptation for Automatic Speech Recognition
Eunseop Yoon
Hee Suk Yoon
John Harvill
M. Hasegawa-Johnson
Chang D. Yoo
TTA
VLM
23
0
0
11 Aug 2024
SAMSA: Efficient Transformer for Many Data Modalities
Minh Lenhat
Viet Anh Nguyen
Khoa Nguyen
Duong Duc Hieu
Dao Huu Hung
Truong Son-Hy
46
0
0
10 Aug 2024
SELD-Mamba: Selective State-Space Model for Sound Event Localization and Detection with Source Distance Estimation
Da Mu
Zhicheng Zhang
Haobo Yue
Zehao Wang
Jin Tang
Jianqin Yin
Mamba
38
1
0
09 Aug 2024
Exploiting Consistency-Preserving Loss and Perceptual Contrast Stretching to Boost SSL-based Speech Enhancement
Muhammad Salman Khan
Moreno La Quatra
Kuo-Hsuan Hung
Szu-Wei Fu
Sabato Marco Siniscalchi
Yu Tsao
23
2
0
08 Aug 2024
Survey: Transformer-based Models in Data Modality Conversion
Elyas Rashno
Amir Eskandari
Aman Anand
F. Zulkernine
MedIm
33
0
0
08 Aug 2024
MulliVC: Multi-lingual Voice Conversion With Cycle Consistency
Jiawei Huang
Chen Zhang
Yi Ren
Ziyue Jiang
Zhenhui Ye
Jinglin Liu
Jinzheng He
Xiang Yin
Zhou Zhao
35
2
0
08 Aug 2024
HydraFormer: One Encoder For All Subsampling Rates
Yaoxun Xu
Xingchen Song
Zhiyong Wu
Di Wu
Zhendong Peng
Binbin Zhang
23
0
0
08 Aug 2024
Towards Linguistic Neural Representation Learning and Sentence Retrieval from Electroencephalogram Recordings
Jinzhao Zhou
Yiqun Duan
Ziyi Zhao
Yu-Cheng Chang
Yu-Kai Wang
T. Do
Chin-Teng Lin
34
1
0
08 Aug 2024