Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2005.08100
Cited By
Conformer: Convolution-augmented Transformer for Speech Recognition
16 May 2020
Anmol Gulati
James Qin
Chung-Cheng Chiu
Niki Parmar
Yu Zhang
Jiahui Yu
Wei Han
Shibo Wang
Zhengdong Zhang
Yonghui Wu
Ruoming Pang
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Conformer: Convolution-augmented Transformer for Speech Recognition"
50 / 1,744 papers shown
Title
Remote Rowhammer Attack using Adversarial Observations on Federated Learning Clients
Jinsheng Yuan
Yuhang Hao
Weisi Guo
Yun Wu
Chongyan Gu
AAML
FedML
16
0
0
09 May 2025
ReverbMiipher: Generative Speech Restoration meets Reverberation Characteristics Controllability
Wataru Nakata
Yuma Koizumi
Shigeki Karita
Robin Scheibler
Haruko Ishikawa
Adriana Guevara-Rukoz
Heiga Zen
M. Bacchiani
41
0
0
08 May 2025
SwinLip: An Efficient Visual Speech Encoder for Lip Reading Using Swin Transformer
Young-Hu Park
R.-H. Park
Hyung-Min Park
49
0
0
07 May 2025
CoGenAV: Versatile Audio-Visual Representation Learning via Contrastive-Generative Synchronization
Detao Bai
Zhiheng Ma
Xihan Wei
Liefeng Bo
78
0
0
06 May 2025
Voice Cloning: Comprehensive Survey
Hussam Azzuni
Abdulmotaleb El Saddik
VLM
32
0
0
01 May 2025
Pretraining Large Brain Language Model for Active BCI: Silent Speech
Jinzhao Zhou
Zehong Cao
Yiqun Duan
Connor Barkley
Daniel Leong
...
Ziyi Zhao
T. Do
Yu-Cheng Chang
Sheng-Fu Liang
Chin-Teng Lin
32
0
0
29 Apr 2025
AlignDiT: Multimodal Aligned Diffusion Transformer for Synchronized Speech Generation
J. Choi
Ji-Hoon Kim
Kim Sung-Bin
Tae-Hyun Oh
Joon Son Chung
DiffM
48
0
0
29 Apr 2025
A Comparative Study on Positional Encoding for Time-frequency Domain Dual-path Transformer-based Source Separation Models
Kohei Saijo
Tetsuji Ogawa
47
1
0
28 Apr 2025
Spatial Speech Translation: Translating Across Space With Binaural Hearables
Tuochao Chen
Qirui Wang
Runlin He
Shyam Gollakota
29
0
0
25 Apr 2025
Temporal Attention Pooling for Frequency Dynamic Convolution in Sound Event Detection
Hyeonuk Nam
Yong-Hwa Park
31
0
0
17 Apr 2025
Advancing Arabic Speech Recognition Through Large-Scale Weakly Supervised Learning
Mahmoud Salhab
Marwan Elghitany
Shameed Sait
Syed Sibghat Ullah
Mohammad Abusheikh
Hasan Abusheikh
44
0
0
16 Apr 2025
DiTSE: High-Fidelity Generative Speech Enhancement via Latent Diffusion Transformers
Heitor R. Guimarães
Jiaqi Su
Rithesh Kumar
Tiago H. Falk
Zeyu Jin
DiffM
30
2
0
13 Apr 2025
Local Temporal Feature Enhanced Transformer with ROI-rank Based Masking for Diagnosis of ADHD
Byunggun Kim
Younghun Kwon
MedIm
12
0
0
12 Apr 2025
Reverberation-based Features for Sound Event Localization and Detection with Distance Estimation
Davide Berghi
Philip J. B. Jackson
29
0
0
11 Apr 2025
From Speech to Summary: A Comprehensive Survey of Speech Summarization
Fabian Retkowski
Maike Züfle
Andreas Sudmann
Dinah Pfau
Jan Niehues
Alexander Waibel
39
0
0
10 Apr 2025
LauraTSE: Target Speaker Extraction using Auto-Regressive Decoder-Only Language Models
Beilong Tang
Bang Zeng
Ming Li
AI4TS
34
0
0
10 Apr 2025
Visual-Aware Speech Recognition for Noisy Scenarios
Lakshmipathi Balaji
Karan Singla
26
0
0
09 Apr 2025
RNN-Transducer-based Losses for Speech Recognition on Noisy Targets
Vladimir Bataev
30
0
0
09 Apr 2025
AVENet: Disentangling Features by Approximating Average Features for Voice Conversion
Wenyu Wang
Yiquan Zhou
Jihua Zhu
Hongwu Ding
Jiacheng Xu
Shihao Li
DRL
30
0
0
08 Apr 2025
Selective Masking Adversarial Attack on Automatic Speech Recognition Systems
Zheng Fang
Shenyi Zhang
Tao Wang
Bowen Li
Lingchen Zhao
Zhangyi Wang
AAML
18
0
0
06 Apr 2025
A Survey on Music Generation from Single-Modal, Cross-Modal, and Multi-Modal Perspectives
Shuyu Li
Shulei Ji
Zihao W. Wang
Songruoyao Wu
Jiaxing Yu
K. Zhang
MGen
VGen
65
1
0
01 Apr 2025
Multi-Token Attention
O. Yu. Golovneva
Tianlu Wang
Jason Weston
Sainbayar Sukhbaatar
48
1
0
01 Apr 2025
Whispering Under the Eaves: Protecting User Privacy Against Commercial and LLM-powered Automatic Speech Recognition Systems
Weifei Jin
Yuxin Cao
Junjie Su
Derui Wang
Yedi Zhang
Minhui Xue
Jie Hao
Jin Song Dong
Yixian Yang
AAML
50
0
0
01 Apr 2025
Whisper-LM: Improving ASR Models with Language Models for Low-Resource Languages
Xabier de Zuazo
Eva Navas
Ibon Saratxaga
Inma Hernáez Rioja
37
0
0
30 Mar 2025
Dolphin: A Large-Scale Automatic Speech Recognition Model for Eastern Languages
Yangyang Meng
Jinpeng Li
Guodong Lin
Yu Pu
G. Wang
Hu Du
Zhiming Shao
Yukai Huang
Ke Li
Wei-Qiang Zhang
ObjD
93
0
0
26 Mar 2025
Analyzable Chain-of-Musical-Thought Prompting for High-Fidelity Music Generation
Max W. Y. Lam
Yijin Xing
Weiya You
Jingcheng Wu
Zongyu Yin
...
T. Zhao
Chien-Hung Liu
Xuchen Song
Yang Li
Yahui Zhou
LRM
56
2
0
25 Mar 2025
Elevating Robust Multi-Talker ASR by Decoupling Speaker Separation and Speech Recognition
Yufeng Yang
H. Taherian
Vahid Ahmadi Kalkhorani
DeLiang Wang
39
0
0
23 Mar 2025
From Faces to Voices: Learning Hierarchical Representations for High-quality Video-to-Speech
Ji-Hoon Kim
Jeongsoo Choi
Jaehun Kim
Chaeyoung Jung
Joon Son Chung
CVBM
48
1
0
21 Mar 2025
SeniorTalk: A Chinese Conversation Dataset with Rich Annotations for Super-Aged Seniors
Yang Chen
Hui Wang
Shiyao Wang
J. Chen
Jiabei He
Jiaming Zhou
Xi Yang
Y. Wang
Yonghua Lin
Yong Qin
36
0
0
20 Mar 2025
Evaluating ASR Confidence Scores for Automated Error Detection in User-Assisted Correction Interfaces
Korbinian Kuhn
Verena Kersken
Gottfried Zimmermann
48
0
0
19 Mar 2025
Communication Access Real-Time Translation Through Collaborative Correction of Automatic Speech Recognition
Korbinian Kuhn
Verena Kersken
Gottfried Zimmermann
35
0
0
19 Mar 2025
Shushing! Let's Imagine an Authentic Speech from the Silent Video
Jiaxin Ye
Hongming Shan
DiffM
VGen
63
1
0
19 Mar 2025
MMS-LLaMA: Efficient LLM-based Audio-Visual Speech Recognition with Minimal Multimodal Speech Tokens
Jeong Hun Yeo
Hyeongseop Rha
Se Jin Park
Y. Ro
51
0
0
14 Mar 2025
M2R-Whisper: Multi-stage and Multi-scale Retrieval Augmentation for Enhancing Whisper
Jiaming Zhou
S. Zhao
Jiabei He
Hui Wang
Wenjia Zeng
Yong Chen
Haoqin Sun
Aobo Kong
Yong Qin
55
1
0
13 Mar 2025
Lend a Hand: Semi Training-Free Cued Speech Recognition via MLLM-Driven Hand Modeling for Barrier-free Communication
Guanjie Huang
Danny Hin Kwok Tsang
Li Liu
34
0
0
11 Mar 2025
An Exhaustive Evaluation of TTS- and VC-based Data Augmentation for ASR
Sewade Ogun
Vincent Colotte
Emmanuel Vincent
59
0
0
11 Mar 2025
Automatic Speech Recognition for Non-Native English: Accuracy and Disfluency Handling
Michael McGuire
47
0
0
10 Mar 2025
Building English ASR model with regional language support
Purvi Agrawal
Vikas Joshi
Bharati Patidar
Ankur Gupta
R. Mehta
36
0
0
10 Mar 2025
Linguistic Knowledge Transfer Learning for Speech Enhancement
Kuo-Hsuan Hung
Xugang Lu
Szu-Wei Fu
H. Tseng
Hsin-Yi Lin
Chii-Wann Lin
Yu Tsao
VLM
65
0
0
10 Mar 2025
DiVISe: Direct Visual-Input Speech Synthesis Preserving Speaker Characteristics And Intelligibility
Yifan Liu
Yu Fang
Zhouhan Lin
33
0
0
07 Mar 2025
Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs
Abdelrahman Abouelenin
Atabak Ashfaq
Adam Atkinson
Hany Awadalla
Nguyen Bach
...
Ishmam Zabir
Yunan Zhang
Li Zhang
Y. Zhang
Xiren Zhou
MoE
SyDa
68
23
0
03 Mar 2025
JiTTER: Jigsaw Temporal Transformer for Event Reconstruction for Self-Supervised Sound Event Detection
Hyeonuk Nam
Yong-Hwa Park
40
1
0
28 Feb 2025
PrimeK-Net: Multi-scale Spectral Learning via Group Prime-Kernel Convolutional Neural Networks for Single Channel Speech Enhancement
Zizhen Lin
Junyu Wang
Ruili Li
Fei Shen
Xi Xuan
64
0
0
27 Feb 2025
LiteASR: Efficient Automatic Speech Recognition with Low-Rank Approximation
Keisuke Kamahori
Jungo Kasai
Noriyuki Kojima
Baris Kasikci
32
0
0
27 Feb 2025
CS-Dialogue: A 104-Hour Dataset of Spontaneous Mandarin-English Code-Switching Dialogues for Speech Recognition
Jiaming Zhou
Yujie Guo
S. Zhao
Haoqin Sun
Hui Wang
...
Shiyao Wang
Xi Yang
Y. Wang
Yonghua Lin
Yong Qin
46
0
0
26 Feb 2025
Silent Speech Sentence Recognition with Six-Axis Accelerometers using Conformer and CTC Algorithm
Yudong Xie
Zhifeng Han
Qinfan Xiao
Liwei Liang
Lu-Qi Tao
Tian-Ling Ren
71
0
0
25 Feb 2025
Self-Adjust Softmax
Chuanyang Zheng
Yihang Gao
Guoxuan Chen
Han Shi
Jing Xiong
Xiaozhe Ren
Chao Huang
Xin Jiang
Z. Li
Yu-Hu Li
38
0
0
25 Feb 2025
Low-Rank and Sparse Model Merging for Multi-Lingual Speech Recognition and Translation
Qiuming Zhao
Guangzhi Sun
Chao Zhang
Mingxing Xu
Thomas Fang Zheng
MoMe
VLM
107
0
0
24 Feb 2025
Improving Streaming Speech Recognition With Time-Shifted Contextual Attention And Dynamic Right Context Masking
Khanh Le
Duc Thanh Chau
AI4TS
55
0
0
24 Feb 2025
DMOSpeech: Direct Metric Optimization via Distilled Diffusion Model in Zero-Shot Speech Synthesis
Yingahao Aaron Li
Rithesh Kumar
Zeyu Jin
DiffM
91
0
0
21 Feb 2025
1
2
3
4
...
33
34
35
Next