ResearchTrend.AI
Conformer: Convolution-augmented Transformer for Speech Recognition


16 May 2020
Anmol Gulati
James Qin
Chung-Cheng Chiu
Niki Parmar
Yu Zhang
Jiahui Yu
Wei Han
Shibo Wang
Zhengdong Zhang
Yonghui Wu
Ruoming Pang

Papers citing "Conformer: Convolution-augmented Transformer for Speech Recognition"

50 / 1,744 papers shown
Remote Rowhammer Attack using Adversarial Observations on Federated Learning Clients
Jinsheng Yuan
Yuhang Hao
Weisi Guo
Yun Wu
Chongyan Gu
AAML
FedML
16
0
0
09 May 2025
ReverbMiipher: Generative Speech Restoration meets Reverberation Characteristics Controllability
Wataru Nakata
Yuma Koizumi
Shigeki Karita
Robin Scheibler
Haruko Ishikawa
Adriana Guevara-Rukoz
Heiga Zen
M. Bacchiani
41
0
0
08 May 2025
SwinLip: An Efficient Visual Speech Encoder for Lip Reading Using Swin Transformer
Young-Hu Park
R.-H. Park
Hyung-Min Park
49
0
0
07 May 2025
CoGenAV: Versatile Audio-Visual Representation Learning via Contrastive-Generative Synchronization
Detao Bai
Zhiheng Ma
Xihan Wei
Liefeng Bo
78
0
0
06 May 2025
Voice Cloning: Comprehensive Survey
Hussam Azzuni
Abdulmotaleb El Saddik
VLM
32
0
0
01 May 2025
Pretraining Large Brain Language Model for Active BCI: Silent Speech
Jinzhao Zhou
Zehong Cao
Yiqun Duan
Connor Barkley
Daniel Leong
...
Ziyi Zhao
T. Do
Yu-Cheng Chang
Sheng-Fu Liang
Chin-Teng Lin
32
0
0
29 Apr 2025
AlignDiT: Multimodal Aligned Diffusion Transformer for Synchronized Speech Generation
J. Choi
Ji-Hoon Kim
Kim Sung-Bin
Tae-Hyun Oh
Joon Son Chung
DiffM
48
0
0
29 Apr 2025
A Comparative Study on Positional Encoding for Time-frequency Domain Dual-path Transformer-based Source Separation Models
Kohei Saijo
Tetsuji Ogawa
47
1
0
28 Apr 2025
Spatial Speech Translation: Translating Across Space With Binaural Hearables
Tuochao Chen
Qirui Wang
Runlin He
Shyam Gollakota
29
0
0
25 Apr 2025
Temporal Attention Pooling for Frequency Dynamic Convolution in Sound Event Detection
Hyeonuk Nam
Yong-Hwa Park
31
0
0
17 Apr 2025
Advancing Arabic Speech Recognition Through Large-Scale Weakly Supervised Learning
Mahmoud Salhab
Marwan Elghitany
Shameed Sait
Syed Sibghat Ullah
Mohammad Abusheikh
Hasan Abusheikh
44
0
0
16 Apr 2025
DiTSE: High-Fidelity Generative Speech Enhancement via Latent Diffusion Transformers
Heitor R. Guimarães
Jiaqi Su
Rithesh Kumar
Tiago H. Falk
Zeyu Jin
DiffM
30
2
0
13 Apr 2025
Local Temporal Feature Enhanced Transformer with ROI-rank Based Masking for Diagnosis of ADHD
Byunggun Kim
Younghun Kwon
MedIm
12
0
0
12 Apr 2025
Reverberation-based Features for Sound Event Localization and Detection with Distance Estimation
Davide Berghi
Philip J. B. Jackson
29
0
0
11 Apr 2025
From Speech to Summary: A Comprehensive Survey of Speech Summarization
Fabian Retkowski
Maike Züfle
Andreas Sudmann
Dinah Pfau
Jan Niehues
Alexander Waibel
39
0
0
10 Apr 2025
LauraTSE: Target Speaker Extraction using Auto-Regressive Decoder-Only Language Models
Beilong Tang
Bang Zeng
Ming Li
AI4TS
34
0
0
10 Apr 2025
Visual-Aware Speech Recognition for Noisy Scenarios
Lakshmipathi Balaji
Karan Singla
26
0
0
09 Apr 2025
RNN-Transducer-based Losses for Speech Recognition on Noisy Targets
Vladimir Bataev
30
0
0
09 Apr 2025
AVENet: Disentangling Features by Approximating Average Features for Voice Conversion
Wenyu Wang
Yiquan Zhou
Jihua Zhu
Hongwu Ding
Jiacheng Xu
Shihao Li
DRL
30
0
0
08 Apr 2025
Selective Masking Adversarial Attack on Automatic Speech Recognition Systems
Zheng Fang
Shenyi Zhang
Tao Wang
Bowen Li
Lingchen Zhao
Zhangyi Wang
AAML
18
0
0
06 Apr 2025
A Survey on Music Generation from Single-Modal, Cross-Modal, and Multi-Modal Perspectives
Shuyu Li
Shulei Ji
Zihao W. Wang
Songruoyao Wu
Jiaxing Yu
K. Zhang
MGen
VGen
65
1
0
01 Apr 2025
Multi-Token Attention
O. Yu. Golovneva
Tianlu Wang
Jason Weston
Sainbayar Sukhbaatar
48
1
0
01 Apr 2025
Whispering Under the Eaves: Protecting User Privacy Against Commercial and LLM-powered Automatic Speech Recognition Systems
Weifei Jin
Yuxin Cao
Junjie Su
Derui Wang
Yedi Zhang
Minhui Xue
Jie Hao
Jin Song Dong
Yixian Yang
AAML
50
0
0
01 Apr 2025
Whisper-LM: Improving ASR Models with Language Models for Low-Resource Languages
Xabier de Zuazo
Eva Navas
Ibon Saratxaga
Inma Hernáez Rioja
37
0
0
30 Mar 2025
Dolphin: A Large-Scale Automatic Speech Recognition Model for Eastern Languages
Yangyang Meng
Jinpeng Li
Guodong Lin
Yu Pu
G. Wang
Hu Du
Zhiming Shao
Yukai Huang
Ke Li
Wei-Qiang Zhang
ObjD
93
0
0
26 Mar 2025
Analyzable Chain-of-Musical-Thought Prompting for High-Fidelity Music Generation
Max W. Y. Lam
Yijin Xing
Weiya You
Jingcheng Wu
Zongyu Yin
...
T. Zhao
Chien-Hung Liu
Xuchen Song
Yang Li
Yahui Zhou
LRM
56
2
0
25 Mar 2025
Elevating Robust Multi-Talker ASR by Decoupling Speaker Separation and Speech Recognition
Yufeng Yang
H. Taherian
Vahid Ahmadi Kalkhorani
DeLiang Wang
39
0
0
23 Mar 2025
From Faces to Voices: Learning Hierarchical Representations for High-quality Video-to-Speech
Ji-Hoon Kim
Jeongsoo Choi
Jaehun Kim
Chaeyoung Jung
Joon Son Chung
CVBM
48
1
0
21 Mar 2025
SeniorTalk: A Chinese Conversation Dataset with Rich Annotations for Super-Aged Seniors
Yang Chen
Hui Wang
Shiyao Wang
J. Chen
Jiabei He
Jiaming Zhou
Xi Yang
Y. Wang
Yonghua Lin
Yong Qin
36
0
0
20 Mar 2025
Evaluating ASR Confidence Scores for Automated Error Detection in User-Assisted Correction Interfaces
Korbinian Kuhn
Verena Kersken
Gottfried Zimmermann
48
0
0
19 Mar 2025
Communication Access Real-Time Translation Through Collaborative Correction of Automatic Speech Recognition
Korbinian Kuhn
Verena Kersken
Gottfried Zimmermann
35
0
0
19 Mar 2025
Shushing! Let's Imagine an Authentic Speech from the Silent Video
Jiaxin Ye
Hongming Shan
DiffM
VGen
63
1
0
19 Mar 2025
MMS-LLaMA: Efficient LLM-based Audio-Visual Speech Recognition with Minimal Multimodal Speech Tokens
Jeong Hun Yeo
Hyeongseop Rha
Se Jin Park
Y. Ro
51
0
0
14 Mar 2025
M2R-Whisper: Multi-stage and Multi-scale Retrieval Augmentation for Enhancing Whisper
Jiaming Zhou
S. Zhao
Jiabei He
Hui Wang
Wenjia Zeng
Yong Chen
Haoqin Sun
Aobo Kong
Yong Qin
55
1
0
13 Mar 2025
Lend a Hand: Semi Training-Free Cued Speech Recognition via MLLM-Driven Hand Modeling for Barrier-free Communication
Guanjie Huang
Danny Hin Kwok Tsang
Li Liu
34
0
0
11 Mar 2025
An Exhaustive Evaluation of TTS- and VC-based Data Augmentation for ASR
Sewade Ogun
Vincent Colotte
Emmanuel Vincent
59
0
0
11 Mar 2025
Automatic Speech Recognition for Non-Native English: Accuracy and Disfluency Handling
Michael McGuire
47
0
0
10 Mar 2025
Building English ASR model with regional language support
Purvi Agrawal
Vikas Joshi
Bharati Patidar
Ankur Gupta
R. Mehta
36
0
0
10 Mar 2025
Linguistic Knowledge Transfer Learning for Speech Enhancement
Kuo-Hsuan Hung
Xugang Lu
Szu-Wei Fu
H. Tseng
Hsin-Yi Lin
Chii-Wann Lin
Yu Tsao
VLM
65
0
0
10 Mar 2025
DiVISe: Direct Visual-Input Speech Synthesis Preserving Speaker Characteristics And Intelligibility
Yifan Liu
Yu Fang
Zhouhan Lin
33
0
0
07 Mar 2025
Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs
Abdelrahman Abouelenin
Atabak Ashfaq
Adam Atkinson
Hany Awadalla
Nguyen Bach
...
Ishmam Zabir
Yunan Zhang
Li Zhang
Y. Zhang
Xiren Zhou
MoE
SyDa
68
23
0
03 Mar 2025
JiTTER: Jigsaw Temporal Transformer for Event Reconstruction for Self-Supervised Sound Event Detection
Hyeonuk Nam
Yong-Hwa Park
40
1
0
28 Feb 2025
PrimeK-Net: Multi-scale Spectral Learning via Group Prime-Kernel Convolutional Neural Networks for Single Channel Speech Enhancement
Zizhen Lin
Junyu Wang
Ruili Li
Fei Shen
Xi Xuan
64
0
0
27 Feb 2025
LiteASR: Efficient Automatic Speech Recognition with Low-Rank Approximation
Keisuke Kamahori
Jungo Kasai
Noriyuki Kojima
Baris Kasikci
32
0
0
27 Feb 2025
CS-Dialogue: A 104-Hour Dataset of Spontaneous Mandarin-English Code-Switching Dialogues for Speech Recognition
Jiaming Zhou
Yujie Guo
S. Zhao
Haoqin Sun
Hui Wang
...
Shiyao Wang
Xi Yang
Y. Wang
Yonghua Lin
Yong Qin
46
0
0
26 Feb 2025
Silent Speech Sentence Recognition with Six-Axis Accelerometers using Conformer and CTC Algorithm
Yudong Xie
Zhifeng Han
Qinfan Xiao
Liwei Liang
Lu-Qi Tao
Tian-Ling Ren
71
0
0
25 Feb 2025
Self-Adjust Softmax
Chuanyang Zheng
Yihang Gao
Guoxuan Chen
Han Shi
Jing Xiong
Xiaozhe Ren
Chao Huang
Xin Jiang
Z. Li
Yu-Hu Li
38
0
0
25 Feb 2025
Low-Rank and Sparse Model Merging for Multi-Lingual Speech Recognition and Translation
Qiuming Zhao
Guangzhi Sun
Chao Zhang
Mingxing Xu
Thomas Fang Zheng
MoMe
VLM
107
0
0
24 Feb 2025
Improving Streaming Speech Recognition With Time-Shifted Contextual Attention And Dynamic Right Context Masking
Khanh Le
Duc Thanh Chau
AI4TS
55
0
0
24 Feb 2025
DMOSpeech: Direct Metric Optimization via Distilled Diffusion Model in Zero-Shot Speech Synthesis
Yinghao Aaron Li
Rithesh Kumar
Zeyu Jin
DiffM
91
0
0
21 Feb 2025