ResearchTrend.AI
Conformer: Convolution-augmented Transformer for Speech Recognition


16 May 2020
Anmol Gulati
James Qin
Chung-Cheng Chiu
Niki Parmar
Yu Zhang
Jiahui Yu
Wei Han
Shibo Wang
Zhengdong Zhang
Yonghui Wu
Ruoming Pang
arXiv: 2005.08100

Papers citing "Conformer: Convolution-augmented Transformer for Speech Recognition"

50 / 1,744 papers shown
SegAug: CTC-Aligned Segmented Augmentation For Robust RNN-Transducer Based Speech Recognition
Khanh Le
Tuan Vu Ho
Dung Tran
Duc Thanh Chau
48
0
0
20 Feb 2025
RestoreGrad: Signal Restoration Using Conditional Denoising Diffusion Models with Jointly Learned Prior
Ching Hua Lee
Chouchang Yang
Jaejin Cho
Yashas Malur Saidutta
R. S. Srinivasa
Yilin Shen
Hongxia Jin
DiffM
83
0
0
19 Feb 2025
Keep what you need : extracting efficient subnetworks from large audio representation models
David Genova
P. Esling
Tom Hurlin
73
0
0
18 Feb 2025
Soundwave: Less is More for Speech-Text Alignment in LLMs
Y. Zhang
Zhiheng Liu
Fan Bu
Ruiyu Zhang
Benyou Wang
H. Li
AuLLM
SyDa
VLM
98
0
0
18 Feb 2025
NaturalL2S: End-to-End High-quality Multispeaker Lip-to-Speech Synthesis with Differential Digital Signal Processing
Yifan Liang
Fangkun Liu
Andong Li
Xiaodong Li
C. Zheng
39
1
0
17 Feb 2025
ChordFormer: A Conformer-Based Architecture for Large-Vocabulary Audio Chord Recognition
Muhammad Waseem Akram
Stefano Dettori
V. Colla
Giorgio Buttazzo
52
0
0
17 Feb 2025
CR-CTC: Consistency regularization on CTC for improved speech recognition
Zengwei Yao
Wei Kang
Xiaoyu Yang
Fangjun Kuang
Liyong Guo
Han Zhu
Zengrui Jin
Zhaoqing Li
Long Lin
Daniel Povey
51
0
0
17 Feb 2025
FELLE: Autoregressive Speech Synthesis with Token-Wise Coarse-to-Fine Flow Matching
Hui Wang
Shujie Liu
Lingwei Meng
J. Li
Yifan Yang
...
Yanqing Liu
Haoqin Sun
Jiaming Zhou
Yan Lu
Yong Qin
48
0
0
16 Feb 2025
Improving action segmentation via explicit similarity measurement
Kamel Aouaidjia
Wenhao Zhang
Aofan Li
Chongsheng Zhang
39
0
0
15 Feb 2025
When, Where and Why to Average Weights?
Niccolò Ajroldi
Antonio Orvieto
Jonas Geiping
MoMe
91
0
0
10 Feb 2025
Aligner-Encoders: Self-Attention Transformers Can Be Self-Transducers
Adam Stooke
Rohit Prabhavalkar
K. Sim
P. M. Mengibar
31
0
0
06 Feb 2025
Leveraging Broadcast Media Subtitle Transcripts for Automatic Speech Recognition and Subtitling
Jakob Poncelet
Hugo Van hamme
67
0
0
05 Feb 2025
Adapter-Based Multi-Agent AVSR Extension for Pre-Trained ASR Models
Christopher Simic
K. Riedhammer
Tobias Bocklet
91
0
0
03 Feb 2025
Sagalee: an Open Source Automatic Speech Recognition Dataset for Oromo Language
Turi Abu
Ying Shi
T. Zheng
D. Wang
55
0
0
01 Feb 2025
Data-Driven Mispronunciation Pattern Discovery for Robust Speech Recognition
Anna Seo Gyeong Choi
Jonghyeon Park
Myungwoo Oh
41
0
0
01 Feb 2025
Privacy-Preserving Edge Speech Understanding with Tiny Foundation Models
A. Benazir
Felix Xiaozhu Lin
41
0
0
29 Jan 2025
Enhancing and Exploring Mild Cognitive Impairment Detection with W2V-BERT-2.0
Yueguan Wang
Tatsunari Matsushima
Soichiro Matsushima
Toshimitsu Sakai
31
0
0
28 Jan 2025
Summary of the NOTSOFAR-1 Challenge: Highlights and Learnings
Igor Abramovski
Alon Vinnikov
Shalev Shaer
Naoyuki Kanda
Xiaofei Wang
Amir Ivry
Eyal Krupka
34
0
0
28 Jan 2025
Optimized Self-supervised Training with BEST-RQ for Speech Recognition
Ilja Baumann
Dominik Wagner
K. Riedhammer
Tobias Bocklet
67
0
0
28 Jan 2025
FireRedASR: Open-Source Industrial-Grade Mandarin Speech Recognition Models from Encoder-Decoder to LLM Integration
Kai-Tuo Xu
Feng-Long Xie
Xu Tang
Yao Hu
69
4
0
24 Jan 2025
Multi-Task Corrupted Prediction for Learning Robust Audio-Visual Speech Representation
Sungnyun Kim
Sungwoo Cho
Sangmin Bae
Kangwook Jang
Se-Young Yun
SSL
68
1
0
23 Jan 2025
Uncovering the Visual Contribution in Audio-Visual Speech Recognition
Zhaofeng Lin
Naomi Harte
78
1
0
20 Jan 2025
Improving Zero-Shot Chinese-English Code-Switching ASR with kNN-CTC and Gated Monolingual Datastores
Jiaming Zhou
S. Zhao
Hui Wang
Tian-Hao Zhang
Haoqin Sun
Xuechen Wang
Yong Qin
161
3
0
20 Jan 2025
A Non-autoregressive Model for Joint STT and TTS
Vishal Sunder
Brian Kingsbury
G. Saon
Samuel Thomas
Slava Shechtman
Hagai Aronowitz
Eric Fosler-Lussier
Luis A. Lastras
59
0
0
15 Jan 2025
SEAL: Speaker Error Correction using Acoustic-conditioned Large Language Models
Anurag Kumar
Rohit Paturi
Amber Afshan
S. Srinivasan
41
0
0
14 Jan 2025
Improving Cross-Lingual Phonetic Representation of Low-Resource Languages Through Language Similarity Analysis
Minu Kim
Kangwook Jang
Hoirin Kim
34
0
0
12 Jan 2025
Benchmarking Rotary Position Embeddings for Automatic Speech Recognition
Shucong Zhang
Titouan Parcollet
Rogier van Dalen
Sourav Bhattacharya
44
0
0
10 Jan 2025
LUPET: Incorporating Hierarchical Information Path into Multilingual ASR
Wei Liu
Jingyong Hou
Dong Yang
Muyong Cao
Tan Lee
70
1
0
10 Jan 2025
On Creating A Brain-To-Text Decoder
Zenon Lamprou
Yashar Moshfeghi
31
0
0
10 Jan 2025
Harnessing the Zero-Shot Power of Instruction-Tuned Large Language Model in End-to-End Speech Recognition
Yosuke Higuchi
Tetsuji Ogawa
Tetsunori Kobayashi
AuLLM
40
0
0
08 Jan 2025
Single-Channel Distance-Based Source Separation for Mobile GPU in Outdoor and Indoor Environments
Hanbin Bae
Byungjun Kang
Jiwon Kim
Jaeyong Hwang
Hosang Sung
Hoon-Young Cho
3DV
28
0
0
06 Jan 2025
Prepending or Cross-Attention for Speech-to-Text? An Empirical Comparison
Tsz Kin Lam
Marco Gaido
Sara Papi
L. Bentivogli
Barry Haddow
29
0
0
04 Jan 2025
Listening and Seeing Again: Generative Error Correction for Audio-Visual Speech Recognition
Rui Liu
Hongyu Yuan
H. Li
40
0
0
03 Jan 2025
On the Robustness of Cover Version Identification Models: A Study Using Cover Versions from YouTube
Simon Hachmeier
Robert Jäschke
AAML
38
0
0
03 Jan 2025
FAST: Fast Audio Spectrogram Transformer
Anugunj Naman
Gaibo Zhang
26
0
0
03 Jan 2025
Personalized Lip Reading: Adapting to Your Unique Lip Movements with Vision and Language
Jeong Hun Yeo
Chae Won Kim
Hyunjun Kim
Hyeongseop Rha
Seunghee Han
Wen-Huang Cheng
Y. Ro
52
3
0
03 Jan 2025
CrossSpeech++: Cross-lingual Speech Synthesis with Decoupled Language and Speaker Generation
Ji-Hoon Kim
Hong-Sun Yang
Yoon-Cheol Ju
Il-Hwan Kim
Byeong-Yeol Kim
Joon Son Chung
BDL
47
0
0
31 Dec 2024
Unity is Strength: Unifying Convolutional and Transformeral Features for Better Person Re-Identification
Yuhao Wang
Pingping Zhang
Xuehu Liu
Zhengzheng Tu
Huchuan Lu
42
3
0
23 Dec 2024
LAMA-UT: Language Agnostic Multilingual ASR through Orthography Unification and Language-Specific Transliteration
Sangmin Lee
Woo-Jin Chung
Hong-Goo Kang
70
0
0
19 Dec 2024
Model Decides How to Tokenize: Adaptive DNA Sequence Tokenization with MxDNA
Lifeng Qiao
Peng Ye
Yuchen Ren
Weiqiang Bai
Chaoqi Liang
Xinzhu Ma
Nanqing Dong
W. Ouyang
73
2
0
18 Dec 2024
A Decade of Deep Learning: A Survey on The Magnificent Seven
Dilshod Azizov
Muhammad Arslan Manzoor
Velibor Bojkovic
Yingxu Wang
Z. Wang
...
Liang Li
Siwei Liu
Yu Zhong
Wei Liu
Shangsong Liang
OOD
AI4TS
MedIm
116
0
0
13 Dec 2024
Mining Word Boundaries from Speech-Text Parallel Data for Cross-domain Chinese Word Segmentation
Xuebin Wang
Lei Zhang
Z. Li
Shilin Zhou
Chen Gong
Yang Hou
65
0
0
12 Dec 2024
Effective Text Adaptation for LLM-based ASR through Soft Prompt Fine-Tuning
Yingyi Ma
Zhe Liu
Ozlem Kalinli
70
0
0
09 Dec 2024
FERERO: A Flexible Framework for Preference-Guided Multi-Objective Learning
Lisha Chen
A. F. M. Saif
Yanning Shen
Tianyi Chen
71
2
0
02 Dec 2024
AVS-Net: Audio-Visual Scale Net for Self-supervised Monocular Metric Depth Estimation
Xiaohu Liu
Sascha Hornauer
Fabien Moutarde
Jialiang Lu
SSL
MDE
56
0
0
02 Dec 2024
Complexity boosted adaptive training for better low resource ASR performance
Hongxuan Lu
Shenjian Wang
Biao Li
62
0
0
01 Dec 2024
From Audio Deepfake Detection to AI-Generated Music Detection -- A Pathway and Overview
Yupei Li
M. Milling
Lucia Specia
Björn Schuller
89
6
0
30 Nov 2024
PREBA: A Hardware/Software Co-Design for Multi-Instance GPU based AI Inference Servers
Gwangoo Yeo
Jiin Kim
Yujeong Choi
Minsoo Rhu
74
0
0
28 Nov 2024
Continual Learning in Machine Speech Chain Using Gradient Episodic Memory
Geoffrey Tyndall
Kurniawati Azizah
Dipta Tanaya
Ayu Purwarianti
Dessi Lestari
S. Sakti
CLL
60
0
0
27 Nov 2024
Tiny-Align: Bridging Automatic Speech Recognition and Large Language Model on the Edge
Ruiyang Qin
Dancheng Liu
Gelei Xu
Zheyu Yan
Chenhui Xu
Yuting Hu
Xiaolin Hu
Jinjun Xiong
Yiyu Shi
AuLLM
104
1
0
21 Nov 2024