ResearchTrend.AI


Conformer: Convolution-augmented Transformer for Speech Recognition

16 May 2020
Anmol Gulati
James Qin
Chung-Cheng Chiu
Niki Parmar
Yu Zhang
Jiahui Yu
Wei Han
Shibo Wang
Zhengdong Zhang
Yonghui Wu
Ruoming Pang

Papers citing "Conformer: Convolution-augmented Transformer for Speech Recognition"

50 / 1,744 papers shown
HAINAN: Fast and Accurate Transducer for Hybrid-Autoregressive ASR
Hainan Xu
Travis M. Bartley
Vladimir Bataev
Boris Ginsburg
105
0
0
03 Oct 2024
Differentially Private Parameter-Efficient Fine-tuning for Large ASR Models
Hongbin Liu
Lun Wang
Om Thakkar
Abhradeep Thakurta
Arun Narayanan
26
0
0
02 Oct 2024
Frozen Large Language Models Can Perceive Paralinguistic Aspects of Speech
Wonjune Kang
J. Jia
Chunyang Wu
Wei Zhou
Egor Lakomkin
...
Leda Sari
Suyoun Kim
Ke Li
Jay Mahadeokar
Ozlem Kalinli
AuLLM
29
2
0
02 Oct 2024
MOSEL: 950,000 Hours of Speech Data for Open-Source Speech Foundation Model Training on EU Languages
Marco Gaido
Sara Papi
L. Bentivogli
A. Brutti
Mauro Cettolo
R. Gretter
M. Matassoni
Mohamed Nabih
Matteo Negri
34
0
0
01 Oct 2024
The Conformer Encoder May Reverse the Time Dimension
Robin Schmitt
Albert Zeyer
Mohammad Zeineldeen
Ralf Schlüter
Hermann Ney
31
0
0
01 Oct 2024
Recent Advances in Speech Language Models: A Survey
Wenqian Cui
Dianzhi Yu
Xiaoqi Jiao
Ziqiao Meng
Guangyan Zhang
Qichao Wang
Yiwen Guo
Irwin King
AuLLM
59
14
0
01 Oct 2024
SSR: Alignment-Aware Modality Connector for Speech Language Models
Weiting Tan
Hirofumi Inaguma
Ning Dong
Paden Tomasello
Xutai Ma
27
3
0
30 Sep 2024
Boosting Hybrid Autoregressive Transducer-based ASR with Internal Acoustic Model Training and Dual Blank Thresholding
Takafumi Moriya
Takanori Ashihara
Masato Mimura
Hiroshi Sato
Kohei Matsuura
Ryo Masumura
Taichi Asami
19
0
0
30 Sep 2024
Alignment-Free Training for Transducer-based Multi-Talker ASR
Takafumi Moriya
Shota Horiguchi
Marc Delcroix
Ryo Masumura
Takanori Ashihara
Hiroshi Sato
Kohei Matsuura
Masato Mimura
31
1
0
30 Sep 2024
Mamba for Streaming ASR Combined with Unimodal Aggregation
Ying Fang
Xiaofei Li
Mamba
16
1
0
30 Sep 2024
Predictive Speech Recognition and End-of-Utterance Detection Towards Spoken Dialog Systems
Oswald Zink
Yosuke Higuchi
Carlos Mullov
Alexander Waibel
Tetsunori Kobayashi
24
0
0
30 Sep 2024
HDMoLE: Mixture of LoRA Experts with Hierarchical Routing and Dynamic Thresholds for Fine-Tuning LLM-based ASR Models
Bingshen Mu
Kun Wei
Qijie Shao
Yong Xu
Lei Xie
MoE
37
1
0
30 Sep 2024
Efficient Long-Form Speech Recognition for General Speech In-Context Learning
Hao Yen
Shaoshi Ling
Guoli Ye
21
0
0
29 Sep 2024
CoT-ST: Enhancing LLM-based Speech Translation with Multimodal Chain-of-Thought
Yexing Du
Ziyang Ma
Yifan Yang
Keqi Deng
Xie Chen
Bo Yang
Yang Xiang
Ming Liu
Bing Qin
LRM
26
6
0
29 Sep 2024
Speech-Mamba: Long-Context Speech Recognition with Selective State Spaces Models
Xiaoxue Gao
Nancy F. Chen
Mamba
35
1
0
27 Sep 2024
MC-SEMamba: A Simple Multi-channel Extension of SEMamba
Wen-Yuan Ting
Wenze Ren
Rong-Yu Chao
Hsin-Yi Lin
Yu Tsao
Fan-Gang Zeng
Mamba
35
0
0
26 Sep 2024
Paraformer-v2: An improved non-autoregressive transformer for noise-robust speech recognition
Keyu An
Zerui Li
Zhifu Gao
Shiliang Zhang
27
0
0
26 Sep 2024
Deep CLAS: Deep Contextual Listen, Attend and Spell
Shifu Xiong
Mengzhi Wang
Genshun Wan
Hang Chen
Jianqing Gao
Lirong Dai
21
0
0
26 Sep 2024
Description-based Controllable Text-to-Speech with Cross-Lingual Voice Control
Ryuichi Yamamoto
Yuma Shirahata
Masaya Kawamura
Kentaro Tachibana
DiffM
32
2
0
26 Sep 2024
How to Connect Speech Foundation Models and Large Language Models? What Matters and What Does Not
Francesco Verdini
Pierfrancesco Melucci
Stefano Perna
Francesco Cariaggi
Marco Gaido
...
Marek Kasztelnik
L. Bentivogli
Sébastien Bratières
P. Merialdo
Simone Scardapane
AuLLM
20
0
0
25 Sep 2024
Revisiting Acoustic Features for Robust ASR
Muhammad Ahmed Shah
Bhiksha Raj
AAML
16
0
0
24 Sep 2024
Boosting Code-Switching ASR with Mixture of Experts Enhanced Speech-Conditioned LLM
Fengrun Zhang
Wang Geng
Hukai Huang
Cheng Yi
He Qu
AuLLM
MoE
28
1
0
24 Sep 2024
NanoVoice: Efficient Speaker-Adaptive Text-to-Speech for Multiple Speakers
Nohil Park
Heeseung Kim
Che Hyun Lee
Jooyoung Choi
Jiheum Yeom
Sungroh Yoon
23
2
0
24 Sep 2024
VoiceGuider: Enhancing Out-of-Domain Performance in Parameter-Efficient Speaker-Adaptive Text-to-Speech via Autoguidance
Jiheum Yeom
Heeseung Kim
Jooyoung Choi
Che Hyun Lee
Nohil Park
Sungroh Yoon
24
1
0
24 Sep 2024
Room Impulse Responses help attackers to evade Deep Fake Detection
Hieu-Thi Luong
Duc-Tuan Truong
Kong Aik Lee
Eng Siong Chng
43
1
0
23 Sep 2024
FeruzaSpeech: A 60 Hour Uzbek Read Speech Corpus with Punctuation, Casing, and Context
Anna Povey
Katherine Povey
20
0
0
23 Sep 2024
Training Large ASR Encoders with Differential Privacy
Geeticka Chauhan
Steve Chien
Om Thakkar
Abhradeep Thakurta
Arun Narayanan
25
1
0
21 Sep 2024
Target word activity detector: An approach to obtain ASR word boundaries without lexicon
S. Sivasankaran
Eric Sun
Jinyu Li
Yan-ping Huang
Jing Pan
30
0
0
20 Sep 2024
MuCodec: Ultra Low-Bitrate Music Codec
Yaoxun Xu
Hangting Chen
Jianwei Yu
Wei Tan
Rongzhi Gu
Shun Lei
Zhiwei Lin
Zhiyong Wu
30
1
0
20 Sep 2024
Leveraging Audio-Only Data for Text-Queried Target Sound Extraction
Kohei Saijo
Janek Ebbers
François G. Germain
Sameer Khurana
G. Wichern
Jonathan Le Roux
37
1
0
20 Sep 2024
AutoMode-ASR: Learning to Select ASR Systems for Better Quality and Cost
Ahmet Gündüz
Yunsu Kim
Kamer Ali Yuksel
Mohamed Al-Badrashiny
Thiago Castro Ferreira
Hassan Sawaf
33
0
0
19 Sep 2024
Channel-Aware Domain-Adaptive Generative Adversarial Network for Robust Speech Recognition
Chien-Chun Wang
Li-Wei Chen
Cheng-Kang Chou
Hung-Shin Lee
Berlin Chen
Hsin-Min Wang
20
0
0
19 Sep 2024
Takin: A Cohort of Superior Quality Zero-shot Speech Generation Models
Sijing Chen
Yuan Feng
Laipeng He
Tianwei He
Wendi He
...
Huimin Zhang
Xiang Zhang
Guangcheng Zhao
Hongbin Zhou
Pengpeng Zou
30
4
0
18 Sep 2024
Dense-TSNet: Dense Connected Two-Stage Structure for Ultra-Lightweight Speech Enhancement
Zizhen Lin
Yuanle Li
Junyu Wang
Ruili Li
34
0
0
18 Sep 2024
M-BEST-RQ: A Multi-Channel Speech Foundation Model for Smart Glasses
Yufeng Yang
Desh Raj
Ju Lin
Niko Moritz
J. Jia
...
Egor Lakomkin
Yiteng Huang
Jacob Donley
Jay Mahadeokar
Ozlem Kalinli
19
2
0
17 Sep 2024
A Joint Spectro-Temporal Relational Thinking Based Acoustic Modeling Framework
Zheng Nan
T. Dang
V. Sethu
Beena Ahmed
16
0
0
17 Sep 2024
An Efficient Self-Learning Framework For Interactive Spoken Dialog Systems
Hitesh Tulsiani
David M. Chan
Shalini Ghosh
Garima Lalwani
Prabhat Pandey
Ankish Bansal
Sri Garimella
Ariya Rastrow
Björn Hoffmeister
26
0
0
16 Sep 2024
2D or not 2D: How Does the Dimensionality of Gesture Representation Affect 3D Co-Speech Gesture Generation?
Teo Guichoux
Laure Soulier
Nicolas Obin
Catherine Pelachaud
SLR
32
0
0
16 Sep 2024
Optimizing Dysarthria Wake-Up Word Spotting: An End-to-End Approach for SLT 2024 LRDWWS Challenge
Shuiyun Liu
Yuxiang Kong
Pengcheng Guo
Weiji Zhuang
Peng Gao
Yujun Wang
Lei Xie
36
0
0
16 Sep 2024
StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion
Yinghao Aaron Li
Xilin Jiang
Cong Han
N. Mesgarani
DiffM
29
4
0
16 Sep 2024
Stutter-Solver: End-to-end Multi-lingual Dysfluency Detection
Xuanru Zhou
Cheol Jun Cho
Ayati Sharma
Brittany Morin
D. Baquirin
...
Zachary Miller
B. Tee
M. G. Tempini
Jiachen Lian
Gopala Anumanchipalli
27
3
0
15 Sep 2024
ASR Error Correction using Large Language Models
Rao Ma
Mengjie Qian
Mark J. F. Gales
Kate Knill
KELM
46
1
0
14 Sep 2024
Leveraging Self-Supervised Learning for Speaker Diarization
Jiangyu Han
Federico Landini
Johan Rohdin
Anna Silnova
Mireia Díez
Lukas Burget
33
1
0
14 Sep 2024
SafeEar: Content Privacy-Preserving Audio Deepfake Detection
Xinfeng Li
Kai Li
Yifan Zheng
Chen Yan
Xiaoyu Ji
Wenyuan Xu
23
13
0
14 Sep 2024
Clean Label Attacks against SLU Systems
Henry Li Xinyuan
Sonal Joshi
Thomas Thebaud
Jesus Villalba
Najim Dehak
Sanjeev Khudanpur
AAML
32
0
0
13 Sep 2024
Exploring the Impact of Data Quantity on ASR in Extremely Low-resource Languages
Yao-Fei Cheng
Li-Wei Chen
Hung-Shin Lee
Hsin-Min Wang
16
0
0
13 Sep 2024
Exploring SSL Discrete Tokens for Multilingual ASR
Mingyu Cui
Daxin Tan
Yifan Yang
Dingdong Wang
Huimeng Wang
Xiao Chen
Xie Chen
Xunying Liu
28
1
0
13 Sep 2024
NEST-RQ: Next Token Prediction for Speech Self-Supervised Pre-Training
Minglun Han
Ye Bai
Chen Shen
Youjia Huang
Mingkun Huang
Zehua Lin
Linhao Dong
Lu Lu
Yuxuan Wang
35
1
0
13 Sep 2024
Investigating Disentanglement in a Phoneme-level Speech Codec for Prosody Modeling
Sotirios Karapiperis
Nikolaos Ellinas
Alexandra Vioni
Junkwang Oh
Gunu Jho
Inchul Hwang
S. Raptis
31
0
0
13 Sep 2024
Contextualization of ASR with LLM using phonetic retrieval-based augmentation
Zhihong Lei
Xingyu Na
Mingbin Xu
Ernest Pusateri
Christophe Van Gysel
Yuanyuan Zhang
Shiyi Han
Zhen Huang
28
2
0
11 Sep 2024