ResearchTrend.AI
Conformer: Convolution-augmented Transformer for Speech Recognition


16 May 2020
Anmol Gulati
James Qin
Chung-Cheng Chiu
Niki Parmar
Yu Zhang
Jiahui Yu
Wei Han
Shibo Wang
Zhengdong Zhang
Yonghui Wu
Ruoming Pang

Papers citing "Conformer: Convolution-augmented Transformer for Speech Recognition"

50 / 1,744 papers shown
Title
Hard-Synth: Synthesizing Diverse Hard Samples for ASR using Zero-Shot TTS and LLM
Jiawei Yu
Y. Li
Xiaosong Qiao
Huan Zhao
Xiaofeng Zhao
Wei Tang
M. Zhang
Hao Yang
Jinsong Su
75
0
0
20 Nov 2024
Signformer is all you need: Towards Edge AI for Sign Language
Eta Yang
SLR
82
0
0
19 Nov 2024
SAMOS: A Neural MOS Prediction Model Leveraging Semantic Representations and Acoustic Features
Yu-Fei Shi
Yang Ai
Ye-Xin Lu
Hui-Peng Du
Zhen-Hua Ling
31
0
0
18 Nov 2024
Pitch-and-Spectrum-Aware Singing Quality Assessment with Bias Correction and Model Fusion
Yu-Fei Shi
Yang Ai
Ye-Xin Lu
Hui-Peng Du
Zhen-Hua Ling
28
0
0
17 Nov 2024
XLSR-Mamba: A Dual-Column Bidirectional State Space Model for Spoofing Attack Detection
Yang Xiao
Rohan Kumar Das
Mamba
31
1
0
15 Nov 2024
Robust AI-Synthesized Speech Detection Using Feature Decomposition Learning and Synthesizer Feature Augmentation
Kuiyuan Zhang
Zhongyun Hua
Yushu Zhang
Yifang Guo
Tao Xiang
29
0
0
14 Nov 2024
Multimodal Fusion Balancing Through Game-Theoretic Regularization
Konstantinos Kontras
Thomas Strypsteen
Christos Chatzichristos
Paul P. Liang
Matthew Blaschko
M. D. Vos
31
0
0
11 Nov 2024
Mamba-based Decoder-Only Approach with Bidirectional Speech Modeling for Speech Recognition
Yoshiki Masuyama
Koichi Miyazaki
Masato Murata
Mamba
37
0
0
11 Nov 2024
Wavehax: Aliasing-Free Neural Waveform Synthesis Based on 2D Convolution and Harmonic Prior for Reliable Complex Spectrogram Estimation
Reo Yoneyama
Atsushi Miyashita
Ryuichi Yamamoto
T. Toda
27
1
0
11 Nov 2024
Gen-AI for User Safety: A Survey
Akshar Prabhu Desai
Tejasvi Ravi
Mohammad Luqman
Mohit Sharma
Nithya Kota
Pranjul Yadav
33
1
0
10 Nov 2024
PSELDNets: Pre-trained Neural Networks on Large-scale Synthetic Datasets for Sound Event Localization and Detection
Jinbo Hu
Yin Cao
Ming Wu
Fang Kang
Feiran Yang
Wenwu Wang
Mark D. Plumbley
J. Yang
31
0
0
10 Nov 2024
Performance evaluation of SLAM-ASR: The Good, the Bad, the Ugly, and the Way Forward
Shashi Kumar
Iuliia Thorbecke
Sergio Burdisso
Esaú Villatoro-Tello
M. Errecalde
Kadri Hacioğlu
Pradeep Rangappa
P. Motlícek
A. Ganapathiraju
Andreas Stolcke
51
1
0
06 Nov 2024
LASER: Attention with Exponential Transformation
Sai Surya Duvvuri
Inderjit Dhillon
24
1
0
05 Nov 2024
DeepContext: A Context-aware, Cross-platform, and Cross-framework Tool for Performance Profiling and Analysis of Deep Learning Workloads
Qidong Zhao
Hao Wu
Yuming Hao
Zilingfeng Ye
Jiajia Li
Xu Liu
Keren Zhou
24
0
0
05 Nov 2024
SPES: Spectrogram Perturbation for Explainable Speech-to-Text Generation
Dennis Fucci
Marco Gaido
Beatrice Savoldi
Matteo Negri
Mauro Cettolo
L. Bentivogli
49
1
0
03 Nov 2024
MoCE: Adaptive Mixture of Contextualization Experts for Byte-based Neural Machine Translation
Langlin Huang
Mengyu Bu
Yang Feng
21
0
0
03 Nov 2024
DC-Spin: A Speaker-invariant Speech Tokenizer for Spoken Language Models
Heng-Jui Chang
Hongyu Gong
Changhan Wang
James R. Glass
Yu-An Chung
26
0
0
31 Oct 2024
Speech is More Than Words: Do Speech-to-Text Translation Systems Leverage Prosody?
Ioannis Tsiamas
Matthias Sperber
Andrew Finch
Sarthak Garg
31
11
0
31 Oct 2024
Augmenting Polish Automatic Speech Recognition System With Synthetic Data
Łukasz Bondaruk
Jakub Kubiak
Mateusz Czyżnikiewicz
29
0
0
30 Oct 2024
Leveraging Reverberation and Visual Depth Cues for Sound Event Localization and Detection with Distance Estimation
Davide Berghi
Philip J. B. Jackson
29
1
0
29 Oct 2024
Representational learning for an anomalous sound detection system with source separation model
S. Shin
Seokjin Lee
20
0
0
29 Oct 2024
Device-Directed Speech Detection for Follow-up Conversations Using Large Language Models
Ognjen Rudovic
Pranay Dighe
Yi Su
Vineet Garg
Sameer Dharur
Xiaochuan Niu
Ahmed H. Abdelaziz
Saurabh N. Adya
Ahmed H. Tewfik
29
0
0
28 Oct 2024
VoiceTextBlender: Augmenting Large Language Models with Speech Capabilities via Single-Stage Joint Speech-Text Supervised Fine-Tuning
Yifan Peng
Krishna C. Puvvada
Zhehuai Chen
Piotr Żelasko
He Huang
Kunal Dhawan
Ke Hu
Shinji Watanabe
Jagadeesh Balam
Boris Ginsburg
54
2
0
23 Oct 2024
Enhancing Low-Resource ASR through Versatile TTS: Bridging the Data Gap
Guanrou Yang
Fan Yu
Z. Ma
Zhihao Du
Zhifu Gao
Shiliang Zhang
Xie Chen
27
1
0
22 Oct 2024
GE2E-KWS: Generalized End-to-End Training and Evaluation for Zero-shot Keyword Spotting
Pai Zhu
Jacob Bartel
Dhruuv Agarwal
Kurt Partridge
Hyun-jin Park
Quan Wang
15
0
0
22 Oct 2024
AlignVSR: Audio-Visual Cross-Modal Alignment for Visual Speech Recognition
Z. Liu
Xiaolou Li
Chen Chen
Li Guo
Lantian Li
D. Wang
20
0
0
21 Oct 2024
LSCodec: Low-Bitrate and Speaker-Decoupled Discrete Speech Codec
Yiwei Guo
Zhihan Li
Chenpeng Du
Hankun Wang
Xie Chen
Kai Yu
31
1
0
21 Oct 2024
Optimizing Neural Speech Codec for Low-Bitrate Compression via Multi-Scale Encoding
Peiji Yang
Fengping Wang
Yicheng Zhong
Huawei Wei
Zhisheng Wang
23
0
0
21 Oct 2024
Generalized Probabilistic Attention Mechanism in Transformers
DongNyeong Heo
Heeyoul Choi
49
0
0
21 Oct 2024
Anonymising Elderly and Pathological Speech: Voice Conversion Using DDSP and Query-by-Example
Suhita Ghosh
Melanie Jouaiti
Arnab Das
Yamini Sinha
Tim Polzehl
Ingo Siegert
Sebastian Stober
23
2
0
20 Oct 2024
AC-Mix: Self-Supervised Adaptation for Low-Resource Automatic Speech Recognition using Agnostic Contrastive Mixup
Carlos Carvalho
A. Abad
16
0
0
18 Oct 2024
Takin-ADA: Emotion Controllable Audio-Driven Animation with Canonical and Landmark Loss Optimization
Bin Lin
Yanzhen Yu
Jianhao Ye
Ruitao Lv
Y. Yang
Ruoye Xie
Pan Yu
Hongbin Zhou
VGen
30
1
0
18 Oct 2024
SF-Speech: Straightened Flow for Zero-Shot Voice Clone
Xuyuan Li
Zengqiang Shang
Hua Hua
Peiyang Shi
Chen Yang
Li Wang
Pengyuan Zhang
43
2
0
16 Oct 2024
Quality-Aware End-to-End Audio-Visual Neural Speaker Diarization
Mao-Kui He
Jun Du
Shu-Tong Niu
Qing-Feng Liu
Chin-Hui Lee
24
0
0
15 Oct 2024
Investigation of Speaker Representation for Target-Speaker Speech Processing
Takanori Ashihara
Takafumi Moriya
Shota Horiguchi
Junyi Peng
Tsubasa Ochiai
Marc Delcroix
Kohei Matsuura
Hiroshi Sato
26
1
0
15 Oct 2024
Character-aware audio-visual subtitling in context
Jaesung Huh
Andrew Zisserman
36
0
0
14 Oct 2024
In-Materia Speech Recognition
Mohamadreza Zolfagharinejad
Julian Büchel
Lorenzo Cassola
Sachin Kinge
Ghazi Sarwat Syed
A. Sebastian
Wilfred G. van der Wiel
17
0
0
14 Oct 2024
Full-Rank No More: Low-Rank Weight Training for Modern Speech Recognition Models
Adriana Fernandez-Lopez
Shiwei Liu
L. Yin
Stavros Petridis
Maja Pantic
24
0
0
10 Oct 2024
Transducer Consistency Regularization for Speech to Text Applications
Cindy Tseng
Yun Tang
Vijendra Raj Apsingekar
30
0
0
09 Oct 2024
LS-EEND: Long-Form Streaming End-to-End Neural Diarization with Online Attractor Extraction
Di Liang
Xiaofei Li
24
0
0
09 Oct 2024
Enforcing Interpretability in Time Series Transformers: A Concept Bottleneck Framework
Angela van Sprang
Erman Acar
Willem Zuidema
AI4TS
36
1
0
08 Oct 2024
FGCL: Fine-grained Contrastive Learning For Mandarin Stuttering Event Detection
Han Jiang
Wenyu Wang
Yiquan Zhou
Hongwu Ding
Jiacheng Xu
Jihua Zhu
23
0
0
08 Oct 2024
Automatic Screening for Children with Speech Disorder using Automatic Speech Recognition: Opportunities and Challenges
Dancheng Liu
Jason Yang
Ishan Albrecht-Buehler
Helen Qin
Sophie Li
Yuting Hu
Amir Nassereldine
Jinjun Xiong
24
1
0
07 Oct 2024
CTC-GMM: CTC guided modality matching for fast and accurate streaming speech translation
Rui Zhao
Jinyu Li
Ruchao Fan
Matt Post
36
1
0
07 Oct 2024
Improving Speaker Representations Using Contrastive Losses on Multi-scale Features
Satvik Dixit
Massa Baali
Rita Singh
Bhiksha Raj
19
0
0
07 Oct 2024
SegINR: Segment-wise Implicit Neural Representation for Sequence Alignment in Neural Text-to-Speech
Minchan Kim
Myeonghun Jeong
Joun Yeop Lee
Nam Soo Kim
16
0
0
07 Oct 2024
Block Vecchia Approximation for Scalable and Efficient Gaussian Process Computations
Qilong Pan
Sameh Abdulah
M. Genton
Ying Sun
27
1
0
06 Oct 2024
Efficient and Robust Long-Form Speech Recognition with Hybrid H3-Conformer
Tomoki Honda
S. Sakai
Tatsuya Kawahara
21
0
0
05 Oct 2024
FastAdaSP: Multitask-Adapted Efficient Inference for Large Speech Language Model
Yichen Lu
Jiaqi Song
Chao-Han Huck Yang
Shinji Watanabe
21
0
0
03 Oct 2024
NTU-NPU System for Voice Privacy 2024 Challenge
Nikita Kuzmin
Hieu-Thi Luong
Jixun Yao
Lei Xie
Kong Aik Lee
Eng Siong Chng
49
1
0
03 Oct 2024