ESPnet: End-to-End Speech Processing Toolkit

30 March 2018
Shinji Watanabe
Takaaki Hori
Shigeki Karita
Tomoki Hayashi
Jiro Nishitoba
Y. Unno
Nelson Yalta
Jahn Heymann
Matthew Wiesner
Nanxin Chen
Adithya Renduchintala
Tsubasa Ochiai
    VLM

Papers citing "ESPnet: End-to-End Speech Processing Toolkit"

50 / 250 papers shown
M2R-Whisper: Multi-stage and Multi-scale Retrieval Augmentation for Enhancing Whisper
Jiaming Zhou
S. Zhao
Jiabei He
Hui Wang
Wenjia Zeng
Yong Chen
Haoqin Sun
Aobo Kong
Yong Qin
55
1
0
13 Mar 2025
Retrieval-Augmented Speech Recognition Approach for Domain Challenges
Peng Shen
Xugang Lu
Hisashi Kawai
RALM
60
0
0
24 Feb 2025
CR-CTC: Consistency regularization on CTC for improved speech recognition
Zengwei Yao
Wei Kang
Xiaoyu Yang
Fangjun Kuang
Liyong Guo
Han Zhu
Zengrui Jin
Zhaoqing Li
Long Lin
Daniel Povey
53
0
0
17 Feb 2025
Benchmarking Rotary Position Embeddings for Automatic Speech Recognition
Shucong Zhang
Titouan Parcollet
Rogier van Dalen
Sourav Bhattacharya
44
0
0
10 Jan 2025
autrainer: A Modular and Extensible Deep Learning Toolkit for Computer Audition Tasks
Simon Rampp
Andreas Triantafyllopoulos
M. Milling
Björn Schuller
85
0
0
16 Dec 2024
HAINAN: Fast and Accurate Transducer for Hybrid-Autoregressive ASR
Hainan Xu
Travis M. Bartley
Vladimir Bataev
Boris Ginsburg
152
0
0
03 Oct 2024
The Conformer Encoder May Reverse the Time Dimension
Robin Schmitt
Albert Zeyer
Mohammad Zeineldeen
Ralf Schluter
Hermann Ney
31
0
0
01 Oct 2024
Disentangling Speakers in Multi-Talker Speech Recognition with Speaker-Aware CTC
Jiawen Kang
Lingwei Meng
Mingyu Cui
Yuejiao Wang
Xixin Wu
Xunying Liu
Helen Meng
41
2
0
19 Sep 2024
SpoofCeleb: Speech Deepfake Detection and SASV In The Wild
Jee-weon Jung
Yihan Wu
Xin Wang
Ji-Hoon Kim
Soumi Maiti
...
Joon Son Chung
Wangyou Zhang
Seyun Um
Shinnosuke Takamichi
Shinji Watanabe
65
1
0
18 Sep 2024
Ultra-Low Latency Speech Enhancement - A Comprehensive Study
Haibin Wu
Sebastian Braun
23
0
0
16 Sep 2024
ASR Error Correction using Large Language Models
Rao Ma
Mengjie Qian
Mark J. F. Gales
Kate Knill
KELM
46
1
0
14 Sep 2024
Lightweight Transducer Based on Frame-Level Criterion
Genshun Wan
Mengzhi Wang
Tingzhi Mao
Hang Chen
Z. Ye
44
1
0
05 Sep 2024
USEF-TSE: Universal Speaker Embedding Free Target Speaker Extraction
Bang Zeng
Ming Li
29
2
0
04 Sep 2024
Advancing Multi-talker ASR Performance with Large Language Models
Mohan Shi
Zengrui Jin
Yaoxun Xu
Yong Xu
Shi-Xiong Zhang
Kun Wei
Yiwen Shao
Chunlei Zhang
Dong Yu
29
1
0
30 Aug 2024
The CHiME-8 DASR Challenge for Generalizable and Array Agnostic Distant Automatic Speech Recognition and Diarization
Samuele Cornell
Taejin Park
Steve Huang
Christoph Boeddeker
Xuankai Chang
Matthew Maciejewski
Matthew Wiesner
Paola García
Shinji Watanabe
34
9
0
23 Jul 2024
Overview of Speaker Modeling and Its Applications: From the Lens of Deep Speaker Representation Learning
Shuai Wang
Zheng-Shou Chen
Kong Aik Lee
Yan-min Qian
Haizhou Li
39
4
0
21 Jul 2024
Tailored Design of Audio-Visual Speech Recognition Models using Branchformers
David Gimeno-Gómez
Carlos David Martínez Hinarejos
88
2
0
09 Jul 2024
PolySpeech: Exploring Unified Multitask Speech Models for Competitiveness with Single-task Models
Runyan Yang
Huibao Yang
Xiqing Zhang
Tiantian Ye
Ying Liu
Yingying Gao
Shilei Zhang
Chao Deng
Junlan Feng
34
0
0
12 Jun 2024
The Interspeech 2024 Challenge on Speech Processing Using Discrete Units
Xuankai Chang
Jiatong Shi
Jinchuan Tian
Yuning Wu
Yuxun Tang
Yihan Wu
Shinji Watanabe
Yossi Adi
Xie Chen
Qin Jin
45
15
0
11 Jun 2024
Crossmodal ASR Error Correction with Discrete Speech Units
Yuanchao Li
Pinzhen Chen
Peter Bell
Catherine Lai
36
6
0
26 May 2024
Low-resource speech recognition and dialect identification of Irish in a multi-task framework
Liam Lonergan
Mengjie Qian
Neasa Ní Chiaráin
Christer Gobl
A. N. Chasaide
43
2
0
02 May 2024
EfficientASR: Speech Recognition Network Compression via Attention Redundancy and Chunk-Level FFN Optimization
Jianzong Wang
Ziqi Liang
Xulong Zhang
Ning Cheng
Jing Xiao
35
0
0
30 Apr 2024
A cost minimization approach to fix the vocabulary size in a tokenizer for an End-to-End ASR system
Sunil Kumar Kopparapu
Ashish Panda
26
0
0
29 Apr 2024
Usefulness of Emotional Prosody in Neural Machine Translation
Charles Brazier
Jean-Luc Rouas
23
0
0
27 Apr 2024
VoiceShop: A Unified Speech-to-Speech Framework for Identity-Preserving Zero-Shot Voice Editing
Philip Anastassiou
Zhenyu Tang
Kainan Peng
Dongya Jia
Jiaxin Li
Ming Tu
Yuping Wang
Yuxuan Wang
Mingbo Ma
42
4
0
10 Apr 2024
Compass: A Decentralized Scheduler for Latency-Sensitive ML Workflows
Yuting Yang
Andrea Merlina
Weijia Song
Tiancheng Yuan
Ken Birman
Roman Vitenberg
46
0
0
27 Feb 2024
OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification
Yifan Peng
Yui Sudo
Muhammad Shakeel
Shinji Watanabe
VLM
37
17
0
20 Feb 2024
Large Language Models are Efficient Learners of Noise-Robust Speech Recognition
Yuchen Hu
Chen Chen
Chao-Han Huck Yang
Ruizhe Li
Chao Zhang
Pin-Yu Chen
Ensiong Chng
27
20
0
19 Jan 2024
Improving ASR Contextual Biasing with Guided Attention
Jiyang Tang
Kwangyoun Kim
Suwon Shon
Felix Wu
Prashant Sridhar
Shinji Watanabe
21
8
0
16 Jan 2024
UCorrect: An Unsupervised Framework for Automatic Speech Recognition Error Correction
Jiaxin Guo
Minghan Wang
Xiaosong Qiao
Daimeng Wei
Hengchao Shang
...
Yinglu Li
Chang Su
Min Zhang
Shimin Tao
Hao-Yu Yang
23
6
0
11 Jan 2024
BLSTM-Based Confidence Estimation for End-to-End Speech Recognition
A. Ogawa
Naohiro Tawara
Takatomo Kano
Marc Delcroix
43
4
0
22 Dec 2023
Attention-Guided Adaptation for Code-Switching Speech Recognition
Bobbi Aditya
Mahdin Rohmatillah
Liang-Hsuan Tai
Jen-Tzung Chien
26
8
0
14 Dec 2023
FAT-HuBERT: Front-end Adaptive Training of Hidden-unit BERT for Distortion-Invariant Robust Speech Recognition
Dongning Yang
Wei Wang
Yanmin Qian
13
3
0
29 Nov 2023
Speaker-Adapted End-to-End Visual Speech Recognition for Continuous Spanish
David Gimeno-Gómez
Carlos David Martínez Hinarejos
28
0
0
21 Nov 2023
Retrieve and Copy: Scaling ASR Personalization to Large Catalogs
Sai Muralidhar Jayanthi
Devang Kulshreshtha
Saket Dingliwal
S. Ronanki
S. Bodapati
30
7
0
14 Nov 2023
SPRING-INX: A Multilingual Indian Language Speech Corpus by SPRING Lab, IIT Madras
R. Nithya
S. Malavika
F. Jordan
Arjun Gangwar
J. MetildaN
...
Rithik Sarab
A. Dubey
G. Divakaran
K. SamudraVijaya
S. Gangashetty
9
4
0
23 Oct 2023
LAE-ST-MoE: Boosted Language-Aware Encoder Using Speech Translation Auxiliary Task for E2E Code-switching ASR
Guodong Ma
Wenxuan Wang
Yuke Li
Yuting Yang
Binbin Du
Haoran Fu
23
5
0
28 Sep 2023
Speech enhancement with frequency domain auto-regressive modeling
Anurenjan Purushothaman
Debottam Dutta
Rohit Kumar
Sriram Ganapathy
17
2
0
24 Sep 2023
Memory-augmented conformer for improved end-to-end long-form ASR
Carlos Carvalho
A. Abad
RALM
30
1
0
22 Sep 2023
Massive End-to-end Models for Short Search Queries
Weiran Wang
Rohit Prabhavalkar
Dongseong Hwang
Qiujia Li
K. Sim
...
Zhong Meng
CJ Zheng
Yanzhang He
Tara N. Sainath
P. M. Mengibar
32
2
0
22 Sep 2023
Semi-Autoregressive Streaming ASR With Label Context
Siddhant Arora
G. Saon
Shinji Watanabe
Brian Kingsbury
AI4TS
23
5
0
19 Sep 2023
Improving Continuous Sign Language Recognition with Cross-Lingual Signs
Fangyun Wei
Yutong Chen
SLR
20
28
0
21 Aug 2023
A Systematic Exploration of Joint-training for Singing Voice Synthesis
Yuning Wu
Yifeng Yu
Jiatong Shi
Tao Qian
Qin Jin
40
5
0
05 Aug 2023
CIF-T: A Novel CIF-based Transducer Architecture for Automatic Speech Recognition
Tian-Hao Zhang
Dinghao Zhou
Guiping Zhong
Jiaming Zhou
Baoxiang Li
20
3
0
26 Jul 2023
Multilingual Contextual Adapters To Improve Custom Word Recognition In Low-resource Languages
Devang Kulshreshtha
Saket Dingliwal
Brady C. Houston
S. Bodapati
12
2
0
03 Jul 2023
Multi-pass Training and Cross-information Fusion for Low-resource End-to-end Accented Speech Recognition
Xuefei Wang
Yanhua Long
Yijie Li
Haoran Wei
27
4
0
20 Jun 2023
HK-LegiCoST: Leveraging Non-Verbatim Transcripts for Speech Translation
Cihan Xiao
Henry Li Xinyuan
Jinyi Yang
Dongji Gao
Matthew Wiesner
Kevin Duh
Sanjeev Khudanpur
29
1
0
20 Jun 2023
STHG: Spatial-Temporal Heterogeneous Graph Learning for Advanced Audio-Visual Diarization
Kyle Min
29
5
0
18 Jun 2023
Contrastive Learning-Based Audio to Lyrics Alignment for Multiple Languages
Simon Durand
Daniel Stoller
Sebastian Ewert
26
12
0
13 Jun 2023
Parameter-efficient Dysarthric Speech Recognition Using Adapter Fusion and Householder Transformation
Jinzi Qi
Hugo Van hamme
38
3
0
12 Jun 2023