ESPnet: End-to-End Speech Processing Toolkit


30 March 2018
Shinji Watanabe
Takaaki Hori
Shigeki Karita
Tomoki Hayashi
Jiro Nishitoba
Y. Unno
Nelson Yalta
Jahn Heymann
Matthew Wiesner
Nanxin Chen
Adithya Renduchintala
Tsubasa Ochiai
    VLM

Papers citing "ESPnet: End-to-End Speech Processing Toolkit"

50 / 258 papers shown
Parameter-efficient Dysarthric Speech Recognition Using Adapter Fusion and Householder Transformation
Jinzi Qi
Hugo Van hamme
38
3
0
12 Jun 2023
Latent Phrase Matching for Dysarthric Speech
Colin S. Lea
Dianna Yee
Jaya Narain
Zifang Huang
Lauren Tooley
Jeffrey P. Bigham
Leah Findlater
21
4
0
08 Jun 2023
Transfer Learning from Pre-trained Language Models Improves End-to-End Speech Summarization
Kohei Matsuura
Takanori Ashihara
Takafumi Moriya
Tomohiro Tanaka
Takatomo Kano
A. Ogawa
Marc Delcroix
29
9
0
07 Jun 2023
Text-to-Speech Pipeline for Swiss German -- A comparison
Tobias Bollinger
Jan Deriu
Manfred Vogel
DiffM
21
0
0
31 May 2023
ASR and Emotional Speech: A Word-Level Investigation of the Mutual Impact of Speech and Emotion Recognition
Yuanchao Li
Zeyu Zhao
Ondřej Klejch
P. Bell
Catherine Lai
11
14
0
25 May 2023
Incorporating Ultrasound Tongue Images for Audio-Visual Speech Enhancement through Knowledge Distillation
Ruixin Zheng
Yang Ai
Zhenhua Ling
24
8
0
24 May 2023
BA-SOT: Boundary-Aware Serialized Output Training for Multi-Talker ASR
Yuhao Liang
Fan Yu
Yangze Li
Pengcheng Guo
Shiliang Zhang
Qian Chen
Linfu Xie
30
8
0
23 May 2023
CASA-ASR: Context-Aware Speaker-Attributed ASR
Mohan Shi
Zhihao Du
Qian Chen
Fan Yu
Yangze Li
Shiliang Zhang
Jie Zhang
Lirong Dai
34
8
0
21 May 2023
A New Benchmark of Aphasia Speech Recognition and Detection Based on E-Branchformer and Multi-task Learning
Jiyang Tang
William Chen
Xuankai Chang
Shinji Watanabe
B. MacWhinney
21
10
0
19 May 2023
Language-universal phonetic encoder for low-resource speech recognition
Siyuan Feng
Ming Tu
Rui Xia
Chuanzeng Huang
Yuxuan Wang
33
2
0
19 May 2023
Language-Universal Phonetic Representation in Multilingual Speech Pretraining for Low-Resource Speech Recognition
Siyuan Feng
Ming Tu
Rui Xia
Chuanzeng Huang
Yuxuan Wang
35
5
0
19 May 2023
JetSeg: Efficient Real-Time Semantic Segmentation Model for Low-Power GPU-Embedded Systems
Miguel Lopez-Montiel
Daniel Alejandro Lopez
Oscar Montiel
VLM
SSeg
28
0
0
19 May 2023
A Comparative Study on E-Branchformer vs Conformer in Speech Recognition, Translation, and Understanding Tasks
Yifan Peng
Kwangyoun Kim
Felix Wu
Brian Yan
Siddhant Arora
William Chen
Jiyang Tang
Suwon Shon
Prashant Sridhar
Shinji Watanabe
21
17
0
18 May 2023
FunASR: A Fundamental End-to-End Speech Recognition Toolkit
Zhifu Gao
Zerui Li
Jiaming Wang
Haoneng Luo
Xian Shi
...
Yabin Li
Lingyun Zuo
Zhihao Du
Zhangyu Xiao
Shiliang Zhang
29
54
0
18 May 2023
ML-SUPERB: Multilingual Speech Universal PERformance Benchmark
Jiatong Shi
Dan Berrebbi
William Chen
Ho-Lam Chung
En-Pei Hu
...
Xuankai Chang
Shang-Wen Li
Abdel-rahman Mohamed
Hung-yi Lee
Shinji Watanabe
ELM
55
58
0
18 May 2023
Quran Recitation Recognition using End-to-End Deep Learning
Ahmad Al Harere
Khloud Al Jallad
30
6
0
10 May 2023
Exploration of Language Dependency for Japanese Self-Supervised Speech Representation Models
Takanori Ashihara
Takafumi Moriya
Kohei Matsuura
Tomohiro Tanaka
25
3
0
09 May 2023
Structural and Statistical Texture Knowledge Distillation for Semantic Segmentation
Deyi Ji
Haoran Wang
Mingyuan Tao
Jianqiang Huang
Xiansheng Hua
Hongtao Lu
35
61
0
06 May 2023
Dynamic Chunk Convolution for Unified Streaming and Non-Streaming Conformer ASR
Xilai Li
Goeric Huybrechts
S. Ronanki
Jeffrey J. Farris
S. Bodapati
33
6
0
18 Apr 2023
Efficient Sequence Transduction by Jointly Predicting Tokens and Durations
Hainan Xu
Fei Jia
Somshubra Majumdar
Hengguan Huang
Shinji Watanabe
Boris Ginsburg
27
17
0
13 Apr 2023
Pyramid Multi-branch Fusion DCNN with Multi-Head Self-Attention for Mandarin Speech Recognition
Kai Liu
Hailiang Xiong
Gangqiang Yang
Zhengfeng Du
Yewen Cao
D. Shah
13
0
0
23 Mar 2023
I3D: Transformer architectures with input-dependent dynamic depth for speech recognition
Yifan Peng
Jaesong Lee
Shinji Watanabe
27
19
0
14 Mar 2023
Stabilising and accelerating light gated recurrent units for automatic speech recognition
Adel Moumen
Titouan Parcollet
26
3
0
16 Feb 2023
Cut your Losses with Squentropy
Like Hui
M. Belkin
S. Wright
UQCV
18
8
0
08 Feb 2023
Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with Unsupervised Text Pretraining
Takaaki Saeki
Soumi Maiti
Xinjian Li
Shinji Watanabe
Shinnosuke Takamichi
Hiroshi Saruwatari
32
17
0
30 Jan 2023
Achieving Timestamp Prediction While Recognizing with Non-Autoregressive End-to-End ASR Model
Xian Shi
Yanni Chen
Shiliang Zhang
Zhijie Yan
13
8
0
29 Jan 2023
Warning: Humans Cannot Reliably Detect Speech Deepfakes
Kimberly T. Mai
Sergi D. Bray
Toby O. Davies
Lewis D. Griffin
39
40
0
19 Jan 2023
Memory Augmented Lookup Dictionary based Language Modeling for Automatic Speech Recognition
Yukun Feng
Ming Tu
Rui Xia
Chuanzeng Huang
Yuxuan Wang
RALM
32
0
0
30 Dec 2022
StyleTTS-VC: One-Shot Voice Conversion by Knowledge Transfer from Style-Based TTS Models
Yinghao Aaron Li
Cong Han
N. Mesgarani
19
18
0
29 Dec 2022
Investigation of Japanese PnG BERT language model in text-to-speech synthesis for pitch accent language
Yusuke Yasuda
T. Toda
25
8
0
16 Dec 2022
End-to-End Speech Translation of Arabic to English Broadcast News
Fethi Bougares
Salim Jouili
24
0
0
11 Dec 2022
SoftCorrect: Error Correction with Soft Detection for Automatic Speech Recognition
Yichong Leng
Xu Tan
Wenjie Liu
Kaitao Song
Rui Wang
Xiang-Yang Li
Tao Qin
Ed Lin
Tie-Yan Liu
21
15
0
02 Dec 2022
EURO: ESPnet Unsupervised ASR Open-source Toolkit
Dongji Gao
Jiatong Shi
Shun-Po Chuang
Leibny Paola García-Perera
Hung-yi Lee
Shinji Watanabe
Sanjeev Khudanpur
21
8
0
30 Nov 2022
Neural Transducer Training: Reduced Memory Consumption with Sample-wise Computation
Stefan Braun
Erik McDermott
Roger Hsiao
37
1
0
29 Nov 2022
Multitask Learning for Low Resource Spoken Language Understanding
Quentin Meeus
Marie-Francine Moens
Hugo Van hamme
14
4
0
24 Nov 2022
Self-Remixing: Unsupervised Speech Separation via Separation and Remixing
Kohei Saijo
Tetsuji Ogawa
SSL
22
11
0
18 Nov 2022
Streaming Joint Speech Recognition and Disfluency Detection
Hayato Futami
E. Tsunoo
Kentarou Shibata
Yosuke Kashiwagi
Takao Okuda
Siddhant Arora
Shinji Watanabe
34
6
0
16 Nov 2022
Align, Write, Re-order: Explainable End-to-End Speech Translation via Operation Sequence Generation
Motoi Omachi
Brian Yan
Siddharth Dalmia
Yuya Fujita
Shinji Watanabe
LRM
25
3
0
11 Nov 2022
Minimum Latency Training of Sequence Transducers for Streaming End-to-End Speech Recognition
Yusuke Shinohara
Shinji Watanabe
AI4TS
21
9
0
04 Nov 2022
Probing Statistical Representations For End-To-End ASR
A. Ollerenshaw
Md. Asif Jalal
Thomas Hain
27
2
0
03 Nov 2022
Channel-Aware Pretraining of Joint Encoder-Decoder Self-Supervised Model for Telephonic-Speech ASR
Vrunda N. Sukhadia
Anjana Arunkumar
S. Umesh
20
1
0
03 Nov 2022
Phonetic-assisted Multi-Target Units Modeling for Improving Conformer-Transducer ASR system
Li Li
Dongxing Xu
Haoran Wei
Yanhua Long
21
2
0
03 Nov 2022
Towards Zero-Shot Code-Switched Speech Recognition
Brian Yan
Matthew Wiesner
Ondřej Klejch
P. Jyothi
Shinji Watanabe
26
19
0
02 Nov 2022
Avoid Overthinking in Self-Supervised Models for Speech Recognition
Dan Berrebbi
Brian Yan
Shinji Watanabe
LRM
20
4
0
01 Nov 2022
Fast and parallel decoding for transducer
Wei Kang
Liyong Guo
Fangjun Kuang
Long Lin
Mingshuang Luo
Zengwei Yao
Xiaoyu Yang
Piotr Żelasko
Daniel Povey
AI4TS
19
15
0
31 Oct 2022
Structured State Space Decoder for Speech Recognition and Synthesis
Koichi Miyazaki
Masato Murata
Tomoki Koriyama
34
12
0
31 Oct 2022
Wespeaker: A Research and Production oriented Speaker Embedding Learning Toolkit
Hongji Wang
Che-Yuan Liang
Shuai Wang
Zhengyang Chen
Binbin Zhang
Xu Xiang
Yan Deng
Y. Qian
21
116
0
31 Oct 2022
Weight Averaging: A Simple Yet Effective Method to Overcome Catastrophic Forgetting in Automatic Speech Recognition
Steven Vander Eeckt
Hugo Van hamme
CLL
MoMe
58
14
0
27 Oct 2022
Training Autoregressive Speech Recognition Models with Limited in-domain Supervision
Chak-Fai Li
Francis Keith
William Hartmann
M. Snover
14
0
0
27 Oct 2022
Reducing Language confusion for Code-switching Speech Recognition with Token-level Language Diarization
Hexin Liu
Haihua Xu
Leibny Paola García
Andy W. H. Khong
Yi He
Sanjeev Khudanpur
19
24
0
26 Oct 2022