ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1804.00015
  4. Cited By
ESPnet: End-to-End Speech Processing Toolkit

ESPnet: End-to-End Speech Processing Toolkit

30 March 2018
Shinji Watanabe
Takaaki Hori
Shigeki Karita
Tomoki Hayashi
Jiro Nishitoba
Y. Unno
Nelson Yalta
Jahn Heymann
Matthew Wiesner
Nanxin Chen
Adithya Renduchintala
Tsubasa Ochiai
    VLM
ArXivPDFHTML

Papers citing "ESPnet: End-to-End Speech Processing Toolkit"

50 / 258 papers shown
Title
A Conformer Based Acoustic Model for Robust Automatic Speech Recognition
A Conformer Based Acoustic Model for Robust Automatic Speech Recognition
Yufeng Yang
Peidong Wang
DeLiang Wang
20
12
0
01 Mar 2022
Summary On The ICASSP 2022 Multi-Channel Multi-Party Meeting
  Transcription Grand Challenge
Summary On The ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Grand Challenge
Fan Yu
Shiliang Zhang
Pengcheng Guo
Yihui Fu
Zhihao Du
...
Kong Aik Lee
Zhijie Yan
B. Ma
Xin Xu
Hui Bu
18
28
0
08 Feb 2022
The RoyalFlush System of Speech Recognition for M2MeT Challenge
The RoyalFlush System of Speech Recognition for M2MeT Challenge
Shuaishuai Ye
Peiyao Wang
Shunfei Chen
Xinhui Hu
Xinkang Xu
16
5
0
03 Feb 2022
Error Correction in ASR using Sequence-to-Sequence Models
Error Correction in ASR using Sequence-to-Sequence Models
S. Dutta
Shreyansh Jain
Ayush Maheshwari
Souvik Pal
Ganesh Ramakrishnan
P. Jyothi
KELM
31
29
0
02 Feb 2022
BEA-Base: A Benchmark for ASR of Spontaneous Hungarian
BEA-Base: A Benchmark for ASR of Spontaneous Hungarian
P. Mihajlik
A. Balog
T. E. Gráczi
A. Kohári
Balázs Tarján
K. Mády
17
8
0
01 Feb 2022
Discovering Phonetic Inventories with Crosslingual Automatic Speech
  Recognition
Discovering Phonetic Inventories with Crosslingual Automatic Speech Recognition
Piotr Żelasko
Siyuan Feng
Laureano Moro Velázquez
A. Abavisani
Saurabhchand Bhati
O. Scharenborg
M. Hasegawa-Johnson
Najim Dehak
33
15
0
26 Jan 2022
A Study of Transducer based End-to-End ASR with ESPnet: Architecture,
  Auxiliary Loss and Decoding Strategies
A Study of Transducer based End-to-End ASR with ESPnet: Architecture, Auxiliary Loss and Decoding Strategies
Florian Boyer
Yusuke Shinohara
Takaaki Ishii
H. Inaguma
Shinji Watanabe
29
34
0
14 Jan 2022
Cross-Modal ASR Post-Processing System for Error Correction and
  Utterance Rejection
Cross-Modal ASR Post-Processing System for Error Correction and Utterance Rejection
Jing Du
Shiliang Pu
Qinbo Dong
Chao Jin
Xin Qi
Dian Gu
Ru Wu
Hongwei Zhou
30
9
0
10 Jan 2022
Textual Data Augmentation for Arabic-English Code-Switching Speech
  Recognition
Textual Data Augmentation for Arabic-English Code-Switching Speech Recognition
A. Hussein
Shammur A. Chowdhury
Ahmed Abdelali
Najim Dehak
Ahmed M. Ali
Sanjeev Khudanpur
35
11
0
07 Jan 2022
PM-MMUT: Boosted Phone-Mask Data Augmentation using Multi-Modeling Unit
  Training for Phonetic-Reduction-Robust E2E Speech Recognition
PM-MMUT: Boosted Phone-Mask Data Augmentation using Multi-Modeling Unit Training for Phonetic-Reduction-Robust E2E Speech Recognition
Guodong Ma
Pengfei Hu
Nurmemet Yolwas
Shen Huang
Hao-Ming Huang
24
4
0
13 Dec 2021
Improving Code-switching Language Modeling with Artificially Generated
  Texts using Cycle-consistent Adversarial Networks
Improving Code-switching Language Modeling with Artificially Generated Texts using Cycle-consistent Adversarial Networks
Chia-Yu Li
Ngoc Thang Vu
17
12
0
12 Dec 2021
Consistent Training and Decoding For End-to-end Speech Recognition Using
  Lattice-free MMI
Consistent Training and Decoding For End-to-end Speech Recognition Using Lattice-free MMI
Jinchuan Tian
Jianwei Yu
Chao Weng
Shi-Xiong Zhang
Dan Su
Dong Yu
Yuexian Zou
AuLLM
39
13
0
05 Dec 2021
ESPnet-SLU: Advancing Spoken Language Understanding through ESPnet
ESPnet-SLU: Advancing Spoken Language Understanding through ESPnet
Siddhant Arora
Siddharth Dalmia
Pavel Denisov
Xuankai Chang
Yushi Ueda
...
Karthik Ganesan
Brian Yan
Ngoc Thang Vu
A. Black
Shinji Watanabe
VLM
25
74
0
29 Nov 2021
Effective Cross-Utterance Language Modeling for Conversational Speech
  Recognition
Effective Cross-Utterance Language Modeling for Conversational Speech Recognition
Bi-Cheng Yan
Hsin-Wei Wang
Shih-Hsuan Chiu
Hsuan-Sheng Chiu
Berlin Chen
21
1
0
05 Nov 2021
WaveFake: A Data Set to Facilitate Audio Deepfake Detection
WaveFake: A Data Set to Facilitate Audio Deepfake Detection
Joel Frank
Lea Schonherr
DiffM
129
123
0
04 Nov 2021
SA-SDR: A novel loss function for separation of meeting style data
SA-SDR: A novel loss function for separation of meeting style data
Thilo von Neumann
K. Kinoshita
Christoph Boeddeker
Marc Delcroix
Reinhold Haeb-Umbach
29
20
0
29 Oct 2021
TorchAudio: Building Blocks for Audio and Speech Processing
TorchAudio: Building Blocks for Audio and Speech Processing
Yao-Yuan Yang
Moto Hira
Zhaoheng Ni
Anjali Chourdia
Artyom Astafurov
...
Sean Narenthiran
Shinji Watanabe
Soumith Chintala
Vincent Quenneville-Bélair
Yangyang Shi
31
164
0
28 Oct 2021
Closing the Gap Between Time-Domain Multi-Channel Speech Enhancement on
  Real and Simulation Conditions
Closing the Gap Between Time-Domain Multi-Channel Speech Enhancement on Real and Simulation Conditions
Wangyou Zhang
Jing Shi
Chenda Li
Shinji Watanabe
Y. Qian
19
22
0
27 Oct 2021
Lhotse: a speech data representation library for the modern deep
  learning ecosystem
Lhotse: a speech data representation library for the modern deep learning ecosystem
Willem Hagemann
Daniel Povey
Jan "Yenda" Trmal
Sanjeev Khudanpur
AuLLM
AI4TS
22
31
0
25 Oct 2021
A Unified Speaker Adaptation Approach for ASR
A Unified Speaker Adaptation Approach for ASR
Yingzhu Zhao
Chongjia Ni
C. Leung
Shafiq R. Joty
Chng Eng Siong
B. Ma
CLL
92
9
0
16 Oct 2021
Towards Identity Preserving Normal to Dysarthric Voice Conversion
Towards Identity Preserving Normal to Dysarthric Voice Conversion
Wen-Chin Huang
B. Halpern
Lester Phillip Violeta
O. Scharenborg
T. Toda
27
21
0
15 Oct 2021
ESPnet2-TTS: Extending the Edge of TTS Research
ESPnet2-TTS: Extending the Edge of TTS Research
Tomoki Hayashi
Ryuichi Yamamoto
Takenori Yoshimura
Peter Wu
Jiatong Shi
Takaaki Saeki
Yooncheol Ju
Yusuke Yasuda
Shinnosuke Takamichi
Shinji Watanabe
VLM
50
60
0
15 Oct 2021
M2MeT: The ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription
  Challenge
M2MeT: The ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Challenge
Fan Yu
Shiliang Zhang
Yihui Fu
Lei Xie
Siqi Zheng
...
Pengcheng Guo
Zhijie Yan
B. Ma
Xin Xu
Hui Bu
8
105
0
14 Oct 2021
SDR -- Medium Rare with Fast Computations
SDR -- Medium Rare with Fast Computations
Robin Scheibler
26
17
0
13 Oct 2021
Fine-grained style control in Transformer-based Text-to-speech Synthesis
Fine-grained style control in Transformer-based Text-to-speech Synthesis
Li-Wei Chen
Alexander I. Rudnicky
80
29
0
12 Oct 2021
Speech Summarization using Restricted Self-Attention
Speech Summarization using Restricted Self-Attention
Roshan S. Sharma
Shruti Palaskar
A. Black
Florian Metze
30
33
0
12 Oct 2021
SRU++: Pioneering Fast Recurrence with Attention for Speech Recognition
SRU++: Pioneering Fast Recurrence with Attention for Speech Recognition
Jing Pan
Tao Lei
Kwangyoun Kim
Kyu Jeong Han
Shinji Watanabe
VLM
26
9
0
11 Oct 2021
K-Wav2vec 2.0: Automatic Speech Recognition based on Joint Decoding of
  Graphemes and Syllables
K-Wav2vec 2.0: Automatic Speech Recognition based on Joint Decoding of Graphemes and Syllables
Jounghee Kim
Pilsung Kang
VLM
21
6
0
11 Oct 2021
An Exploration of Self-Supervised Pretrained Representations for
  End-to-End Speech Recognition
An Exploration of Self-Supervised Pretrained Representations for End-to-End Speech Recognition
Xuankai Chang
Takashi Maekaku
Pengcheng Guo
Jing Shi
Yen-Ju Lu
...
Tianzi Wang
Shu-Wen Yang
Yu Tsao
Hung-yi Lee
Shinji Watanabe
SSL
AI4TS
16
81
0
09 Oct 2021
Hierarchical Conditional End-to-End ASR with CTC and Multi-Granular
  Subword Units
Hierarchical Conditional End-to-End ASR with CTC and Multi-Granular Subword Units
Yosuke Higuchi
Keita Karube
Tetsuji Ogawa
Tetsunori Kobayashi
16
22
0
08 Oct 2021
A Novel Blind Source Separation Framework Towards Maximum
  Signal-To-Interference Ratio
A Novel Blind Source Separation Framework Towards Maximum Signal-To-Interference Ratio
Jianjun Gu
Longbiao Cheng
Dingding Yao
Junfeng Li
Yonghong Yan
18
0
0
07 Oct 2021
SpliceOut: A Simple and Efficient Audio Augmentation Method
SpliceOut: A Simple and Efficient Audio Augmentation Method
Arjit Jain
Pranay Reddy Samala
Deepak Mittal
P. Jyothi
M. Singh
28
10
0
30 Sep 2021
Federated Learning in ASR: Not as Easy as You Think
Federated Learning in ASR: Not as Easy as You Think
Wentao Yu
J. Freiwald
Soren Tewes
F. Huennemeyer
D. Kolossa
FedML
27
17
0
30 Sep 2021
ChannelAugment: Improving generalization of multi-channel ASR by
  training with input channel randomization
ChannelAugment: Improving generalization of multi-channel ASR by training with input channel randomization
M. Gaudesi
F. Weninger
D. Sharma
P. Zhan
AAML
33
1
0
23 Sep 2021
Low-Latency Incremental Text-to-Speech Synthesis with Distilled Context
  Prediction Network
Low-Latency Incremental Text-to-Speech Synthesis with Distilled Context Prediction Network
Takaaki Saeki
Shinnosuke Takamichi
Hiroshi Saruwatari
26
3
0
22 Sep 2021
Investigations on Speech Recognition Systems for Low-Resource Dialectal
  Arabic-English Code-Switching Speech
Investigations on Speech Recognition Systems for Low-Resource Dialectal Arabic-English Code-Switching Speech
Injy Hamed
Pavel Denisov
C. Li
M. Elmahdy
Slim Abdennadher
Ngoc Thang Vu
20
35
0
29 Aug 2021
Graph-PIT: Generalized permutation invariant training for continuous
  separation of arbitrary numbers of speakers
Graph-PIT: Generalized permutation invariant training for continuous separation of arbitrary numbers of speakers
Thilo von Neumann
K. Kinoshita
Christoph Boeddeker
Marc Delcroix
Reinhold Haeb-Umbach
16
23
0
30 Jul 2021
OLR 2021 Challenge: Datasets, Rules and Baselines
OLR 2021 Challenge: Datasets, Rules and Baselines
Binling Wang
Wen-Bo Hu
Jing Li
Yiming Zhi
Zheng Li
Q. Hong
Lin Li
Dong Wang
Liming Song
Cheng Yang
18
18
0
23 Jul 2021
StarGANv2-VC: A Diverse, Unsupervised, Non-parallel Framework for
  Natural-Sounding Voice Conversion
StarGANv2-VC: A Diverse, Unsupervised, Non-parallel Framework for Natural-Sounding Voice Conversion
Yinghao Aaron Li
A. Zare
N. Mesgarani
19
98
0
21 Jul 2021
ESPnet-ST IWSLT 2021 Offline Speech Translation System
ESPnet-ST IWSLT 2021 Offline Speech Translation System
H. Inaguma
Shun Kiyono
Nelson Enrique Yalta Soplin
Pengcheng Guo
Jun Suzuki
Kevin Duh
Shinji Watanabe
3DV
35
2
0
01 Jul 2021
Productivity, Portability, Performance: Data-Centric Python
Productivity, Portability, Performance: Data-Centric Python
Yiheng Wang
Yao Zhang
Yanzhang Wang
Yan Wan
Jiao Wang
Zhongyuan Wu
Yuhao Yang
Bowen She
54
94
0
01 Jul 2021
Multi-mode Transformer Transducer with Stochastic Future Context
Multi-mode Transformer Transducer with Stochastic Future Context
Kwangyoun Kim
Felix Wu
Prashant Sridhar
Kyu Jeong Han
Shinji Watanabe
30
9
0
17 Jun 2021
GigaSpeech: An Evolving, Multi-domain ASR Corpus with 10,000 Hours of
  Transcribed Audio
GigaSpeech: An Evolving, Multi-domain ASR Corpus with 10,000 Hours of Transcribed Audio
Guoguo Chen
Shuzhou Chai
Guan-Bo Wang
Jiayu Du
Weiqiang Zhang
...
Xuchen Yao
Yongqing Wang
Yujun Wang
Zhao You
Zhiyong Yan
42
349
0
13 Jun 2021
Should We Always Separate?: Switching Between Enhanced and Observed
  Signals for Overlapping Speech Recognition
Should We Always Separate?: Switching Between Enhanced and Observed Signals for Overlapping Speech Recognition
Hiroshi Sato
Tsubasa Ochiai
Marc Delcroix
K. Kinoshita
Takafumi Moriya
Naoyuki Kamo
17
23
0
02 Jun 2021
Stacked Acoustic-and-Textual Encoding: Integrating the Pre-trained
  Models into Speech Translation Encoders
Stacked Acoustic-and-Textual Encoding: Integrating the Pre-trained Models into Speech Translation Encoders
Chen Xu
Bojie Hu
Yanyang Li
Yuhao Zhang
Shen Huang
Qi Ju
Tong Xiao
Jingbo Zhu
19
75
0
12 May 2021
FastCorrect: Fast Error Correction with Edit Alignment for Automatic
  Speech Recognition
FastCorrect: Fast Error Correction with Edit Alignment for Automatic Speech Recognition
Yichong Leng
Xu Tan
Linchen Zhu
Jin Xu
Renqian Luo
Linquan Liu
Tao Qin
Xiang-Yang Li
Ed Lin
Tie-Yan Liu
KELM
24
63
0
09 May 2021
Searchable Hidden Intermediates for End-to-End Models of Decomposable
  Sequence Tasks
Searchable Hidden Intermediates for End-to-End Models of Decomposable Sequence Tasks
Siddharth Dalmia
Brian Yan
Vikas Raunak
Florian Metze
Shinji Watanabe
37
30
0
02 May 2021
Label-Synchronous Speech-to-Text Alignment for ASR Using Forward and
  Backward Transformers
Label-Synchronous Speech-to-Text Alignment for ASR Using Forward and Backward Transformers
Yusuke Kida
Tatsuya Komatsu
M. Togami
16
1
0
21 Apr 2021
Fusing information streams in end-to-end audio-visual speech recognition
Fusing information streams in end-to-end audio-visual speech recognition
Wentao Yu
Steffen Zeiler
D. Kolossa
81
12
0
19 Apr 2021
Advanced Long-context End-to-end Speech Recognition Using
  Context-expanded Transformers
Advanced Long-context End-to-end Speech Recognition Using Context-expanded Transformers
Takaaki Hori
Niko Moritz
Chiori Hori
Jonathan Le Roux
22
34
0
19 Apr 2021
Previous
123456
Next