1804.00015
ESPnet: End-to-End Speech Processing Toolkit
30 March 2018
Shinji Watanabe
Takaaki Hori
Shigeki Karita
Tomoki Hayashi
Jiro Nishitoba
Y. Unno
Nelson Yalta
Jahn Heymann
Matthew Wiesner
Nanxin Chen
Adithya Renduchintala
Tsubasa Ochiai
VLM
Papers citing "ESPnet: End-to-End Speech Processing Toolkit"
50 / 258 papers shown
The NPU-ASLP System for The ISCSLP 2022 Magichub Code-Swiching ASR Challenge
Yuhao Liang
Pei-Ning Chen
F. Yu
Xinfa Zhu
Tianyi Xu
Linfu Xie
28
0
0
26 Oct 2022
Taxonomic Classification of IoT Smart Home Voice Control
M. Hewitt
H. Cunningham
21
1
0
24 Oct 2022
Improving Semi-supervised End-to-end Automatic Speech Recognition using CycleGAN and Inter-domain Losses
C. Li
Ngoc Thang Vu
14
2
0
20 Oct 2022
Robust One-Shot Singing Voice Conversion
Naoya Takahashi
M. Singh
Yuki Mitsufuji
DiffM
19
8
0
20 Oct 2022
End-to-End Integration of Speech Recognition, Dereverberation, Beamforming, and Self-Supervised Learning Representation
Yoshiki Masuyama
Xuankai Chang
Samuele Cornell
Shinji Watanabe
Nobutaka Ono
17
19
0
19 Oct 2022
Two-stage training method for Japanese electrolaryngeal speech enhancement based on sequence-to-sequence voice conversion
D. Ma
Lester Phillip Violeta
Kazuhiro Kobayashi
T. Toda
21
6
0
19 Oct 2022
Towards Personalization of CTC Speech Recognition Models with Contextual Adapters and Adaptive Boosting
Saket Dingliwal
Monica Sunkara
S. Bodapati
S. Ronanki
Jeffrey J. Farris
Katrin Kirchhoff
25
0
0
18 Oct 2022
A Policy-based Approach to the SpecAugment Method for Low Resource E2E ASR
Rui Li
Guodong Ma
Dexin Zhao
Ranran Zeng
Xiaoyu Li
Haolin Huang
21
2
0
16 Oct 2022
An Ensemble Teacher-Student Learning Approach with Poisson Sub-sampling to Differential Privacy Preserving Speech Recognition
Chao-Han Huck Yang
Jun Qi
Sabato Marco Siniscalchi
Chin-Hui Lee
26
4
0
12 Oct 2022
A context-aware knowledge transferring strategy for CTC-based ASR
Keda Lu
Kuan-Yu Chen
15
14
0
12 Oct 2022
CTC Alignments Improve Autoregressive Translation
Brian Yan
Siddharth Dalmia
Yosuke Higuchi
Graham Neubig
Florian Metze
A. Black
Shinji Watanabe
44
33
0
11 Oct 2022
Blind Signal Dereverberation for Machine Speech Recognition
Samik Sadhu
H. Hermansky
11
0
0
30 Sep 2022
E-Branchformer: Branchformer with Enhanced merging for speech recognition
Kwangyoun Kim
Felix Wu
Yifan Peng
Jing Pan
Prashant Sridhar
Kyu Jeong Han
Shinji Watanabe
55
105
0
30 Sep 2022
DeID-VC: Speaker De-identification via Zero-shot Pseudo Voice Conversion
Ruibin Yuan
Yuxuan Wu
Jacob Li
Jaxter Kim
24
5
0
09 Sep 2022
Streaming Target-Speaker ASR with Neural Transducer
Takafumi Moriya
Hiroshi Sato
Tsubasa Ochiai
Marc Delcroix
T. Shinozaki
26
21
0
09 Sep 2022
Investigating data partitioning strategies for crosslinguistic low-resource ASR evaluation
Zoey Liu
J. Spence
Emily Tucker Prudhommeaux
24
8
0
26 Aug 2022
Uconv-Conformer: High Reduction of Input Sequence Length for End-to-End Speech Recognition
A. Andrusenko
R. Nasretdinov
A. Romanenko
20
18
0
16 Aug 2022
Improving Mandarin Speech Recogntion with Block-augmented Transformer
Xiaoming Ren
Huifeng Zhu
Liuwei Wei
Minghui Wu
Jie Hao
33
9
0
24 Jul 2022
End-to-End Spoken Language Understanding: Performance analyses of a voice command task in a low resource setting
Thierry Desot
François Portet
Michel Vacher
27
12
0
17 Jul 2022
Knowledge Transfer and Distillation from Autoregressive to Non-Autoregressive Speech Recognition
Xun Gong
Zhikai Zhou
Y. Qian
20
3
0
15 Jul 2022
Branchformer: Parallel MLP-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding
Yifan Peng
Siddharth Dalmia
Ian Lane
Shinji Watanabe
21
143
0
06 Jul 2022
Improving Transformer-based Conversational ASR by Inter-Sentential Attention Mechanism
Kun Wei
Pengcheng Guo
Ning Jiang
48
11
0
02 Jul 2022
Tree-constrained Pointer Generator with Graph Neural Network Encodings for Contextual Speech Recognition
Guangzhi Sun
C. Zhang
P. Woodland
19
12
0
02 Jul 2022
FeaRLESS: Feature Refinement Loss for Ensembling Self-Supervised Learning Features in Robust End-to-end Speech Recognition
Szu-Jui Chen
Jiamin Xie
John H. L. Hansen
35
8
0
30 Jun 2022
TALCS: An Open-Source Mandarin-English Code-Switching Corpus and a Speech Recognition Baseline
Chengfei Li
Shuhao Deng
Yaoping Wang
Guangjing Wang
Y. Gong
Changbin Chen
Jinfeng Bai
25
16
0
27 Jun 2022
Speak Like a Professional: Increasing Speech Intelligibility by Mimicking Professional Announcer Voice with Voice Conversion
Tuan Vu Ho
M. Kobayashi
M. Akagi
16
1
0
27 Jun 2022
Predicting within and across language phoneme recognition performance of self-supervised learning speech pre-trained models
Han Ji
T. Patel
O. Scharenborg
34
7
0
24 Jun 2022
Joint Encoder-Decoder Self-Supervised Pre-training for ASR
Arunkumar A
S. Umesh
SSL
34
8
0
09 Jun 2022
LegoNN: Building Modular Encoder-Decoder Models
Siddharth Dalmia
Dmytro Okhonko
M. Lewis
Sergey Edunov
Shinji Watanabe
Florian Metze
Luke Zettlemoyer
Abdel-rahman Mohamed
AuLLM
MoE
26
14
0
07 Jun 2022
LAE: Language-Aware Encoder for Monolingual and Multilingual ASR
Jinchuan Tian
Jianwei Yu
Chunlei Zhang
Chao Weng
Yuexian Zou
Dong Yu
AuLLM
17
25
0
05 Jun 2022
StyleTTS: A Style-Based Generative Model for Natural and Diverse Text-to-Speech Synthesis
Yinghao Aaron Li
Cong Han
N. Mesgarani
33
38
0
30 May 2022
Improving CTC-based ASR Models with Gated Interlayer Collaboration
Yuting Yang
Yuke Li
Binbin Du
28
11
0
25 May 2022
PaddleSpeech: An Easy-to-Use All-in-One Speech Toolkit
Hui Zhang
Tian Yuan
Junkun Chen
Xintong Li
Renjie Zheng
...
Zeyu Chen
Xiaoguang Hu
Dianhai Yu
Yanjun Ma
Liang Huang
AuLLM
29
24
0
20 May 2022
SAMU-XLSR: Semantically-Aligned Multimodal Utterance-level Cross-Lingual Speech Representation
Sameer Khurana
Antoine Laurent
James R. Glass
25
36
0
17 May 2022
Wav2Seq: Pre-training Speech-to-Text Encoder-Decoder Models Using Pseudo Languages
Felix Wu
Kwangyoun Kim
Shinji Watanabe
Kyu Jeong Han
Ryan T. McDonald
Kilian Q. Weinberger
Yoav Artzi
SyDa
45
37
0
02 May 2022
Improving Multimodal Speech Recognition by Data Augmentation and Speech Representations
Dan Oneaţă
H. Cucu
19
19
0
27 Apr 2022
Fusion of Self-supervised Learned Models for MOS Prediction
Zhengdong Yang
Wangjin Zhou
Chenhui Chu
Sheng Li
Raj Dabre
Raphaël Rubino
Yi Zhao
20
28
0
11 Apr 2022
Genre-conditioned Acoustic Models for Automatic Lyrics Transcription of Polyphonic Music
Xiaoxue Gao
Chitralekha Gupta
Haizhou Li
24
21
0
07 Apr 2022
3M: Multi-loss, Multi-path and Multi-level Neural Networks for speech recognition
Zhao You
Shulin Feng
Dan Su
Dong Yu
19
9
0
07 Apr 2022
Combining Spectral and Self-Supervised Features for Low Resource Speech Recognition and Translation
Dan Berrebbi
Jiatong Shi
Brian Yan
Osbel López-Francisco
Jonathan D. Amith
Shinji Watanabe
10
26
0
05 Apr 2022
An Initialization Scheme for Meeting Separation with Spatial Mixture Models
Christoph Boeddeker
Tobias Cord-Landwehr
Thilo von Neumann
Reinhold Haeb-Umbach
24
10
0
04 Apr 2022
Leveraging Phone Mask Training for Phonetic-Reduction-Robust E2E Uyghur Speech Recognition
Guodong Ma
Pengfei Hu
Jian Kang
Shen Huang
Hao-Ming Huang
13
9
0
02 Apr 2022
Speaker adaptation for Wav2vec2 based dysarthric ASR
M. Baskar
Tim Herzig
Diana Nguyen
Mireia Díez
Tim Polzehl
L. Burget
J. Černocký
28
28
0
02 Apr 2022
End-to-End Integration of Speech Recognition, Speech Enhancement, and Self-Supervised Learning Representation
Xuankai Chang
Takashi Maekaku
Yuya Fujita
Shinji Watanabe
VLM
49
45
0
01 Apr 2022
End-to-End Multi-speaker ASR with Independent Vector Analysis
Robin Scheibler
Wangyou Zhang
Xuankai Chang
Shinji Watanabe
Y. Qian
18
2
0
01 Apr 2022
Speech Segmentation Optimization using Segmented Bilingual Speech Corpus for End-to-end Speech Translation
Ryo Fukuda
Katsuhito Sudoh
Satoshi Nakamura
10
7
0
29 Mar 2022
WeNet 2.0: More Productive End-to-End Speech Recognition Toolkit
Binbin Zhang
Di Wu
Zhendong Peng
Xingcheng Song
Zhuoyuan Yao
Hang Lv
Linfu Xie
Chao Yang
Fuping Pan
Jianwei Niu
VLM
23
93
0
29 Mar 2022
Complex Frequency Domain Linear Prediction: A Tool to Compute Modulation Spectrum of Speech
Samik Sadhu
H. Hermansky
16
4
0
24 Mar 2022
Exploiting Cross Domain Acoustic-to-articulatory Inverted Features For Disordered Speech Recognition
Shujie Hu
Shansong Liu
Xurong Xie
Mengzhe Geng
Tianzi Wang
Shoukang Hu
Mingyu Cui
Xunying Liu
Helen Meng
9
14
0
19 Mar 2022
Transformer-based Streaming ASR with Cumulative Attention
Mohan Li
Shucong Zhang
Catalin Zorila
R. Doddipatla
19
9
0
11 Mar 2022