FunASR: A Fundamental End-to-End Speech Recognition Toolkit

18 May 2023
Zhifu Gao
Zerui Li
Jiaming Wang
Haoneng Luo
Xian Shi
Mengzhe Chen
Yabin Li
Lingyun Zuo
Zhihao Du
Zhangyu Xiao
Shiliang Zhang

Papers citing "FunASR: A Fundamental End-to-End Speech Recognition Toolkit"

39 / 39 papers shown
Advancing Zero-shot Text-to-Speech Intelligibility across Diverse Domains via Preference Alignment
Xueyao Zhang
Y. Wang
Chaoren Wang
Z. Li
Zhuo Chen
Zhizheng Wu
43
0
0
07 May 2025
Muyan-TTS: A Trainable Text-to-Speech Model Optimized for Podcast Scenarios with a $50K Budget
Xin Li
Kaikai Jia
Hao Sun
Jun Dai
Z. L. Jiang
36
0
0
27 Apr 2025
F5R-TTS: Improving Flow-Matching based Text-to-Speech with Group Relative Policy Optimization
Xiaohui Sun
Ruitong Xiao
Jianye Mo
Bowen Wu
Qun Yu
Baoxun Wang
39
1
0
03 Apr 2025
EmoHead: Emotional Talking Head via Manipulating Semantic Expression Parameters
Xuli Shen
Hua Cai
Dingding Yu
Weilin Shen
Qing-Song Xu
Xiangyang Xue
32
0
0
25 Mar 2025
Pose as a Modality: A Psychology-Inspired Network for Personality Recognition with a New Multimodal Dataset
Bin Tang
Keqi Pan
Miao Zheng
Ning Zhou
Jialu Sui
Dandan Zhu
Cheng-Long Deng
Shu-Guang Kuai
36
0
0
17 Mar 2025
Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens
X. Wang
Mingqi Jiang
Z. Ma
Ziyu Zhang
S. Liu
...
Zhifei Li
Xie Chen
Lei Xie
Y. Guo
Wei Xue
70
9
0
03 Mar 2025
SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation
Z. Liu
Shuangrui Ding
Zhixiong Zhang
Xiaoyi Dong
Pan Zhang
Yuhang Zang
Y. Cao
D. Lin
Jiaqi Wang
74
0
0
18 Feb 2025
Joint Automatic Speech Recognition And Structure Learning For Better Speech Understanding
J. Hu
Zuchao Li
Mengjia Shen
Haojun Ai
Sheng Li
Jun Zhang
24
0
0
20 Jan 2025
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
Pan Zhang
Xiaoyi Dong
Yuhang Cao
Yuhang Zang
Rui Qian
...
X. Zhang
K. Chen
Yu Qiao
D. Lin
Jiaqi Wang
KELM
84
12
0
12 Dec 2024
Enhancing Code-Switching ASR Leveraging Non-Peaky CTC Loss and Deep Language Posterior Injection
Tzu-Ting Yang
Hsin-Wei Wang
Yi-Cheng Wang
Berlin Chen
73
0
0
26 Nov 2024
Large Generative Model-assisted Talking-face Semantic Communication System
Feibo Jiang
Siwei Tu
Li Dong
Cunhua Pan
Jiangzhou Wang
Xiaohu You
21
1
0
06 Nov 2024
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching
Yushen Chen
Zhikang Niu
Ziyang Ma
Keqi Deng
Chunhui Wang
Jian Zhao
Kai Yu
Xie Chen
25
50
0
09 Oct 2024
VHASR: A Multimodal Speech Recognition System With Vision Hotwords
Jiliang Hu
Zuchao Li
Ping Wang
Haojun Ai
Lefei Zhang
Hai Zhao
16
0
0
01 Oct 2024
Early Joint Learning of Emotion Information Makes MultiModal Model Understand You Better
Mengying Ge
Mingyang Li
Dongkai Tang
Pengbo Li
Kuo Liu
Shuhao Deng
Songbai Pu
L. Liu
Yang Song
Tao Zhang
18
0
0
12 Sep 2024
Integrating Audio, Visual, and Semantic Information for Enhanced Multimodal Speaker Diarization
Luyao Cheng
Hui Wang
Siqi Zheng
Yafeng Chen
Rongjie Huang
Qinglin Zhang
Qian Chen
Xihao Li
23
1
0
22 Aug 2024
MooER: LLM-based Speech Recognition and Translation Models from Moore Threads
Junhao Xu
Zhenlin Liang
Yi Liu
Yichao Hu
Jian Li
Yajun Zheng
Meng Cai
Hua Wang
AuLLM
24
1
0
09 Aug 2024
Bailing-TTS: Chinese Dialectal Speech Synthesis Towards Human-like Spontaneous Representation
Xinhan Di
Jiahao Lu
Yunming Liang
Junjie Zheng
Yihua Wang
Chaofan Ding
ALM
31
1
0
01 Aug 2024
Qwen2-Audio Technical Report
Yunfei Chu
Jin Xu
Qian Yang
Haojie Wei
Xipin Wei
...
Yuanjun Lv
Jinzheng He
Junyang Lin
Chang Zhou
Jingren Zhou
AuLLM
VLM
32
100
0
15 Jul 2024
FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs
Keyu An
Qian Chen
Chong Deng
Zhihao Du
Changfeng Gao
...
Bin Zhang
Qinglin Zhang
Shiliang Zhang
Nan Zhao
Siqi Zheng
AuLLM
27
42
0
04 Jul 2024
Qifusion-Net: Layer-adapted Stream/Non-stream Model for End-to-End Multi-Accent Speech Recognition
Jinming Chen
Jingyi Fang
Yuanzhong Zheng
Yaoxuan Wang
Haojun Fei
16
1
0
03 Jul 2024
MM-KWS: Multi-modal Prompts for Multilingual User-defined Keyword Spotting
Zhiqi Ai
Zhiyong Chen
Shugong Xu
19
2
0
11 Jun 2024
Spontaneous Speech-Based Suicide Risk Detection Using Whisper and Large Language Models
Ziyun Cui
Chang Lei
Wen Wu
Yinan Duan
Diyang Qu
Ji Wu
Runsen Chen
Chao Zhang
15
2
0
06 Jun 2024
Seed-TTS: A Family of High-Quality Versatile Speech Generation Models
Philip Anastassiou
Jiawei Chen
J. Chen
Yuanzhe Chen
Zhuo Chen
...
Wenjie Zhang
Y. Zhang
Zilin Zhao
Dejian Zhong
Xiaobin Zhuang
36
74
0
04 Jun 2024
CoCoGesture: Toward Coherent Co-speech 3D Gesture Generation in the Wild
Xingqun Qi
Hengyuan Zhang
Yatian Wang
J. Pan
Chen Liu
...
Qixun Zhang
Shanghang Zhang
Wenhan Luo
Qifeng Liu
Qi-fei Liu
DiffM
SLR
108
5
0
27 May 2024
Learning Expressive Disentangled Speech Representations with Soft Speech Units and Adversarial Style Augmentation
Yimin Deng
Jianzong Wang
Xulong Zhang
Ning Cheng
Jing Xiao
16
0
0
01 May 2024
An Effective Mixture-Of-Experts Approach For Code-Switching Speech Recognition Leveraging Encoder Disentanglement
Tzu-Ting Yang
Hsin-Wei Wang
Yi-Cheng Wang
Chi-Han Lin
Berlin Chen
19
6
0
27 Feb 2024
Speech Translation with Speech Foundation Models and Large Language Models: What is There and What is Missing?
Marco Gaido
Sara Papi
Matteo Negri
L. Bentivogli
38
11
0
19 Feb 2024
Advancing VAD Systems Based on Multi-Task Learning with Improved Model Structures
Lingyun Zuo
Keyu An
Shiliang Zhang
Zhijie Yan
18
1
0
19 Dec 2023
Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models
Yunfei Chu
Jin Xu
Xiaohuan Zhou
Qian Yang
Shiliang Zhang
Zhijie Yan
Chang Zhou
Jingren Zhou
AuLLM
28
263
0
14 Nov 2023
LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT
Zhihao Du
Jiaming Wang
Qian Chen
Yunfei Chu
Zhifu Gao
...
Wen Wang
Siqi Zheng
Chang Zhou
Zhijie Yan
Shiliang Zhang
LLMAG
VLM
AuLLM
LM&MA
23
79
0
07 Oct 2023
Exploring RWKV for Memory Efficient and Low Latency Streaming ASR
Keyu An
Shiliang Zhang
11
4
0
26 Sep 2023
The second multi-channel multi-party meeting transcription challenge (M2MeT) 2.0): A benchmark for speaker-attributed ASR
Yuhao Liang
Mohan Shi
Fan Yu
Yangze Li
Shiliang Zhang
...
Jian Wu
Zhuo Chen
Kong Aik Lee
Zhijie Yan
Hui Bu
11
5
0
24 Sep 2023
Improving Speaker Diarization using Semantic Information: Joint Pairwise Constraints Propagation
Luyao Cheng
Siqi Zheng
Qinglin Zhang
Haibo Wang
Yafeng Chen
Qian Chen
Shiliang Zhang
33
2
0
19 Sep 2023
Multi-GradSpeech: Towards Diffusion-based Multi-Speaker Text-to-speech Using Consistent Diffusion Models
Heyang Xue
Shuai Guo
Pengcheng Zhu
Mengxiao Bi
DiffM
14
1
0
21 Aug 2023
SeACo-Paraformer: A Non-Autoregressive ASR System with Flexible and Effective Hotword Customization Ability
Xian Shi
Yexin Yang
Zerui Li
Yanni Chen
Zhifu Gao
Shiliang Zhang
14
11
0
07 Aug 2023
Fast and parallel decoding for transducer
Wei Kang
Liyong Guo
Fangjun Kuang
Long Lin
Mingshuang Luo
Zengwei Yao
Xiaoyu Yang
Piotr Żelasko
Daniel Povey
AI4TS
19
15
0
31 Oct 2022
ConvRNN-T: Convolutional Augmented Recurrent Neural Network Transducers for Streaming Speech Recognition
Martin H. Radfar
Rohit Barnwal
R. Swaminathan
Feng-Ju Chang
Grant P. Strimel
Nathan Susanj
Athanasios Mouchtaris
13
13
0
29 Sep 2022
PaddleSpeech: An Easy-to-Use All-in-One Speech Toolkit
Hui Zhang
Tian Yuan
Junkun Chen
Xintong Li
Renjie Zheng
...
Zeyu Chen
Xiaoguang Hu
Dianhai Yu
Yanjun Ma
Liang Huang
AuLLM
25
24
0
20 May 2022
Controllable Time-Delay Transformer for Real-Time Punctuation Prediction and Disfluency Detection
Qian Chen
Mengzhe Chen
Bo Li
Wen Wang
31
34
0
03 Mar 2020