ResearchTrend.AI
Libri-Light: A Benchmark for ASR with Limited or No Supervision

arXiv:1912.07875

17 December 2019
Jacob Kahn
M. Rivière
Weiyi Zheng
Evgeny Kharitonov
Qiantong Xu
Pierre-Emmanuel Mazaré
Julien Karadayi
Vitaliy Liptchinsky
R. Collobert
Christian Fuegen
Tatiana Likhomanenko
Gabriel Synnaeve
Armand Joulin
Abdel-rahman Mohamed
Emmanuel Dupoux
    AuLLM

Papers citing "Libri-Light: A Benchmark for ASR with Limited or No Supervision"

50 / 475 papers shown
Dissecting the Segmentation Model of End-to-End Diarization with Vector Clustering
Alexis Plaquet
Naohiro Tawara
Marc Delcroix
Shota Horiguchi
Atsushi Ando
S. Araki
H. Bredin
31
0
0
13 Jun 2025
SLICK: Selective Localization and Instance Calibration for Knowledge-Enhanced Car Damage Segmentation in Automotive Insurance
Teerapong Panboonyuen
142
0
0
12 Jun 2025
Spectral Domain Neural Reconstruction for Passband FMCW Radars
Harshvardhan Takawale
Nirupam Roy
13
0
0
09 Jun 2025
Speech Recognition on TV Series with Video-guided Post-Correction
Haoyuan Yang
Yue Zhang
Liqiang Jing
13
0
0
08 Jun 2025
Analyzing the Importance of Blank for CTC-Based Knowledge Distillation
Benedikt Hilmes
Nick Rossenbach
Ralf Schlüter
52
0
0
02 Jun 2025
MagiCodec: Simple Masked Gaussian-Injected Codec for High-Fidelity Reconstruction and Generation
Yakun Song
Jiawei Chen
Xiaobin Zhuang
Chenpeng Du
Ziyang Ma
...
Dongya Jia
Zhuo Chen
Yuping Wang
Yuxuan Wang
Xie Chen
25
0
0
31 May 2025
The Warmup Dilemma: How Learning Rate Strategies Impact Speech-to-Text Model Convergence
Marco Gaido
Sara Papi
L. Bentivogli
Alessio Brutti
Mauro Cettolo
R. Gretter
M. Matassoni
Mohamed Nabih
Matteo Negri
31
0
0
29 May 2025
Spoken Language Modeling with Duration-Penalized Self-Supervised Units
Nicol Visser
Herman Kamper
44
0
0
29 May 2025
StressTest: Can YOUR Speech LM Handle the Stress?
Iddo Yosha
Gallil Maimon
Yossi Adi
34
0
0
28 May 2025
TEDI: Trustworthy and Ethical Dataset Indicators to Analyze and Compare Dataset Documentation
Wiebke Hutiri
Mircea Cimpoi
M. Scheuerman
Victoria Matthews
Alice Xiang
165
0
0
23 May 2025
SEED: Speaker Embedding Enhancement Diffusion Model
KiHyun Nam
Jungwoo Heo
Jee-weon Jung
Gangin Park
Chaeyoung Jung
Ha-Jin Yu
Joon Son Chung
DiffM
54
0
0
22 May 2025
Bridging Speech Emotion Recognition and Personality: Dataset and Temporal Interaction Condition Network
Yuan Gao
Hao Shi
Yahui Fu
Chenhui Chu
Tatsuya Kawahara
50
0
0
20 May 2025
Single-Channel Target Speech Extraction Utilizing Distance and Room Clues
Runwu Shi
Zirui Lin
Benjamin Yen
Jiang Wang
Ragib Amin Nihal
Kazuhiro Nakadai
3DV
103
0
0
20 May 2025
Granary: Speech Recognition and Translation Dataset in 25 European Languages
Nithin Rao Koluguri
Monica Sekoyan
George Zelenfroynd
Sasha Meister
Shuoyang Ding
...
Yifan Peng
Sara Papi
Marco Gaido
Alessio Brutti
Boris Ginsburg
46
0
0
19 May 2025
Introducing voice timbre attribute detection
Jinghao He
Zhengyan Sheng
Liping Chen
Kong Aik Lee
Zhen-Hua Ling
54
1
0
14 May 2025
WavReward: Spoken Dialogue Models With Generalist Reward Evaluators
Shengpeng Ji
Tianle Liang
Yongqian Li
Jialong Zuo
Minghui Fang
...
Xize Cheng
Siqi Zheng
Jin Xu
Junyang Lin
Zhou Zhao
AuLLM ALM
119
0
0
14 May 2025
The Voice Timbre Attribute Detection 2025 Challenge Evaluation Plan
Zhengyan Sheng
Jinghao He
Liping Chen
Kong Aik Lee
Zhen-Hua Ling
55
0
0
14 May 2025
A Multi-Agent AI Framework for Immersive Audiobook Production through Spatial Audio and Neural Narration
Shaja Arul Selvamani
Nia D'Souza Ganapathy
AI4CE
122
0
0
08 May 2025
Advancing Zero-shot Text-to-Speech Intelligibility across Diverse Domains via Preference Alignment
Xueyao Zhang
Yijiao Wang
Chaoren Wang
Zehan Li
Zhuo Chen
Zhizheng Wu
324
0
0
07 May 2025
fastabx: A library for efficient computation of ABX discriminability
Maxime Poli
Emmanuel Chemla
Emmanuel Dupoux
41
0
0
05 May 2025
Voice Cloning: Comprehensive Survey
Hussam Azzuni
Abdulmotaleb El Saddik
VLM
112
0
0
01 May 2025
StableQuant: Layer Adaptive Post-Training Quantization for Speech Foundation Models
Yeona Hong
Hyewon Han
Woo-Jin Chung
Hong-Goo Kang
MQ
126
0
0
21 Apr 2025
Pseudo-Autoregressive Neural Codec Language Models for Efficient Zero-Shot Text-to-Speech Synthesis
Yifan Yang
Shixuan Liu
Jiajian Li
Yuxuan Hu
Haibin Wu
...
Haiyang Sun
Yanqing Liu
Yan Lu
Kai Yu
Xie Chen
111
1
0
14 Apr 2025
Scaling Analysis of Interleaved Speech-Text Language Models
Gallil Maimon
Michael Hassid
Amit Roth
Yossi Adi
AuLLM
121
1
0
03 Apr 2025
UniSep: Universal Target Audio Separation with Language Models at Scale
Yun Wang
Hangting Chen
Dongchao Yang
Weiqin Li
Dan Luo
Guangzhi Li
Shan Yang
Zhiyong Wu
Helen Meng
Xixin Wu
VLM
79
1
0
31 Mar 2025
Universal Speech Token Learning via Low-Bitrate Neural Codec and Pretrained Representations
Xue Jiang
Xiulian Peng
Yuan Zhang
Yan Lu
SSL
144
1
0
15 Mar 2025
Text-Speech Language Models with Improved Cross-Modal Transfer by Aligning Abstraction Levels
Santiago Cuervo
Adel Moumen
Yanis Labrak
Sameer Khurana
Antoine Laurent
Mickael Rouvier
R. Marxer
136
1
0
08 Mar 2025
Self-Supervised Models for Phoneme Recognition: Applications in Children's Speech for Reading Learning
Lucas Block Medin
Thomas Pellegrini
Lucile Gelin
SSL
83
2
0
06 Mar 2025
UniWav: Towards Unified Pre-training for Speech Representation Learning and Generation
Alexander H. Liu
Sang-gil Lee
Chao-Han Huck Yang
Yuan Gong
Yu-Chun Wang
James Glass
Rafael Valle
Bryan Catanzaro
SSL
96
1
0
02 Mar 2025
UniCodec: Unified Audio Codec with Single Domain-Adaptive Codebook
Yiheng Jiang
Qian Chen
Shengpeng Ji
Yu Xi
Wen Wang
Chuxu Zhang
Xianghu Yue
Shiliang Zhang
Haoyang Li
98
1
0
27 Feb 2025
MegaTTS 3: Sparse Alignment Enhanced Latent Diffusion Transformer for Zero-Shot Speech Synthesis
Ziyue Jiang
Yi Ren
Ruiqi Li
Shengpeng Ji
Zhenhui Ye
...
Yanzhe Zhang
Rui Liu
Xiang Yin
Zhou Zhao
146
0
0
26 Feb 2025
DMOSpeech: Direct Metric Optimization via Distilled Diffusion Model in Zero-Shot Speech Synthesis
Yinghao Aaron Li
Rithesh Kumar
Zeyu Jin
DiffM
133
0
0
21 Feb 2025
Slamming: Training a Speech Language Model on One GPU in a Day
Gallil Maimon
Avishai Elmakies
Yossi Adi
87
3
0
19 Feb 2025
AudioMiXR: Spatial Audio Object Manipulation with 6DoF for Sound Design in Augmented Reality
Brandon Woodard
Margarita Geleta
Joseph J. LaViola Jr.
Andrea Fanelli
Rhonda Wilson
165
4
0
05 Feb 2025
A Unit-based System and Dataset for Expressive Direct Speech-to-Speech Translation
Anna Min
Chenxu Hu
Yi Ren
Hang Zhao
86
0
0
01 Feb 2025
Emilia: A Large-Scale, Extensive, Multilingual, and Diverse Dataset for Speech Generation
Haorui He
Zengqiang Shang
Chaoren Wang
Xuyuan Li
Yicheng Gu
...
Peiyang Shi
Yansen Wang
Kai Chen
Pengyuan Zhang
Zhikai Wu
AuLLM
139
5
0
28 Jan 2025
SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words
Junyi Ao
Yuancheng Wang
Xiaohai Tian
Dekun Chen
Jing Zhang
Lu Lu
Yansen Wang
Haizhou Li
Zhikai Wu
AuLLM
173
25
0
17 Jan 2025
Distance Based Single-Channel Target Speech Extraction
Runwu Shi
Benjamin Yen
Kazuhiro Nakadai
52
2
0
31 Dec 2024
Autoregressive Speech Synthesis with Next-Distribution Prediction
Xinfa Zhu
WenJie Tian
Lei Xie
VLM
242
5
0
22 Dec 2024
Speaker Emotion Recognition: Leveraging Self-Supervised Models for Feature Extraction Using Wav2Vec2 and HuBERT
Pourya Jafarzadeh
Amir Mohammad Rostami
Padideh Choobdar
115
3
0
05 Nov 2024
DC-Spin: A Speaker-invariant Speech Tokenizer for Spoken Language Models
DC-Spin: A Speaker-invariant Speech Tokenizer for Spoken Language Models
Heng-Jui Chang
Hongyu Gong
Changhan Wang
James R. Glass
Yu-An Chung
112
0
0
31 Oct 2024
Speech is More Than Words: Do Speech-to-Text Translation Systems Leverage Prosody?
Ioannis Tsiamas
Matthias Sperber
Andrew Finch
Sarthak Garg
61
1
0
31 Oct 2024
An Empirical Analysis of Speech Self-Supervised Learning at Multiple Resolutions
Theo Clark
Benedetta Cevoli
Eloy de Jong
Timofey Abramski
Jamie Dougherty
SSL
71
0
0
31 Oct 2024
From Babble to Words: Pre-Training Language Models on Continuous Streams of Phonemes
Zébulon Goriely
Richard Diehl Martinez
Andrew Caines
Lisa Beinborn
P. Buttery
CLL
98
5
0
30 Oct 2024
A Closer Look at Neural Codec Resynthesis: Bridging the Gap between Codec and Waveform Generation
Alexander H. Liu
Qirui Wang
Yuan Gong
James Glass
61
0
0
29 Oct 2024
End-to-End Integration of Speech Emotion Recognition with Voice Activity Detection using Self-Supervised Learning Features
Natsuo Yamashita
Masaaki Yamamoto
Yohei Kawaguchi
75
0
0
17 Oct 2024
EH-MAM: Easy-to-Hard Masked Acoustic Modeling for Self-Supervised Speech Representation Learning
Ashish Seth
Ramaneswaran Selvakumar
S. Sakshi
Sonal Kumar
Sreyan Ghosh
Dinesh Manocha
80
0
0
17 Oct 2024
Sound Check: Auditing Audio Datasets
William Agnew
Julia Barnett
Annie Chu
Rachel Hong
Michael Feffer
Robin Netzorg
Harry H. Jiang
Ezra Awumey
Sauvik Das
115
1
0
17 Oct 2024
Investigation of Speaker Representation for Target-Speaker Speech Processing
Takanori Ashihara
Takafumi Moriya
Shota Horiguchi
Junyi Peng
Tsubasa Ochiai
Marc Delcroix
Kohei Matsuura
Hiroshi Sato
50
1
0
15 Oct 2024
Sylber: Syllabic Embedding Representation of Speech from Raw Audio
Cheol Jun Cho
Nicholas Lee
Akshat Gupta
Dhruv Agarwal
Ethan Chen
Alan W Black
Gopala K. Anumanchipalli
88
4
0
09 Oct 2024