2306.06546
High-Fidelity Audio Compression with Improved RVQGAN
11 June 2023
Rithesh Kumar
Prem Seetharaman
Alejandro Luebs
I. Kumar
Kundan Kumar
Papers citing "High-Fidelity Audio Compression with Improved RVQGAN"
50 / 202 papers shown
Estimating Musical Surprisal in Audio
Mathias Rose Bjare
Giorgia Cantisani
Stefan Lattner
Gerhard Widmer
39
0
0
13 Jan 2025
MathReader : Text-to-Speech for Mathematical Documents
Sieun Hyeon
Kyudan Jung
N. Kim
Hyun Gon Ryu
Jaeyoung Do
36
1
0
13 Jan 2025
TTS-Transducer: End-to-End Speech Synthesis with Neural Transducer
Vladimir Bataev
Subhankar Ghosh
Vitaly Lavrukhin
Jason Chun Lok Li
AI4TS
36
0
0
10 Jan 2025
Apollo: Band-sequence Modeling for High-Quality Audio Restoration
Kai Li
Yi Luo
31
0
0
08 Jan 2025
Neural Speech and Audio Coding: Modern AI Technology Meets Traditional Codecs
Minje Kim
Jan Skoglund
39
1
0
08 Jan 2025
SongEditor: Adapting Zero-Shot Song Generation Language Model as a Multi-Task Editor
Chenyu Yang
Shuai Wang
Hangting Chen
Jianwei Yu
Wei Tan
Rongzhi Gu
Y. Xu
Yizhi Zhou
Haina Zhu
H. Li
KELM
121
1
0
18 Dec 2024
XQ-GAN: An Open-source Image Tokenization Framework for Autoregressive Generation
X. Li
Kai Qiu
H. Chen
Jason Kuen
Jiuxiang Gu
J. Wang
Zhe-nan Lin
Bhiksha Raj
VLM
117
3
0
02 Dec 2024
FreeCodec: A disentangled neural speech codec with fewer tokens
Youqiang Zheng
Weiping Tu
Yueteng Kang
Jie Chen
Yike Zhang
Li Xiao
Yuhong Yang
Long Ma
67
1
0
02 Dec 2024
The Codec Language Model-based Zero-Shot Spontaneous Style TTS System for CoVoC Challenge 2024
Shuoyi Zhou
Yixuan Zhou
Weiqing Li
Jun Chen
Runchuan Ye
Weihao Wu
Zijian Lin
Shun Lei
Zhiyong Wu
97
1
0
02 Dec 2024
Scaling Transformers for Low-Bitrate High-Quality Speech Coding
Julian Parker
Anton Smirnov
Jordi Pons
CJ Carr
Zack Zukowski
Zach Evans
Xubo Liu
75
9
0
29 Nov 2024
Gotta Hear Them All: Sound Source Aware Vision to Audio Generation
Wei Guo
Heng Wang
Jianbo Ma
Weidong Cai
DiffM
85
3
0
23 Nov 2024
Compression of Higher Order Ambisonics with Multichannel RVQGAN
Toni Hirvonen
Mahmoud Namazi
70
0
0
18 Nov 2024
Fish-Speech: Leveraging Large Language Models for Advanced Multilingual Text-to-Speech Synthesis
Shijia Liao
Y. Wang
Tianyu Li
Yifan Cheng
Ruoyi Zhang
Rongzhi Zhou
Yijin Xing
AuLLM
35
10
0
02 Nov 2024
MDCTCodec: A Lightweight MDCT-based Neural Audio Codec towards High Sampling Rate and Low Bitrate Scenarios
Xiao-Hang Jiang
Yang Ai
Rui Zheng
Hui-Peng Du
Ye-Xin Lu
Zhen-Hua Ling
48
0
0
01 Nov 2024
Lina-Speech: Gated Linear Attention is a Fast and Parameter-Efficient Learner for text-to-speech synthesis
Théodor Lemerle
Harrison Vanderbyl
Vaibhav Srivastav
Nicolas Obin
Axel Roebel
31
1
0
30 Oct 2024
A Closer Look at Neural Codec Resynthesis: Bridging the Gap between Codec and Waveform Generation
Alexander H. Liu
Qirui Wang
Yuan Gong
James Glass
25
0
0
29 Oct 2024
MusicFlow: Cascaded Flow Matching for Text Guided Music Generation
K R Prajwal
Bowen Shi
Matthew Lee
Apoorv Vyas
Andros Tjandra
...
Baishan Guo
Huiyu Wang
Triantafyllos Afouras
David Kant
Wei-Ning Hsu
35
5
0
27 Oct 2024
LSCodec: Low-Bitrate and Speaker-Decoupled Discrete Speech Codec
Yiwei Guo
Zhihan Li
Chenpeng Du
Hankun Wang
Xie Chen
Kai Yu
31
1
0
21 Oct 2024
Residual vector quantization for KV cache compression in large language model
Ankur Kumar
MQ
29
0
0
21 Oct 2024
SNAC: Multi-Scale Neural Audio Codec
Hubert Siuzdak
Florian Grötschla
Luca A. Lanzendörfer
12
10
0
18 Oct 2024
SF-Speech: Straightened Flow for Zero-Shot Voice Clone
Xuyuan Li
Zengqiang Shang
Hua Hua
Peiyang Shi
Chen Yang
Li Wang
Pengyuan Zhang
43
2
0
16 Oct 2024
Code Drift: Towards Idempotent Neural Audio Codecs
P. O'Reilly
Prem Seetharaman
Jiaqi Su
Zeyu Jin
Bryan Pardo
66
0
0
14 Oct 2024
Bahasa Harmony: A Comprehensive Dataset for Bahasa Text-to-Speech Synthesis with Discrete Codec Modeling of EnGen-TTS
Onkar Kishor Susladkar
Vishesh Tripathi
Biddwan Ahmed
16
0
0
09 Oct 2024
Variable Bitrate Residual Vector Quantization for Audio Coding
Yunkee Chae
Woosung Choi
Yuhta Takida
Junghyun Koo
Yukara Ikemiya
...
K. Cheuk
Marco A. Martínez Ramírez
Kyogu Lee
Wei-Hsiang Liao
Yuki Mitsufuji
74
0
0
08 Oct 2024
Presto! Distilling Steps and Layers for Accelerating Music Generation
Zachary Novack
Ge Zhu
Jonah Casebeer
Julian McAuley
Taylor Berg-Kirkpatrick
Nicholas J. Bryan
45
5
0
07 Oct 2024
HALL-E: Hierarchical Neural Codec Language Model for Minute-Long Zero-Shot Text-to-Speech Synthesis
Yuto Nishimura
Takumi Hirose
Masanari Ohi
Hideki Nakayama
Nakamasa Inoue
VLM
29
1
0
06 Oct 2024
Graded Suspiciousness of Adversarial Texts to Human
Shakila Mahjabin Tonni
Pedro Faustini
Mark Dras
AAML
21
0
0
06 Oct 2024
SyllableLM: Learning Coarse Semantic Units for Speech Language Models
Alan Baade
Puyuan Peng
David F. Harwath
45
3
0
05 Oct 2024
NTU-NPU System for Voice Privacy 2024 Challenge
Nikita Kuzmin
Hieu-Thi Luong
Jixun Yao
Lei Xie
Kong Aik Lee
Eng Siong Chng
49
1
0
03 Oct 2024
ImageFolder: Autoregressive Image Generation with Folded Tokens
Xiang Li
Kai Qiu
Hao Chen
Jason Kuen
Jiuxiang Gu
Bhiksha Raj
Zhe-nan Lin
VLM
34
18
0
02 Oct 2024
Zero-Shot Text-to-Speech from Continuous Text Streams
Trung D. Q. Dang
David Aponte
Dung Tran
Tianyi Chen
K. Koishida
AuLLM
VLM
32
3
0
01 Oct 2024
Recent Advances in Speech Language Models: A Survey
Wenqian Cui
Dianzhi Yu
Xiaoqi Jiao
Ziqiao Meng
Guangyan Zhang
Qichao Wang
Yiwen Guo
Irwin King
AuLLM
59
14
0
01 Oct 2024
Analyzing and Mitigating Inconsistency in Discrete Audio Tokens for Neural Codec Language Models
Wenrui Liu
Zhifang Guo
Jin Xu
Yuanjun Lv
Yunfei Chu
Zhou Zhao
Junyang Lin
41
1
0
28 Sep 2024
From Vision to Audio and Beyond: A Unified Model for Audio-Visual Representation and Generation
Kun Su
Xiulong Liu
Eli Shlizerman
VGen
28
6
0
27 Sep 2024
FlowMAC: Conditional Flow Matching for Audio Coding at Low Bit Rates
N. Pia
Martin Strauss
M. Multrus
B. Edler
26
0
0
26 Sep 2024
Generative Speech Foundation Model Pretraining for High-Quality Speech Extraction and Restoration
Pin-Jui Ku
Alexander H. Liu
Roman Korostik
Sung-Feng Huang
Szu-Wei Fu
Ante Jukić
36
2
0
24 Sep 2024
Codec-SUPERB @ SLT 2024: A lightweight benchmark for neural audio codec models
Haibin Wu
Xuanjun Chen
Yi-Cheng Lin
Kaiwei Chang
Jiawei Du
...
Yi-Chiao Wu
Xu Tan
James Glass
Shinji Watanabe
Hung-yi Lee
24
6
0
21 Sep 2024
Temporally Aligned Audio for Video with Autoregression
Ilpo Viertola
Vladimir E. Iashin
Esa Rahtu
VGen
32
9
0
20 Sep 2024
Audio Codec Augmentation for Robust Collaborative Watermarking of Speech Synthesis
Lauri Juvela
Xin Eric Wang
21
2
0
20 Sep 2024
MuCodec: Ultra Low-Bitrate Music Codec
Yaoxun Xu
Hangting Chen
Jianwei Yu
Wei Tan
Rongzhi Gu
Shun Lei
Zhiwei Lin
Zhiyong Wu
30
1
0
20 Sep 2024
NDVQ: Robust Neural Audio Codec with Normal Distribution-Based Vector Quantization
Zhikang Niu
Sanyuan Chen
Long Zhou
Ziyang Ma
Xie Chen
Shujie Liu
29
2
0
19 Sep 2024
Low Frame-rate Speech Codec: a Codec Designed for Fast High-quality Speech LLM Training and Inference
Edresson Casanova
Ryan Langman
Paarth Neekhara
Shehzeen Samarah Hussain
Jason Chun Lok Li
Subhankar Ghosh
Ante Jukić
Sang-gil Lee
AuLLM
29
2
0
18 Sep 2024
Speaking from Coarse to Fine: Improving Neural Codec Language Model via Multi-Scale Speech Coding and Generation
Haohan Guo
Fenglong Xie
Dongchao Yang
Xixin Wu
Helen Meng
36
1
0
18 Sep 2024
Learning Source Disentanglement in Neural Audio Codec
Xiaoyu Bie
Xubo Liu
Gaël Richard
18
1
0
17 Sep 2024
FakeMusicCaps: a Dataset for Detection and Attribution of Synthetic Music Generated via Text-to-Music Models
Luca Comanducci
Paolo Bestagini
Stefano Tubaro
35
6
0
16 Sep 2024
StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion
Yinghao Aaron Li
Xilin Jiang
Cong Han
N. Mesgarani
DiffM
29
4
0
16 Sep 2024
Joint Semantic Knowledge Distillation and Masked Acoustic Modeling for Full-band Speech Restoration with Improved Intelligibility
Xiaoyu Liu
Xu Li
Joan Serra
Santiago Pascual
29
3
0
14 Sep 2024
Seed-Music: A Unified Framework for High Quality and Controlled Music Generation
Ye Bai
Haonan Chen
Jitong Chen
Zhuo Chen
Yi Deng
...
Hang Zhao
Ziyi Zhao
Dejian Zhong
Shicen Zhou
Pei Zou
DiffM
58
6
0
13 Sep 2024
OpenACE: An Open Benchmark for Evaluating Audio Coding Performance
Jozef Coldenhoff
Niclas Granqvist
Milos Cernak
18
0
0
12 Sep 2024
Multi-Source Music Generation with Latent Diffusion
Zhongweiyang Xu
Debottam Dutta
Yu-Lin Wei
Romit Roy Choudhury
DiffM
40
1
0
10 Sep 2024