ResearchTrend.AI
High-Fidelity Audio Compression with Improved RVQGAN

11 June 2023
Rithesh Kumar
Prem Seetharaman
Alejandro Luebs
I. Kumar
Kundan Kumar

Papers citing "High-Fidelity Audio Compression with Improved RVQGAN"

50 / 202 papers shown
Title
FastSAG: Towards Fast Non-Autoregressive Singing Accompaniment Generation
Jianyi Chen
Wei Xue
Xu Tan
Zhen Ye
Qi-fei Liu
Yi-Ting Guo
42
2
0
13 May 2024
The Codecfake Dataset and Countermeasures for the Universally Detection of Deepfake Audio
Yuankun Xie
Yi Lu
Ruibo Fu
Zhengqi Wen
Zhiyong Wang
...
Xiaopeng Wang
Yukun Liu
Haonan Cheng
Long Ye
Yi Sun
41
15
0
08 May 2024
HILCodec: High Fidelity and Lightweight Neural Audio Codec
S. Ahn
Beom Jun Woo
Mingrui Han
Chanyeong Moon
Nam Soo Kim
19
6
0
08 May 2024
Detecting music deepfakes is easy but actually hard
Darius Afchar
Gabriel Meseguer-Brocal
Romain Hennequin
53
6
0
07 May 2024
SemantiCodec: An Ultra Low Bitrate Semantic Audio Codec for General Sound
Haohe Liu
Xuenan Xu
Yiitan Yuan
Mengyue Wu
Wenwu Wang
Mark D. Plumbley
19
18
0
30 Apr 2024
Attention-Constrained Inference for Robust Decoder-Only Text-to-Speech
Hankun Wang
Chenpeng Du
Yiwei Guo
Shuai Wang
Xie Chen
Kai Yu
32
1
0
30 Apr 2024
ESC: Efficient Speech Coding with Cross-Scale Residual Vector Quantized Transformers
Yuzhe Gu
Enmao Diao
21
4
0
30 Apr 2024
An Investigation of Time-Frequency Representation Discriminators for High-Fidelity Vocoder
Yicheng Gu
Xueyao Zhang
Liumeng Xue
Haizhou Li
Zhizheng Wu
28
2
0
26 Apr 2024
Long-form music generation with latent diffusion
Zach Evans
Julian Parker
CJ Carr
Zack Zukowski
Josiah Taylor
Jordi Pons
MGen
DiffM
36
38
0
16 Apr 2024
The X-LANCE Technical Report for Interspeech 2024 Speech Processing Using Discrete Speech Unit Challenge
Yiwei Guo
Chenrun Wang
Yifan Yang
Hankun Wang
Ziyang Ma
...
Hanzheng Li
Shuai Fan
Hui Zhang
Xie Chen
Kai Yu
28
1
0
09 Apr 2024
Gull: A Generative Multifunctional Audio Codec
Yi Luo
Jianwei Yu
Hangting Chen
Rongzhi Gu
Chao Weng
AuLLM
27
3
0
07 Apr 2024
CLaM-TTS: Improving Neural Codec Language Model for Zero-Shot Text-to-Speech
Jaehyeon Kim
Keon Lee
Seungjun Chung
Jaewoong Cho
65
39
0
03 Apr 2024
PromptCodec: High-Fidelity Neural Speech Codec using Disentangled Representation Learning based Adaptive Feature-aware Prompt Encoders
Yu Pan
Lei Ma
Jianjun Zhao
32
4
0
03 Apr 2024
SMITIN: Self-Monitored Inference-Time INtervention for Generative Music Transformers
Junghyun Koo
G. Wichern
François G. Germain
Sameer Khurana
Jonathan Le Roux
26
3
0
02 Apr 2024
Personalized Neural Speech Codec
Inseon Jang
Haici Yang
Wootaek Lim
Seung-Wha Beack
Minje Kim
37
1
0
31 Mar 2024
UTDUSS: UTokyo-SaruLab System for Interspeech2024 Speech Processing Using Discrete Speech Unit Challenge
Wataru Nakata
Kazuki Yamauchi
Dong Yang
Hiroaki Hyodo
Yuki Saito
22
0
0
20 Mar 2024
MusicHiFi: Fast High-Fidelity Stereo Vocoding
Ge Zhu
Juan-Pablo Caceres
Zhiyao Duan
Nicholas J. Bryan
DiffM
19
4
0
15 Mar 2024
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models
Zeqian Ju
Yuancheng Wang
Kai Shen
Xu Tan
Detai Xin
...
Shikun Zhang
Jiang Bian
Lei He
Jinyu Li
Sheng Zhao
DiffM
33
143
0
05 Mar 2024
VoxGenesis: Unsupervised Discovery of Latent Speaker Manifold for Speech Synthesis
Wei-wei Lin
Chenhang He
Man-Wai Mak
Jiachen Lian
Kong Aik Lee
DiffM
30
0
0
01 Mar 2024
Towards audio language modeling -- an overview
Haibin Wu
Xuanjun Chen
Yi-Cheng Lin
Kai-Wei Chang
Ho-Lam Chung
Alexander H. Liu
Hung-yi Lee
AuLLM
30
28
0
20 Feb 2024
Codec-SUPERB: An In-Depth Analysis of Sound Codec Models
Haibin Wu
Ho-Lam Chung
Yi-Cheng Lin
Yuan-Kuei Wu
Xuanjun Chen
Yu-Chi Pai
Hsiu-Hsuan Wang
Kai-Wei Chang
Alexander H. Liu
Hung-yi Lee
41
18
0
20 Feb 2024
Advancing Large Language Models to Capture Varied Speaking Styles and Respond Properly in Spoken Conversations
Guan-Ting Lin
Cheng-Han Chiang
Hung-yi Lee
34
22
0
20 Feb 2024
Language-Codec: Reducing the Gaps Between Discrete Codec Representation and Speech Language Models
Shengpeng Ji
Minghui Fang
Ziyue Jiang
Siqi Zheng
Qian Chen
Rongjie Huang
Jialung Zuo
Shulei Wang
Zhou Zhao
AuLLM
24
16
0
19 Feb 2024
APCodec: A Neural Audio Codec with Parallel Amplitude and Phase Spectrum Encoding and Decoding
Yang Ai
Xiao-Hang Jiang
Ye-Xin Lu
Hui-Peng Du
Zhenhua Ling
21
20
0
16 Feb 2024
Leveraging Pre-Trained Autoencoders for Interpretable Prototype Learning of Music Audio
Pablo Alonso-Jiménez
L. Pepino
Roser Batlle-Roca
Pablo Zinemanas
Dmitry Bogdanov
Xavier Serra
Martín Rocamora
31
6
0
14 Feb 2024
Fast Timing-Conditioned Latent Audio Diffusion
Zach Evans
CJ Carr
Josiah Taylor
Scott H. Hawley
Jordi Pons
DiffM
74
101
0
07 Feb 2024
Natural language guidance of high-fidelity text-to-speech with synthetic annotations
Daniel Lyth
Simon King
16
35
0
02 Feb 2024
Exploring the limits of decoder-only models trained on public speech recognition corpora
Ankit Gupta
G. Saon
Brian Kingsbury
OffRL
21
5
0
31 Jan 2024
Proactive Detection of Voice Cloning with Localized Watermarking
Robin San Roman
Pierre Fernandez
Alexandre Défossez
Teddy Furon
Tuan Tran
Hady ElSahar
38
39
0
30 Jan 2024
Residual Quantization with Implicit Neural Codebooks
Iris A. M. Huijben
Matthijs Douze
Matthew Muckley
Ruud J. G. van Sloun
Jakob Verbeek
MQ
19
11
0
26 Jan 2024
Exploring Musical Roots: Applying Audio Embeddings to Empower Influence Attribution for a Generative Music Model
Julia Barnett
Hugo Flores Garcia
Bryan Pardo
32
6
0
25 Jan 2024
DITTO: Diffusion Inference-Time T-Optimization for Music Generation
Zachary Novack
Julian McAuley
Taylor Berg-Kirkpatrick
Nicholas J. Bryan
DiffM
24
32
0
22 Jan 2024
Pheme: Efficient and Conversational Speech Generation
Paweł Budzianowski
Taras Sereda
Tomasz Cichy
Ivan Vulić
21
7
0
05 Jan 2024
StemGen: A music generation model that listens
Julian Parker
Janne Spijkervet
Katerina Kosta
Furkan Yesiler
Boris Kuznetsov
Ju-Chiang Wang
Matt Avent
Jitong Chen
Duc Le
MGen
12
27
0
14 Dec 2023
HierSpeech++: Bridging the Gap between Semantic and Acoustic Representation of Speech by Hierarchical Variational Inference for Zero-shot Speech Synthesis
Sang-Hoon Lee
Haram Choi
Seung-Bin Kim
Seong-Whan Lee
BDL
22
31
0
21 Nov 2023
Generative De-Quantization for Neural Speech Codec via Latent Diffusion
Haici Yang
Inseon Jang
Minje Kim
DiffM
29
6
0
14 Nov 2023
InstrumentGen: Generating Sample-Based Musical Instruments From Text
S. Nercessian
Johannes Imort
14
2
0
07 Nov 2023
SelfVC: Voice Conversion With Iterative Refinement using Self Transformations
Paarth Neekhara
Shehzeen Samarah Hussain
Rafael Valle
Boris Ginsburg
Rishabh Ranjan
Shlomo Dubnov
F. Koushanfar
Julian McAuley
13
3
0
14 Oct 2023
UniAudio: An Audio Foundation Model Toward Universal Audio Generation
Dongchao Yang
Jinchuan Tian
Xuejiao Tan
Rongjie Huang
Songxiang Liu
...
Jiang Bian
Xixin Wu
Zhou Zhao
Shinji Watanabe
Helen M. Meng
CVBM
AuLLM
22
114
0
01 Oct 2023
Discrete Audio Representation as an Alternative to Mel-Spectrograms for Speaker and Speech Recognition
Krishna C. Puvvada
Nithin Rao Koluguri
Kunal Dhawan
Jagadeesh Balam
Boris Ginsburg
19
12
0
19 Sep 2023
Fewer-token Neural Speech Codec with Time-invariant Codes
Yong Ren
Tao Wang
Jiangyan Yi
Le Xu
Jianhua Tao
Chuyuan Zhang
Jun Zhou
17
32
0
15 Sep 2023
FunCodec: A Fundamental, Reproducible and Integrable Open-source Toolkit for Neural Speech Codec
Zhihao Du
Shiliang Zhang
Kai Hu
Siqi Zheng
21
54
0
14 Sep 2023
Towards Universal Speech Discrete Tokens: A Case Study for ASR and TTS
Yifan Yang
Feiyu Shen
Chenpeng Du
Ziyang Ma
K. Yu
Daniel Povey
Xie Chen
24
24
0
14 Sep 2023
A Review of Differentiable Digital Signal Processing for Music & Speech Synthesis
B. Hayes
Jordie Shier
Gyorgy Fazekas
Andrew Mcpherson
C. Saitis
21
21
0
29 Aug 2023
Sparks of Large Audio Models: A Survey and Outlook
S. Latif
Moazzam Shoukat
Fahad Shamshad
Muhammad Usama
Yi Ren
...
Wenwu Wang
Xulong Zhang
Roberto Togneri
Erik Cambria
Björn W. Schuller
LM&MA
AuLLM
29
36
0
24 Aug 2023
From Discrete Tokens to High-Fidelity Audio Using Multi-Band Diffusion
Robin San Roman
Yossi Adi
Antoine Deleforge
Romain Serizel
Gabriel Synnaeve
Alexandre Défossez
DiffM
16
21
0
02 Aug 2023
VampNet: Music Generation via Masked Acoustic Token Modeling
Hugo Flores Garcia
Prem Seetharaman
Rithesh Kumar
Bryan Pardo
MGen
31
64
0
10 Jul 2023
Simple and Controllable Music Generation
Jade Copet
Felix Kreuk
Itai Gat
Tal Remez
David Kant
Gabriel Synnaeve
Yossi Adi
Alexandre Défossez
MGen
19
338
0
08 Jun 2023
V2Meow: Meowing to the Visual Beat via Video-to-Music Generation
Kun Su
Judith Yue Li
Qingqing Huang
Dima Kuzmin
Joonseok Lee
...
Fei Sha
A. Jansen
Yu Wang
Mauro Verzetti
Timo I. Denk
VGen
26
12
0
11 May 2023
Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers
Chengyi Wang
Sanyuan Chen
Yu-Huan Wu
Zi-Hua Zhang
Long Zhou
...
Huaming Wang
Jinyu Li
Lei He
Sheng Zhao
Furu Wei
43
637
0
05 Jan 2023