High Fidelity Neural Audio Compression

24 October 2022
Alexandre Défossez
Jade Copet
Gabriel Synnaeve
Yossi Adi

Papers citing "High Fidelity Neural Audio Compression"

50 / 85 papers shown
Multi-band Frequency Reconstruction for Neural Psychoacoustic Coding
Dianwen Ng
Kun Zhou
Yi-Wen Chao
Zhiwei Xiong
B. Ma
E. Chng
31
0
0
12 May 2025
Toward a Sparse and Interpretable Audio Codec
John Vinyard
24
0
0
08 May 2025
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
X. Zhang
Jintao Guo
Shanshan Zhao
Minghao Fu
Lunhao Duan
Guo-Hua Wang
Qing-Guo Chen
Zhao Xu
Weihua Luo
Kaifu Zhang
DiffM
74
0
0
05 May 2025
Language translation, and change of accent for speech-to-speech task using diffusion model
Abhishek Mishra
Ritesh Sur Chowdhury
Vartul Bahuguna
Isha Pandey
Ganesh Ramakrishnan
DiffM
44
0
0
04 May 2025
P2Mark: Plug-and-play Parameter-level Watermarking for Neural Speech Generation
Yong Ren
Jiangyan Yi
Tao Wang
J. Tao
Zhengqi Wen
Chenxing Li
Z. Lian
Ruibo Fu
Ye Bai
Xiaohui Zhang
51
0
0
07 Apr 2025
LoopGen: Training-Free Loopable Music Generation
Davide Marincione
Giorgio Strano
Donato Crisostomi
Roberto Ribuoli
Emanuele Rodolà
MGen
53
0
0
06 Apr 2025
A Survey on Music Generation from Single-Modal, Cross-Modal, and Multi-Modal Perspectives
Shuyu Li
Shulei Ji
Zihao W. Wang
Songruoyao Wu
Jiaxing Yu
K. Zhang
MGen
VGen
70
1
0
01 Apr 2025
Universal Speech Token Learning via Low-Bitrate Neural Codec and Pretrained Representations
Xue Jiang
Xiulian Peng
Yuan Zhang
Yan-Heng Lu
SSL
83
0
0
15 Mar 2025
Designing Neural Synthesizers for Low-Latency Interaction
Franco Caspe
Jordie Shier
Mark Sandler
C. Saitis
Andrew McPherson
141
0
0
14 Mar 2025
TokenSynth: A Token-based Neural Synthesizer for Instrument Cloning and Text-to-Instrument
Kyungsu Kim
Junghyun Koo
Sungho Lee
Haesun Joung
Kyogu Lee
53
0
0
13 Feb 2025
Koel-TTS: Enhancing LLM based Speech Generation with Preference Alignment and Classifier Free Guidance
Shehzeen Samarah Hussain
Paarth Neekhara
Xuesong Yang
Edresson Casanova
Subhankar Ghosh
Mikyas T. Desta
Roy Fejgin
Rafael Valle
Jason Chun Lok Li
59
2
0
07 Feb 2025
DeepGate4: Efficient and Effective Representation Learning for Circuit Design at Scale
Ziyang Zheng
Shan Huang
Jianyuan Zhong
Zhengyuan Shi
Guohao Dai
Ningyi Xu
Qiang Xu
GNN
89
2
0
02 Feb 2025
Baichuan-Omni-1.5 Technical Report
Yadong Li
J. Liu
Tao Zhang
Tao Zhang
S. Chen
...
Jianhua Xu
Haoze Sun
Mingan Lin
Zenan Zhou
Weipeng Chen
AuLLM
72
10
0
28 Jan 2025
AnyEnhance: A Unified Generative Model with Prompt-Guidance and Self-Critic for Voice Enhancement
Junan Zhang
Jing Yang
Zihao Fang
Y. Wang
Zehua Zhang
Zhuo Wang
Fan Fan
Z. Wu
41
2
0
26 Jan 2025
Why disentanglement-based speaker anonymization systems fail at preserving emotions?
Ünal Ege Gaznepoglu
Nils Peters
83
0
0
22 Jan 2025
SoundSpring: Loss-Resilient Audio Transceiver with Dual-Functional Masked Language Modeling
Shengshi Yao
Jincheng Dai
Xiaoqi Qin
Sixian Wang
Siye Wang
K. Niu
Ping Zhang
31
0
0
22 Jan 2025
COCOLA: Coherence-Oriented Contrastive Learning of Musical Audio Representations
Ruben Ciranni
Emilian Postolache
Giorgio Mariani
Michele Mancusi
Giorgio Fabbro
Emanuele Rodolà
Luca Cosmo
67
7
0
10 Jan 2025
Learning the Language of Protein Structure
Benoit Gaujac
Jérémie Donà
Liviu Copoiu
Timothy Atkinson
Thomas Pierrot
Thomas D. Barrett
55
10
0
08 Jan 2025
SSR-Speech: Towards Stable, Safe and Robust Zero-shot Text-based Speech Editing and Synthesis
Helin Wang
Meng Yu
Jiarui Hai
Chen Chen
Yuchen Hu
Rilin Chen
Najim Dehak
Dong Yu
84
3
0
03 Jan 2025
Long-Form Text-to-Music Generation with Adaptive Prompts: A Case of Study in Tabletop Role-Playing Games Soundtracks
Felipe Marra
Lucas N. Ferreira
26
0
0
06 Nov 2024
Fast and High-Quality Auto-Regressive Speech Synthesis via Speculative Decoding
Bohan Li
Hankun Wang
Situo Zhang
Yiwei Guo
Kai Yu
35
5
0
29 Oct 2024
Ichigo: Mixed-Modal Early-Fusion Realtime Voice Assistant
Alan Dao
Dinh Bach Vu
Huy Hoang Ha
AuLLM
VLM
65
3
0
20 Oct 2024
SF-Speech: Straightened Flow for Zero-Shot Voice Clone
Xuyuan Li
Zengqiang Shang
Hua Hua
Peiyang Shi
Chen Yang
Li Wang
Pengyuan Zhang
45
2
0
16 Oct 2024
Enhancing Robustness in Deep Reinforcement Learning: A Lyapunov Exponent Approach
Rory Young
Nicolas Pugeault
AAML
57
0
0
14 Oct 2024
Code Drift: Towards Idempotent Neural Audio Codecs
P. O'Reilly
Prem Seetharaman
Jiaqi Su
Zeyu Jin
Bryan Pardo
116
0
0
14 Oct 2024
The First VoicePrivacy Attacker Challenge Evaluation Plan
N. Tomashenko
Xiaoxiao Miao
Emmanuel Vincent
Junichi Yamagishi
125
2
0
09 Oct 2024
Variable Bitrate Residual Vector Quantization for Audio Coding
Yunkee Chae
Woosung Choi
Yuhta Takida
Junghyun Koo
Yukara Ikemiya
...
K. Cheuk
Marco A. Martínez Ramírez
Kyogu Lee
Wei-Hsiang Liao
Yuki Mitsufuji
74
0
0
08 Oct 2024
Presto! Distilling Steps and Layers for Accelerating Music Generation
Zachary Novack
Ge Zhu
Jonah Casebeer
Julian McAuley
Taylor Berg-Kirkpatrick
Nicholas J. Bryan
45
5
0
07 Oct 2024
Audio-Agent: Leveraging LLMs For Audio Generation, Editing and Composition
Zixuan Wang
Chi-Keung Tang
DiffM
VGen
LLMAG
43
4
0
04 Oct 2024
Recent Advances in Speech Language Models: A Survey
Wenqian Cui
Dianzhi Yu
Xiaoqi Jiao
Ziqiao Meng
Guangyan Zhang
Qichao Wang
Yiwen Guo
Irwin King
AuLLM
59
14
0
01 Oct 2024
FlowMAC: Conditional Flow Matching for Audio Coding at Low Bit Rates
N. Pia
Martin Strauss
M. Multrus
B. Edler
37
0
0
26 Sep 2024
Semi-Supervised Cognitive State Classification from Speech with Multi-View Pseudo-Labeling
Yuanchao Li
Zixing Zhang
Jing Han
P. Bell
Catherine Lai
63
0
0
25 Sep 2024
A Comprehensive Survey with Critical Analysis for Deepfake Speech Detection
Lam Pham
Phat Lam
Dat Tran
Hieu Tang
Tin Nguyen
Alexander Schindler
Canh Vu
Alexander Polonsky
48
3
0
23 Sep 2024
Adaptive Large Language Models By Layerwise Attention Shortcuts
Prateek Verma
Mert Pilanci
KELM
OffRL
50
0
0
17 Sep 2024
Learning Source Disentanglement in Neural Audio Codec
Xiaoyu Bie
Xubo Liu
Gaël Richard
18
1
0
17 Sep 2024
Salmon: A Suite for Acoustic Language Model Evaluation
Gallil Maimon
Amit Roth
Yossi Adi
ELM
AuLLM
49
5
0
11 Sep 2024
InstructSing: High-Fidelity Singing Voice Generation via Instructing Yourself
Chang Zeng
Chunhui Wang
Xiaoxiao Miao
Jian Zhao
Zhonglin Jiang
Yong Chen
33
0
0
10 Sep 2024
Expanding on EnCLAP with Auxiliary Retrieval Model for Automated Audio Captioning
Jaeyeon Kim
Jaeyoon Jung
Minjeong Jeon
Sang Hoon Woo
Jinjoo Lee
24
1
0
02 Sep 2024
Sample-Efficient Diffusion for Text-To-Speech Synthesis
Justin Lovelace
Soham Ray
Kwangyoun Kim
Kilian Q. Weinberger
Felix Wu
34
2
0
01 Sep 2024
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling
Shengpeng Ji
Ziyue Jiang
Xize Cheng
Yifu Chen
Minghui Fang
...
Rongjie Huang
Yidi Jiang
Qian Chen
Zhou Zhao
VLM
54
33
0
29 Aug 2024
Advancing Spatio-Temporal Processing in Spiking Neural Networks through Adaptation
Maximilian Baronig
Romain Ferrand
Silvester Sabathiel
R. Legenstein
40
4
0
14 Aug 2024
FLEURS-R: A Restored Multilingual Speech Corpus for Generation Tasks
Min Ma
Yuma Koizumi
Shigeki Karita
Heiga Zen
Jason Riesa
Haruko Ishikawa
M. Bacchiani
VLM
27
4
0
12 Aug 2024
Nested Music Transformer: Sequentially Decoding Compound Tokens in Symbolic Music and Audio Generation
Michael Kolle
Maximilian Zorn
Jongmin Jung
Dasaem Jeong
31
0
0
02 Aug 2024
Generating Sample-Based Musical Instruments Using Neural Audio Codec Language Models
S. Nercessian
Johannes Imort
Ninon Devis
Frederik Blang
36
1
0
22 Jul 2024
Robust Zero-Shot Text-to-Speech Synthesis with Reverse Inference Optimization
Yuchen Hu
Chen Chen
Siyin Wang
Eng Siong Chng
C. Zhang
43
3
0
02 Jul 2024
Improving Robustness of LLM-based Speech Synthesis by Learning Monotonic Alignment
Paarth Neekhara
Shehzeen Samarah Hussain
Subhankar Ghosh
Jason Chun Lok Li
Rafael Valle
Rohan Badlani
Boris Ginsburg
50
11
0
25 Jun 2024
How Should We Extract Discrete Audio Tokens from Self-Supervised Models?
Pooneh Mousavi
J. Duret
Salah Zaiem
Luca Della Libera
Artem Ploujnikov
Cem Subakan
Mirco Ravanelli
34
9
0
15 Jun 2024
Vec-Tok-VC+: Residual-enhanced Robust Zero-shot Voice Conversion with Progressive Constraints in a Dual-mode Training Strategy
Linhan Ma
Xinfa Zhu
Yuanjun Lv
Zhichao Wang
Ziqian Wang
Wendi He
Hongbin Zhou
Lei Xie
39
2
0
14 Jun 2024
PolySpeech: Exploring Unified Multitask Speech Models for Competitiveness with Single-task Models
Runyan Yang
Huibao Yang
Xiqing Zhang
Tiantian Ye
Ying Liu
Yingying Gao
Shilei Zhang
Chao Deng
Junlan Feng
34
0
0
12 Jun 2024
AudioMarkBench: Benchmarking Robustness of Audio Watermarking
Hongbin Liu
Moyang Guo
Zhengyuan Jiang
Lun Wang
Neil Zhenqiang Gong
36
6
0
11 Jun 2024