MambaFoley: Foley Sound Generation using Selective State-Space Models

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024

13 September 2024

Papers citing "MambaFoley: Foley Sound Generation using Selective State-Space Models"

AI-Assisted Music Production: A User Study on Text-to-Music Models

27 Sep 2025

Video-Foley: Two-Stage Video-To-Sound Generation via Temporal Event Condition For Foley Sound

IEEE Transactions on Audio, Speech, and Language Processing (IEEE TASLP), 2024

471

21 Aug 2024

Hydra: Bidirectional State Space Models Through Generalized Matrix Mixers

367

13 Jul 2024

PicoAudio: Enabling Precise Timestamp and Frequency Controllability of Audio Events in Text-to-audio Generation

283

03 Jul 2024

AudioTime: A Temporally-aligned Audio-text Benchmark Dataset

264

03 Jul 2024

Can Synthetic Audio From Generative Foundation Models Assist Audio Recognition and Speech Modeling?

Tiantian Feng

Dimitrios Dimitriadis

Shrikanth Narayanan

207

13 Jun 2024

RawBMamba: End-to-End Bidirectional State Space Model for Audio Deepfake Detection

Interspeech (Interspeech), 2024

Jiangyan Yi

Xiaohui Zhang

Jianhua Tao

Lv Zhao

Cunhang Fan

Mamba

233

10 Jun 2024

Audio Mamba: Selective State Spaces for Self-Supervised Audio Representations

Sarthak Yadav

Zheng-Hua Tan

Mamba

265

04 Jun 2024

Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality

Tri Dao

Albert Gu

Mamba

415

1,043

31 May 2024

SPMamba: State-space model is all you need in speech separation

Kai Li

Guo Chen

Mamba

272

02 Apr 2024

Synthetic training set generation using text-to-audio models for environmental sound classification

Francesca Ronchini

Luca Comanducci

Fabio Antonacci

269

26 Mar 2024

Correlation of Fréchet Audio Distance With Human Perception of Environmental Audio Is Embedding Dependant

290

26 Mar 2024

T-FOLEY: A Controllable Waveform-Domain Diffusion Model for Temporal-Event-Guided Foley Sound Synthesis

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024

Yoonjin Chung

Junwon Lee

Juhan Nam

170

17 Jan 2024

Reconstruction of Sound Field through Diffusion Models

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

258

14 Dec 2023

Mamba: Linear-Time Sequence Modeling with Selective State Spaces

Albert Gu

Tri Dao

Mamba

575

5,271

01 Dec 2023

SyncFusion: Multimodal Onset-synchronized Video-to-Audio Foley Synthesis

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

Joshua D. Reiss

187

23 Oct 2023

AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining

Yuxuan Wang

349

382

10 Aug 2023

Text-Driven Foley Sound Generation With Latent Diffusion Model

493

17 Jun 2023

FALL-E: A Foley Sound Synthesis Model and Strategies

227

16 Jun 2023

Foley Sound Synthesis at the DCASE 2023 Challenge

308

25 Apr 2023

Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models

International Conference on Machine Learning (ICML), 2023

Rongjie Huang

Dongchao Yang

Zhou Zhao

404

431

30 Jan 2023

Full-band General Audio Synthesis with Score-based Diffusion

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

220

26 Oct 2022

Classifier-Free Diffusion Guidance

Jonathan Ho

Tim Salimans

FaML

476

5,341

26 Jul 2022

Diffsound: Discrete Diffusion Model for Text-to-sound Generation

IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2022

Dongchao Yang

Helin Wang

Dong Yu

273

379

20 Jul 2022

It's Raw! Audio Generation with State-Space Models

International Conference on Machine Learning (ICML), 2022

261

233

20 Feb 2022

Efficiently Modeling Long Sequences with Structured State Spaces

International Conference on Learning Representations (ICLR), 2021

Albert Gu

Karan Goel

Christopher Ré

1.0K

2,871

31 Oct 2021

Combining Recurrent, Convolutional, and Continuous-time Models with Linear State-Space Layers

295

945

26 Oct 2021

Variational Diffusion Models

905

1,363

01 Jul 2021

CRASH: Raw Audio Score-based Generative Modeling for Controllable High-resolution Drum Sound Synthesis

International Society for Music Information Retrieval Conference (ISMIR), 2021

Simon Rouard

Gaëtan Hadjeres

DiffM

152

14 Jun 2021

FSD50K: An Open Dataset of Human-Labeled Sound Events

IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2020

Xavier Serra

512

604

01 Oct 2020

Denoising Diffusion Probabilistic Models

Jonathan Ho

Ajay Jain

Pieter Abbeel

DiffM

5.1K

25,864

19 Jun 2020

PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition

IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2019

Yuxuan Wang

459

1,341

21 Dec 2019

Root Mean Square Layer Normalization

Neural Information Processing Systems (NeurIPS), 2019

Biao Zhang

Rico Sennrich

797

1,205

16 Oct 2019

Generating Long Sequences with Sparse Transformers

343

2,274

23 Apr 2019

Fréchet Audio Distance: A Metric for Evaluating Music Enhancement Algorithms

1.4K

289

20 Dec 2018

FiLM: Visual Reasoning with a General Conditioning Layer

Aaron Courville

FAtt AIMat OffRL AI4CE

776

2,916

22 Sep 2017

Attention Is All You Need

Neural Information Processing Systems (NeurIPS), 2017

4.2K

162,388

12 Jun 2017

SampleRNN: An Unconditional End-to-End Neural Audio Generation Model

International Conference on Learning Representations (ICLR), 2016

Aaron Courville

337

619

22 Dec 2016

CNN Architectures for Large-Scale Audio Classification

...

Rif A. Saurous

554

2,818

29 Sep 2016

WaveNet: A Generative Model for Raw Audio

1.0K

7,961

12 Sep 2016

Adam: A Method for Stochastic Optimization

International Conference on Learning Representations (ICLR), 2014

Diederik P. Kingma

Jimmy Ba

ODL

4.7K

161,759

22 Dec 2014