MambaFoley: Foley Sound Generation using Selective State-Space Models
arXiv: 2409.09162 (v2, latest)

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
13 September 2024
Marco Furio Colombo
Francesca Ronchini
Luca Comanducci
Fabio Antonacci
    Mamba

Papers citing "MambaFoley: Foley Sound Generation using Selective State-Space Models"

41 / 41 papers shown
AI-Assisted Music Production: A User Study on Text-to-Music Models
Francesca Ronchini
Luca Comanducci
Simone Marcucci
Fabio Antonacci
94
0
0
27 Sep 2025
Video-Foley: Two-Stage Video-To-Sound Generation via Temporal Event Condition For Foley Sound
IEEE Transactions on Audio, Speech, and Language Processing (IEEE TASLP), 2024
Junwon Lee
Jaekwon Im
Dabin Kim
Juhan Nam
VGen
471
17
0
21 Aug 2024
Hydra: Bidirectional State Space Models Through Generalized Matrix Mixers
Sukjun Hwang
Aakash Lahoti
Tri Dao
Albert Gu
Mamba
367
38
0
13 Jul 2024
PicoAudio: Enabling Precise Timestamp and Frequency Controllability of Audio Events in Text-to-audio Generation
Zeyu Xie
Xuenan Xu
Zhizheng Wu
Mengyue Wu
283
15
0
03 Jul 2024
AudioTime: A Temporally-aligned Audio-text Benchmark Dataset
Zeyu Xie
Xuenan Xu
Zhizheng Wu
Mengyue Wu
AuLLM
264
14
0
03 Jul 2024
Can Synthetic Audio From Generative Foundation Models Assist Audio Recognition and Speech Modeling?
Tiantian Feng
Dimitrios Dimitriadis
Shrikanth Narayanan
207
5
0
13 Jun 2024
RawBMamba: End-to-End Bidirectional State Space Model for Audio Deepfake Detection
Interspeech, 2024
Yujie Chen
Jiangyan Yi
Jun Xue
Chenglong Wang
Xiaohui Zhang
Shunbo Dong
Siding Zeng
Jianhua Tao
Lv Zhao
Cunhang Fan
Mamba
233
39
0
10 Jun 2024
Audio Mamba: Selective State Spaces for Self-Supervised Audio Representations
Sarthak Yadav
Zheng-Hua Tan
Mamba
265
28
0
04 Jun 2024
Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality
Tri Dao
Albert Gu
Mamba
415
1,043
0
31 May 2024
SPMamba: State-space model is all you need in speech separation
Kai Li
Guo Chen
Mamba
272
26
0
02 Apr 2024
Synthetic training set generation using text-to-audio models for environmental sound classification
Francesca Ronchini
Luca Comanducci
Fabio Antonacci
269
2
0
26 Mar 2024
Correlation of Fréchet Audio Distance With Human Perception of Environmental Audio Is Embedding Dependant
Modan Tailleur
Junwon Lee
Mathieu Lagrange
Keunwoo Choi
Laurie M. Heller
Keisuke Imoto
Yuki Okamoto
290
14
0
26 Mar 2024
T-FOLEY: A Controllable Waveform-Domain Diffusion Model for Temporal-Event-Guided Foley Sound Synthesis
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Yoonjin Chung
Junwon Lee
Juhan Nam
170
22
0
17 Jan 2024
Reconstruction of Sound Field through Diffusion Models
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
F. Miotello
Luca Comanducci
Mirco Pezzoli
Alberto Bernardini
Fabio Antonacci
Augusto Sarti
DiffM
258
20
0
14 Dec 2023
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Albert Gu
Tri Dao
Mamba
575
5,271
0
01 Dec 2023
SyncFusion: Multimodal Onset-synchronized Video-to-Audio Foley Synthesis
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Marco Comunità
R. F. Gramaccioni
Emilian Postolache
Emanuele Rodolà
Danilo Comminiello
Joshua D. Reiss
DiffM
187
29
0
23 Oct 2023
AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining
Haohe Liu
Yiitan Yuan
Xubo Liu
Xinhao Mei
Qiuqiang Kong
Qiao Tian
Yuping Wang
Wenwu Wang
Yuxuan Wang
Mark D. Plumbley
DiffM
349
382
0
10 Aug 2023
Text-Driven Foley Sound Generation With Latent Diffusion Model
Yiitan Yuan
Haohe Liu
Xubo Liu
Xiyuan Kang
Peipei Wu
Mark D. Plumbley
Wenwu Wang
DiffM
493
13
0
17 Jun 2023
FALL-E: A Foley Sound Synthesis Model and Strategies
Minsung Kang
Sangshin Oh
Hyeongi Moon
Kyungyun Lee
Ben Sangbae Chon
227
6
0
16 Jun 2023
Foley Sound Synthesis at the DCASE 2023 Challenge
Keunwoo Choi
Jae-Yeol Im
Laurie M. Heller
Brian McFee
Keisuke Imoto
Yuki Okamoto
Mathieu Lagrange
Shinosuke Takamichi
308
40
0
25 Apr 2023
Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models
International Conference on Machine Learning (ICML), 2023
Rongjie Huang
Jia-Bin Huang
Dongchao Yang
Yi Ren
Luping Liu
Mingze Li
Zhenhui Ye
Jinglin Liu
Xiaoyue Yin
Zhou Zhao
DiffM
404
431
0
30 Jan 2023
Full-band General Audio Synthesis with Score-based Diffusion
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Santiago Pascual
Gautam Bhattacharya
Chunghsin Yeh
Jordi Pons
Joan Serrà
DiffM
220
39
0
26 Oct 2022
Classifier-Free Diffusion Guidance
Jonathan Ho
Tim Salimans
FaML
476
5,341
0
26 Jul 2022
Diffsound: Discrete Diffusion Model for Text-to-sound Generation
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2022
Dongchao Yang
Jianwei Yu
Helin Wang
Wen Wang
Chao Weng
Yuexian Zou
Dong Yu
DiffM
273
379
0
20 Jul 2022
It's Raw! Audio Generation with State-Space Models
International Conference on Machine Learning (ICML), 2022
Karan Goel
Albert Gu
Chris Donahue
Christopher Ré
261
233
0
20 Feb 2022
Efficiently Modeling Long Sequences with Structured State Spaces
International Conference on Learning Representations (ICLR), 2021
Albert Gu
Karan Goel
Christopher Ré
1.0K
2,871
0
31 Oct 2021
Combining Recurrent, Convolutional, and Continuous-time Models with Linear State-Space Layers
Albert Gu
Isys Johnson
Karan Goel
Khaled Kamal Saab
Tri Dao
Atri Rudra
Christopher Ré
295
945
0
26 Oct 2021
Variational Diffusion Models
Diederik P. Kingma
Tim Salimans
Ben Poole
Jonathan Ho
DiffM
905
1,363
0
01 Jul 2021
CRASH: Raw Audio Score-based Generative Modeling for Controllable High-resolution Drum Sound Synthesis
International Society for Music Information Retrieval Conference (ISMIR), 2021
Simon Rouard
Gaëtan Hadjeres
DiffM
152
44
0
14 Jun 2021
FSD50K: An Open Dataset of Human-Labeled Sound Events
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2020
Eduardo Fonseca
Xavier Favory
Jordi Pons
F. Font
Xavier Serra
512
604
0
01 Oct 2020
Denoising Diffusion Probabilistic Models
Jonathan Ho
Ajay Jain
Pieter Abbeel
DiffM
5.1K
25,864
0
19 Jun 2020
PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2019
Qiuqiang Kong
Yin Cao
Turab Iqbal
Yuxuan Wang
Wenwu Wang
Mark D. Plumbley
VLM SSL
459
1,341
0
21 Dec 2019
Root Mean Square Layer Normalization
Neural Information Processing Systems (NeurIPS), 2019
Biao Zhang
Rico Sennrich
797
1,205
0
16 Oct 2019
Generating Long Sequences with Sparse Transformers
R. Child
Scott Gray
Alec Radford
Ilya Sutskever
343
2,274
0
23 Apr 2019
Fréchet Audio Distance: A Metric for Evaluating Music Enhancement Algorithms
Kevin Kilgour
Mauricio Zuluaga
Dominik Roblek
Matthew Sharifi
1.4K
289
0
20 Dec 2018
FiLM: Visual Reasoning with a General Conditioning Layer
Ethan Perez
Florian Strub
H. D. Vries
Vincent Dumoulin
Aaron Courville
FAtt AIMat OffRL AI4CE
776
2,916
0
22 Sep 2017
Attention Is All You Need
Neural Information Processing Systems (NeurIPS), 2017
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
4.2K
162,388
0
12 Jun 2017
SampleRNN: An Unconditional End-to-End Neural Audio Generation Model
International Conference on Learning Representations (ICLR), 2016
Soroush Mehri
Kundan Kumar
Ishaan Gulrajani
Rithesh Kumar
Shubham Jain
Jose M. R. Sotelo
Aaron Courville
Yoshua Bengio
337
619
0
22 Dec 2016
CNN Architectures for Large-Scale Audio Classification
Shawn Hershey
Sourish Chaudhuri
D. Ellis
J. Gemmeke
A. Jansen
...
Rif A. Saurous
Bryan Seybold
M. Slaney
Ron J. Weiss
K. Wilson
554
2,818
0
29 Sep 2016
WaveNet: A Generative Model for Raw Audio
Aaron van den Oord
Sander Dieleman
Heiga Zen
Karen Simonyan
Oriol Vinyals
Alex Graves
Nal Kalchbrenner
A. Senior
Koray Kavukcuoglu
DiffM
1.0K
7,961
0
12 Sep 2016
Adam: A Method for Stochastic Optimization
International Conference on Learning Representations (ICLR), 2014
Diederik P. Kingma
Jimmy Ba
ODL
4.7K
161,759
0
22 Dec 2014