ResearchTrend.AI


ReelWave: A Multi-Agent Framework Toward Professional Movie Sound Generation

10 March 2025
Zixuan Wang
Chi-Keung Tang
Yu-Wing Tai
    DiffM
    VGen
Abstract

Film production is an important application for generative audio, where richer context is provided through multiple scenes. In ReelWave, we propose a multi-agent framework for audio generation inspired by the professional movie production process. We first capture semantically and temporally synchronized "on-screen" sound by training a prediction model that predicts three interpretable, time-varying audio control signals: loudness, pitch, and timbre. These three signals are then supplied as conditions through a cross-attention module. Next, our framework infers "off-screen" sound to complement the generation through cooperative interaction between communicative agents. Each agent takes on a specific role, similar to the roles in a movie production team, and is supervised by an agent called the director. We also investigate the case in which the conditioning video consists of multiple scenes, which is common in clips extracted from movies of considerable length. As a result, our framework can capture richer context for audio generation conditioned on video clips extracted from movies.
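To make the idea of time-varying control signals concrete, the following is a minimal sketch of how frame-wise loudness and pitch curves could be computed from a waveform. It is an illustration only: the abstract does not say how ReelWave extracts its targets, so the function names (`frames`, `loudness_db`, `pitch_hz`, `control_signals`), the frame and hop sizes, and the use of RMS level and autocorrelation are all assumptions based on common signal-processing definitions. Timbre is omitted for brevity.

```python
import math

def frames(signal, frame_len, hop):
    """Split a list of samples into overlapping frames (a trailing partial frame is dropped)."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]

def loudness_db(frame, eps=1e-10):
    """Root-mean-square level of one frame, in decibels relative to full scale."""
    rms = math.sqrt(sum(x * x for x in frame) / len(frame))
    return 20.0 * math.log10(rms + eps)

def pitch_hz(frame, sample_rate, fmin=50.0, fmax=1000.0):
    """Crude pitch estimate: the lag with the largest (unnormalized) autocorrelation."""
    lag_min = int(sample_rate / fmax)
    lag_max = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_score = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        score = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag

def control_signals(signal, sample_rate, frame_len=1024, hop=512):
    """Return one (loudness_db, pitch_hz) pair per frame: two time-varying curves."""
    return [(loudness_db(f), pitch_hz(f, sample_rate))
            for f in frames(signal, frame_len, hop)]

# Hypothetical input: a 200 Hz sine at half amplitude, sampled at 8 kHz.
sample_rate = 8000
tone = [0.5 * math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(2048)]
curves = control_signals(tone, sample_rate)
```

In a setup like the one the abstract describes, curves of this kind would serve as prediction targets for a video-conditioned model and would later be passed to the generator as conditions; how that is done in ReelWave is detailed in the paper itself, not in this sketch.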

@article{wang2025_2503.07217,
  title={ReelWave: A Multi-Agent Framework Toward Professional Movie Sound Generation},
  author={Zixuan Wang and Chi-Keung Tang and Yu-Wing Tai},
  journal={arXiv preprint arXiv:2503.07217},
  year={2025}
}