ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2110.08791
  4. Cited By
Taming Visually Guided Sound Generation

Taming Visually Guided Sound Generation

17 October 2021
Vladimir E. Iashin
Esa Rahtu
    VLM
ArXivPDFHTML

Papers citing "Taming Visually Guided Sound Generation"

50 / 93 papers shown
Title
OmniAudio: Generating Spatial Audio from 360-Degree Video
OmniAudio: Generating Spatial Audio from 360-Degree Video
Huadai Liu
Tianyi Luo
Qikai Jiang
Kaicheng Luo
Peiwen Sun
...
X. Li
Shiliang Zhang
Zhijie Yan
Zhou Zhao
Wei Xue
VGen
51
0
0
21 Apr 2025
TARO: Timestep-Adaptive Representation Alignment with Onset-Aware Conditioning for Synchronized Video-to-Audio Synthesis
TARO: Timestep-Adaptive Representation Alignment with Onset-Aware Conditioning for Synchronized Video-to-Audio Synthesis
Tri Ton
Ji Woo Hong
Chang D. Yoo
VGen
24
0
0
08 Apr 2025
DeepAudio-V1:Towards Multi-Modal Multi-Stage End-to-End Video to Speech and Audio Generation
DeepAudio-V1:Towards Multi-Modal Multi-Stage End-to-End Video to Speech and Audio Generation
Haomin Zhang
Chang Liu
Junjie Zheng
Zihao Chen
Chaofan Ding
Xinhan Di
DiffM
VGen
83
0
0
28 Mar 2025
DeepSound-V1: Start to Think Step-by-Step in the Audio Generation from Videos
DeepSound-V1: Start to Think Step-by-Step in the Audio Generation from Videos
Yunming Liang
Zihao Chen
Chaofan Ding
Xinhan Di
DiffM
VGen
55
0
0
28 Mar 2025
Enhance Generation Quality of Flow Matching V2A Model via Multi-Step CoT-Like Guidance and Combined Preference Optimization
Enhance Generation Quality of Flow Matching V2A Model via Multi-Step CoT-Like Guidance and Combined Preference Optimization
Haomin Zhang
S.
Haoyu Wang
Zihao Chen
X. Liu
Chaofan Ding
Xinhan Di
31
0
0
28 Mar 2025
Chirp Localization via Fine-Tuned Transformer Model: A Proof-of-Concept Study
Chirp Localization via Fine-Tuned Transformer Model: A Proof-of-Concept Study
N. Bahador
M. Lankarany
39
0
0
24 Mar 2025
DiffGAP: A Lightweight Diffusion Module in Contrastive Space for Bridging Cross-Model Gap
DiffGAP: A Lightweight Diffusion Module in Contrastive Space for Bridging Cross-Model Gap
Shentong Mo
Zehua Chen
Fan Bao
Jun-Jie Zhu
DiffM
50
0
0
15 Mar 2025
Long-Video Audio Synthesis with Multi-Agent Collaboration
Long-Video Audio Synthesis with Multi-Agent Collaboration
Yehang Zhang
Xinli Xu
Xiaojie Xu
L. Liu
Y. Chen
DiffM
VGen
48
0
0
13 Mar 2025
TA-V2A: Textually Assisted Video-to-Audio Generation
Yuhuan You
Xihong Wu
T. Qu
DiffM
45
0
0
12 Mar 2025
Synchronized Video-to-Audio Generation via Mel Quantization-Continuum Decomposition
Juncheng Wang
Chao Xu
Cheng Yu
Lei Shang
Zhe Hu
Shujun Wang
Liefeng Bo
DiffM
VGen
43
0
0
10 Mar 2025
ReelWave: A Multi-Agent Framework Toward Professional Movie Sound Generation
Zixuan Wang
Chi-Keung Tang
Yu-Wing Tai
DiffM
VGen
58
0
0
10 Mar 2025
Generative AI for Cel-Animation: A Survey
Generative AI for Cel-Animation: A Survey
Yunlong Tang
Junjia Guo
Pinxin Liu
Zhiyuan Wang
Hang Hua
...
Jing Bi
Mingqian Feng
X. Li
Zeliang Zhang
Chenliang Xu
VGen
88
7
0
08 Jan 2025
LoVA: Long-form Video-to-Audio Generation
LoVA: Long-form Video-to-Audio Generation
Xin Cheng
Xihua Wang
Yihan Wu
Yuyue Wang
Ruihua Song
VGen
DiffM
38
2
0
31 Dec 2024
Smooth-Foley: Creating Continuous Sound for Video-to-Audio Generation
  Under Semantic Guidance
Smooth-Foley: Creating Continuous Sound for Video-to-Audio Generation Under Semantic Guidance
Yaoyun Zhang
Xuenan Xu
Mengyue Wu
VGen
26
0
0
24 Dec 2024
MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis
MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis
Ho Kei Cheng
Masato Ishii
Akio Hayakawa
Takashi Shibuya
A. Schwing
Yuki Mitsufuji
VGen
126
12
0
19 Dec 2024
VinTAGe: Joint Video and Text Conditioning for Holistic Audio Generation
VinTAGe: Joint Video and Text Conditioning for Holistic Audio Generation
Saksham Singh Kushwaha
Yapeng Tian
DiffM
VGen
71
2
0
14 Dec 2024
Sound2Vision: Generating Diverse Visuals from Audio through Cross-Modal
  Latent Alignment
Sound2Vision: Generating Diverse Visuals from Audio through Cross-Modal Latent Alignment
Kim Sung-Bin
Arda Senocak
Hyunwoo Ha
Tae-Hyun Oh
DiffM
68
0
0
09 Dec 2024
SyncFlow: Toward Temporally Aligned Joint Audio-Video Generation from
  Text
SyncFlow: Toward Temporally Aligned Joint Audio-Video Generation from Text
Haohe Liu
Gaël Le Lan
Xinhao Mei
Zhaoheng Ni
Anurag Kumar
Varun K. Nagaraja
Wenwu Wang
Mark D. Plumbley
Yangyang Shi
Vikas Chandra
VGen
61
1
0
03 Dec 2024
Gotta Hear Them All: Sound Source Aware Vision to Audio Generation
Gotta Hear Them All: Sound Source Aware Vision to Audio Generation
Wei Guo
Heng Wang
Jianbo Ma
Weidong Cai
DiffM
85
3
0
23 Nov 2024
MuVi: Video-to-Music Generation with Semantic Alignment and Rhythmic
  Synchronization
MuVi: Video-to-Music Generation with Semantic Alignment and Rhythmic Synchronization
Ruiqi Li
Siqi Zheng
Xize Cheng
Ziang Zhang
Shengpeng Ji
Zhou Zhao
VGen
63
7
0
16 Oct 2024
Audio-Agent: Leveraging LLMs For Audio Generation, Editing and Composition
Audio-Agent: Leveraging LLMs For Audio Generation, Editing and Composition
Zixuan Wang
Chi-Keung Tang
Chi-Keung Tang
DiffM
VGen
LLMAG
41
4
0
04 Oct 2024
MDSGen: Fast and Efficient Masked Diffusion Temporal-Aware Transformers for Open-Domain Sound Generation
MDSGen: Fast and Efficient Masked Diffusion Temporal-Aware Transformers for Open-Domain Sound Generation
T. Pham
Tri Ton
Chang D. Yoo
36
3
0
03 Oct 2024
From Vision to Audio and Beyond: A Unified Model for Audio-Visual
  Representation and Generation
From Vision to Audio and Beyond: A Unified Model for Audio-Visual Representation and Generation
Kun Su
Xiulong Liu
Eli Shlizerman
VGen
28
6
0
27 Sep 2024
A Simple but Strong Baseline for Sounding Video Generation: Effective Adaptation of Audio and Video Diffusion Models for Joint Generation
A Simple but Strong Baseline for Sounding Video Generation: Effective Adaptation of Audio and Video Diffusion Models for Joint Generation
Masato Ishii
Akio Hayakawa
Takashi Shibuya
Yuki Mitsufuji
VGen
DiffM
63
4
0
26 Sep 2024
Self-Supervised Audio-Visual Soundscape Stylization
Self-Supervised Audio-Visual Soundscape Stylization
Tingle Li
Renhao Wang
Po-Yao Huang
Andrew Owens
Gopala Anumanchipalli
DiffM
SSL
30
4
0
22 Sep 2024
Temporally Aligned Audio for Video with Autoregression
Temporally Aligned Audio for Video with Autoregression
Ilpo Viertola
Vladimir E. Iashin
Esa Rahtu
VGen
32
9
0
20 Sep 2024
Efficient Video to Audio Mapper with Visual Scene Detection
Efficient Video to Audio Mapper with Visual Scene Detection
Mingjing Yi
Ming Li
VGen
15
3
0
15 Sep 2024
Text Prompt is Not Enough: Sound Event Enhanced Prompt Adapter for
  Target Style Audio Generation
Text Prompt is Not Enough: Sound Event Enhanced Prompt Adapter for Target Style Audio Generation
Chenxu Xiong
Ruibo Fu
Shuchen Shi
Zhengqi Wen
Jianhua Tao
...
Chunyu Qiang
Yuankun Xie
Xin Qi
Guanjun Li
Zizheng Yang
DiffM
31
0
0
14 Sep 2024
Rhythmic Foley: A Framework For Seamless Audio-Visual Alignment In
  Video-to-Audio Synthesis
Rhythmic Foley: A Framework For Seamless Audio-Visual Alignment In Video-to-Audio Synthesis
Zhiqi Huang
Dan Luo
Jun Wang
Huan Liao
Zhiheng Li
Zhiyong Wu
VGen
45
4
0
13 Sep 2024
STA-V2A: Video-to-Audio Generation with Semantic and Temporal Alignment
STA-V2A: Video-to-Audio Generation with Semantic and Temporal Alignment
Yong Ren
Chenxing Li
Manjie Xu
Wei Liang
Yu Gu
Rilin Chen
Dong Yu
VGen
DiffM
43
6
0
13 Sep 2024
VMAS: Video-to-Music Generation via Semantic Alignment in Web Music
  Videos
VMAS: Video-to-Music Generation via Semantic Alignment in Web Music Videos
Yan-Bo Lin
Yu Tian
L. Yang
Gedas Bertasius
Heng Wang
VGen
34
7
0
11 Sep 2024
Draw an Audio: Leveraging Multi-Instruction for Video-to-Audio Synthesis
Draw an Audio: Leveraging Multi-Instruction for Video-to-Audio Synthesis
Qi Yang
Binjie Mao
Zili Wang
Xing Nie
Pengfei Gao
Ying Guo
Cheng Zhen
Pengfei Yan
Shiming Xiang
VGen
DiffM
30
5
0
10 Sep 2024
Video-Foley: Two-Stage Video-To-Sound Generation via Temporal Event Condition For Foley Sound
Video-Foley: Two-Stage Video-To-Sound Generation via Temporal Event Condition For Foley Sound
Junwon Lee
Jaekwon Im
Dabin Kim
Juhan Nam
VGen
21
9
0
21 Aug 2024
PSM: Learning Probabilistic Embeddings for Multi-scale Zero-Shot
  Soundscape Mapping
PSM: Learning Probabilistic Embeddings for Multi-scale Zero-Shot Soundscape Mapping
Subash Khanal
Eric Xing
S. Sastry
A. Dhakal
Zhexiao Xiong
Adeel Ahmad
Nathan Jacobs
36
2
0
13 Aug 2024
EgoSonics: Generating Synchronized Audio for Silent Egocentric Videos
EgoSonics: Generating Synchronized Audio for Silent Egocentric Videos
Aashish Rai
Srinath Sridhar
DiffM
36
4
0
30 Jul 2024
MEDIC: Zero-shot Music Editing with Disentangled Inversion Control
MEDIC: Zero-shot Music Editing with Disentangled Inversion Control
Huadai Liu
Jialei Wang
Rongjie Huang
Yang Liu
Jiayang Xu
Zhou Zhao
21
4
0
18 Jul 2024
Masked Generative Video-to-Audio Transformers with Enhanced
  Synchronicity
Masked Generative Video-to-Audio Transformers with Enhanced Synchronicity
Santiago Pascual
Chunghsin Yeh
Ioannis Tsiamas
Joan Serra
DiffM
VGen
26
15
0
15 Jul 2024
Video-to-Audio Generation with Hidden Alignment
Video-to-Audio Generation with Hidden Alignment
Manjie Xu
Chenxing Li
Yong Ren
Rilin Chen
Yu Gu
Yu Gu
Dong Yu
Dong Yu
DiffM
VGen
43
11
0
10 Jul 2024
Read, Watch and Scream! Sound Generation from Text and Video
Read, Watch and Scream! Sound Generation from Text and Video
Yujin Jeong
Yunji Kim
Sanghyuk Chun
Jiyoung Lee
VGen
DiffM
25
11
0
08 Jul 2024
FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized
  Sounds
FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds
Yiming Zhang
Yicheng Gu
Yanhong Zeng
Zhening Xing
Yuancheng Wang
Zhizheng Wu
Kai Chen
VGen
21
35
0
01 Jul 2024
SpecMaskGIT: Masked Generative Modeling of Audio Spectrograms for
  Efficient Audio Synthesis and Beyond
SpecMaskGIT: Masked Generative Modeling of Audio Spectrograms for Efficient Audio Synthesis and Beyond
Marco Comunità
Zhi-Wei Zhong
Akira Takahashi
Shiqi Yang
Mengjie Zhao
Koichi Saito
Yukara Ikemiya
Takashi Shibuya
Shusuke Takahashi
Yuki Mitsufuji
47
2
0
25 Jun 2024
Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric
  Videos
Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos
Changan Chen
Puyuan Peng
Ami Baid
Zihui Xue
Wei-Ning Hsu
David F. Harwath
Kristen Grauman
VGen
37
7
0
13 Jun 2024
SEE-2-SOUND: Zero-Shot Spatial Environment-to-Spatial Sound
SEE-2-SOUND: Zero-Shot Spatial Environment-to-Spatial Sound
Rishit Dagli
Shivesh Prakash
Robert Wu
H. Khosravani
31
3
0
06 Jun 2024
MoLA: Motion Generation and Editing with Latent Diffusion Enhanced by Adversarial Training
MoLA: Motion Generation and Editing with Latent Diffusion Enhanced by Adversarial Training
Kengo Uchida
Takashi Shibuya
Yuhta Takida
Naoki Murata
Shusuke Takahashi
Shusuke Takahashi
Yuki Mitsufuji
VGen
44
4
0
04 Jun 2024
Frieren: Efficient Video-to-Audio Generation Network with Rectified Flow Matching
Frieren: Efficient Video-to-Audio Generation Network with Rectified Flow Matching
Yongqi Wang
Wenxiang Guo
Rongjie Huang
Jia-Bin Huang
Zehan Wang
Fuming You
Ruiqi Li
Zhou Zhao
VGen
DiffM
26
11
0
01 Jun 2024
Reverse the auditory processing pathway: Coarse-to-fine audio
  reconstruction from fMRI
Reverse the auditory processing pathway: Coarse-to-fine audio reconstruction from fMRI
Che Liu
Changde Du
Xiaoyu Chen
Huiguang He
27
2
0
29 May 2024
C3LLM: Conditional Multimodal Content Generation Using Large Language
  Models
C3LLM: Conditional Multimodal Content Generation Using Large Language Models
Zixuan Wang
Qinkai Duan
Yu-Wing Tai
Chi-Keung Tang
29
3
0
25 May 2024
Visual Echoes: A Simple Unified Transformer for Audio-Visual Generation
Visual Echoes: A Simple Unified Transformer for Audio-Visual Generation
Shiqi Yang
Zhi-Wei Zhong
Mengjie Zhao
Shusuke Takahashi
Masato Ishii
Takashi Shibuya
Yuki Mitsufuji
43
2
0
23 May 2024
A Versatile Diffusion Transformer with Mixture of Noise Levels for
  Audiovisual Generation
A Versatile Diffusion Transformer with Mixture of Noise Levels for Audiovisual Generation
Gwanghyun Kim
Alonso Martinez
Yu-Chuan Su
Brendan Jou
José Lezama
...
Lijun Yu
Lu Jiang
A. Jansen
Jacob Walker
Krishna Somandepalli
22
8
0
22 May 2024
Images that Sound: Composing Images and Sounds on a Single Canvas
Images that Sound: Composing Images and Sounds on a Single Canvas
Ziyang Chen
Daniel Geng
Andrew Owens
DiffM
48
9
0
20 May 2024
12
Next