Taming Visually Guided Sound Generation

17 October 2021
Vladimir E. Iashin
Esa Rahtu
    VLM

Papers citing "Taming Visually Guided Sound Generation"

43 / 93 papers shown
SemantiCodec: An Ultra Low Bitrate Semantic Audio Codec for General Sound
Haohe Liu
Xuenan Xu
Yiitan Yuan
Mengyue Wu
Wenwu Wang
Mark D. Plumbley
16
18
0
30 Apr 2024
Semantically consistent Video-to-Audio Generation using Multimodal Language Large Model
Gehui Chen
Guan’an Wang
Xiaowen Huang
Jitao Sang
VGen
11
8
0
25 Apr 2024
TAVGBench: Benchmarking Text to Audible-Video Generation
Yuxin Mao
Xuyang Shen
Jing Zhang
Zhen Qin
Jinxing Zhou
Mochu Xiang
Yiran Zhong
Yuchao Dai
40
11
0
22 Apr 2024
Text-to-Audio Generation Synchronized with Videos
Shentong Mo
Jing Shi
Yapeng Tian
DiffM
VGen
37
17
0
08 Mar 2024
Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners
Yazhou Xing
Yin-Yin He
Zeyue Tian
Xintao Wang
Qifeng Chen
27
49
0
27 Feb 2024
Zero-Shot Unsupervised and Text-Based Audio Editing Using DDPM Inversion
Hila Manor
T. Michaeli
DiffM
21
25
0
15 Feb 2024
SonicVisionLM: Playing Sound with Vision Language Models
Zhifeng Xie
Shengye Yu
Qile He
Mengtian Li
VLM
VGen
25
2
0
09 Jan 2024
Controllable Music Production with Diffusion Models and Guidance Gradients
Mark Levy
Bruno Di Giorgi
Floris Weers
Angelos Katharopoulos
Tom Nickson
DiffM
75
19
0
01 Nov 2023
SyncFusion: Multimodal Onset-synchronized Video-to-Audio Foley Synthesis
Marco Comunità
R. F. Gramaccioni
Emilian Postolache
Emanuele Rodolà
Danilo Comminiello
Joshua D. Reiss
DiffM
22
16
0
23 Oct 2023
FoleyGen: Visually-Guided Audio Generation
Xinhao Mei
Varun K. Nagaraja
Gaël Le Lan
Zhaoheng Ni
Ernie Chang
Yangyang Shi
Vikas Chandra
VGen
16
20
0
19 Sep 2023
Retrieval-Augmented Text-to-Audio Generation
Yiitan Yuan
Haohe Liu
Xubo Liu
Qiushi Huang
Mark D. Plumbley
Wenwu Wang
RALM
12
24
0
14 Sep 2023
DCTTS: Discrete Diffusion Model with Contrastive Learning for Text-to-speech Generation
Zhichao Wu
Qiulin Li
Sixing Liu
Qun Yang
17
3
0
13 Sep 2023
V2A-Mapper: A Lightweight Solution for Vision-to-Audio Generation by Connecting Foundation Models
Heng Wang
Jianbo Ma
Santiago Pascual
Richard Cartwright
Weidong (Tom) Cai
VGen
16
37
0
18 Aug 2023
AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining
Haohe Liu
Yiitan Yuan
Xubo Liu
Xinhao Mei
Qiuqiang Kong
Qiao Tian
Yuping Wang
Wenwu Wang
Yuxuan Wang
Mark D. Plumbley
DiffM
17
220
0
10 Aug 2023
Diff-Foley: Synchronized Video-to-Audio Synthesis with Latent Diffusion Models
Simian Luo
Chuanhao Yan
Chenxu Hu
Hang Zhao
DiffM
13
79
0
29 Jun 2023
Text-Driven Foley Sound Generation With Latent Diffusion Model
Yiitan Yuan
Haohe Liu
Xubo Liu
Xiyuan Kang
Peipei Wu
Mark D. Plumbley
Wenwu Wang
DiffM
33
10
0
17 Jun 2023
CLIPSonic: Text-to-Audio Synthesis with Unlabeled Videos and Pretrained Language-Vision Models
Hao-Wen Dong
Xiaoyu Liu
Jordi Pons
Gautam Bhattacharya
Santiago Pascual
Joan Serra
Taylor Berg-Kirkpatrick
Julian McAuley
DiffM
13
19
0
16 Jun 2023
The ObjectFolder Benchmark: Multisensory Learning with Neural and Real Objects
Ruohan Gao
Yiming Dou
Hao Li
Tanmay Agarwal
Jeannette Bohg
Yunzhu Li
Li Fei-Fei
Jiajun Wu
11
29
0
01 Jun 2023
AudioToken: Adaptation of Text-Conditioned Diffusion Models for Audio-to-Image Generation
Guy Yariv
Itai Gat
Lior Wolf
Yossi Adi
Idan Schwartz
DiffM
20
20
0
22 May 2023
HiFi-Codec: Group-residual Vector quantization for High Fidelity Audio Codec
Dongchao Yang
Songxiang Liu
Rongjie Huang
Jinchuan Tian
Chao Weng
Yuexian Zou
138
118
0
04 May 2023
Diverse and Vivid Sound Generation from Text Descriptions
Guangwei Li
Xuenan Xu
Lingfeng Dai
Mengyue Wu
K. Yu
45
4
0
03 May 2023
Conditional Generation of Audio from Video via Foley Analogies
Yuexi Du
Ziyang Chen
Justin Salamon
Bryan C. Russell
Andrew Owens
VGen
17
37
0
17 Apr 2023
Sound to Visual Scene Generation by Audio-to-Visual Latent Alignment
Kim Sung-Bin
Arda Senocak
H. Ha
Andrew Owens
Tae-Hyun Oh
DiffM
VGen
25
35
0
30 Mar 2023
Sounding Video Generator: A Unified Framework for Text-guided Sounding Video Generation
Jiawei Liu
Weining Wang
Sihan Chen
Xinxin Zhu
J. Liu
DiffM
VGen
15
13
0
29 Mar 2023
A Survey on Audio Diffusion Models: Text To Speech Synthesis and Enhancement in Generative AI
Chenshuang Zhang
Chaoning Zhang
Sheng Zheng
Mengchun Zhang
Maryam Qamar
Sung-Ho Bae
In So Kweon
DiffM
MedIm
39
64
0
23 Mar 2023
Sound Localization from Motion: Jointly Learning Sound Direction and Camera Rotation
Ziyang Chen
Shengyi Qian
Andrew Owens
16
12
0
20 Mar 2023
Leveraging Pre-trained AudioLDM for Text to Sound Generation: A Benchmark Study
Yiitan Yuan
Haohe Liu
Jinhua Liang
Xubo Liu
Mark D. Plumbley
Wenwu Wang
11
0
0
07 Mar 2023
InstructTTS: Modelling Expressive TTS in Discrete Latent Space with Natural Language Style Prompt
Dongchao Yang
Songxiang Liu
Rongjie Huang
Chao Weng
H. Meng
DiffM
VLM
31
84
0
31 Jan 2023
Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models
Rongjie Huang
Jia-Bin Huang
Dongchao Yang
Yi Ren
Luping Liu
Mingze Li
Zhenhui Ye
Jinglin Liu
Xiaoyue Yin
Zhou Zhao
DiffM
140
315
0
30 Jan 2023
VarietySound: Timbre-Controllable Video to Sound Generation via Unsupervised Information Disentanglement
Chenye Cui
Yi Ren
Jinglin Liu
Rongjie Huang
Zhou Zhao
VGen
30
14
0
19 Nov 2022
I Hear Your True Colors: Image Guided Audio Generation
Roy Sheffer
Yossi Adi
VLM
8
73
0
06 Nov 2022
Learning in Audio-visual Context: A Review, Analysis, and New Perspective
Yake Wei
Di Hu
Yapeng Tian
Xuelong Li
44
54
0
20 Aug 2022
A Proposal for Foley Sound Synthesis Challenge
Keunwoo Choi
Sangshin Oh
Minsung Kang
Brian McFee
11
11
0
21 Jul 2022
Diffsound: Discrete Diffusion Model for Text-to-sound Generation
Dongchao Yang
Jianwei Yu
Helin Wang
Wen Wang
Chao Weng
Yuexian Zou
Dong Yu
DiffM
25
295
0
20 Jul 2022
Learning Visual Styles from Audio-Visual Associations
Tingle Li
Yichen Liu
Andrew Owens
Hang Zhao
DiffM
23
20
0
10 May 2022
Quantized GAN for Complex Music Generation from Dance Videos
Ye Zhu
Kyle Olszewski
Yuehua Wu
Panos Achlioptas
Menglei Chai
Yan Yan
Sergey Tulyakov
MGen
17
44
0
01 Apr 2022
FoleyGAN: Visually Guided Generative Adversarial Network-Based Synchronous Sound Generation in Silent Videos
Sanchita Ghose
John J. Prevost
GAN
11
26
0
20 Jul 2021
VideoGPT: Video Generation using VQ-VAE and Transformers
Wilson Yan
Yunzhi Zhang
Pieter Abbeel
A. Srinivas
ViT
VGen
242
484
0
20 Apr 2021
Spectrogram Inpainting for Interactive Generation of Instrument Sounds
Théis Bazin
Gaëtan Hadjeres
P. Esling
M. Malt
21
11
0
15 Apr 2021
Zero-Shot Text-to-Image Generation
Aditya A. Ramesh
Mikhail Pavlov
Gabriel Goh
Scott Gray
Chelsea Voss
Alec Radford
Mark Chen
Ilya Sutskever
VLM
253
4,764
0
24 Feb 2021
High Fidelity Speech Synthesis with Adversarial Networks
Mikolaj Binkowski
Jeff Donahue
Sander Dieleman
Aidan Clark
Erich Elsen
Norman Casagrande
Luis C. Cobo
Karen Simonyan
213
239
0
25 Sep 2019
Image-to-Image Translation with Conditional Adversarial Networks
Phillip Isola
Jun-Yan Zhu
Tinghui Zhou
Alexei A. Efros
SSeg
212
19,387
0
21 Nov 2016
Conditional Image Synthesis With Auxiliary Classifier GANs
Augustus Odena
C. Olah
Jonathon Shlens
GAN
224
3,183
0
30 Oct 2016