Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2110.08791
Cited By
Taming Visually Guided Sound Generation
17 October 2021
Vladimir E. Iashin
Esa Rahtu
VLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Taming Visually Guided Sound Generation"
43 / 93 papers shown
Title
SemantiCodec: An Ultra Low Bitrate Semantic Audio Codec for General Sound
Haohe Liu
Xuenan Xu
Yiitan Yuan
Mengyue Wu
Wenwu Wang
Mark D. Plumbley
16
18
0
30 Apr 2024
Semantically consistent Video-to-Audio Generation using Multimodal Language Large Model
Gehui Chen
Guan’an Wang
Xiaowen Huang
Jitao Sang
VGen
11
8
0
25 Apr 2024
TAVGBench: Benchmarking Text to Audible-Video Generation
Yuxin Mao
Xuyang Shen
Jing Zhang
Zhen Qin
Jinxing Zhou
Mochu Xiang
Yiran Zhong
Yuchao Dai
40
11
0
22 Apr 2024
Text-to-Audio Generation Synchronized with Videos
Shentong Mo
Jing Shi
Yapeng Tian
DiffM
VGen
37
17
0
08 Mar 2024
Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners
Yazhou Xing
Yin-Yin He
Zeyue Tian
Xintao Wang
Qifeng Chen
27
49
0
27 Feb 2024
Zero-Shot Unsupervised and Text-Based Audio Editing Using DDPM Inversion
Hila Manor
T. Michaeli
DiffM
21
25
0
15 Feb 2024
SonicVisionLM: Playing Sound with Vision Language Models
Zhifeng Xie
Shengye Yu
Qile He
Mengtian Li
VLM
VGen
25
2
0
09 Jan 2024
Controllable Music Production with Diffusion Models and Guidance Gradients
Mark Levy
Bruno Di Giorgi
Floris Weers
Angelos Katharopoulos
Tom Nickson
DiffM
75
19
0
01 Nov 2023
SyncFusion: Multimodal Onset-synchronized Video-to-Audio Foley Synthesis
Marco Comunità
R. F. Gramaccioni
Emilian Postolache
Emanuele Rodolà
Danilo Comminiello
Joshua D. Reiss
DiffM
22
16
0
23 Oct 2023
FoleyGen: Visually-Guided Audio Generation
Xinhao Mei
Varun K. Nagaraja
Gaël Le Lan
Zhaoheng Ni
Ernie Chang
Yangyang Shi
Vikas Chandra
VGen
16
20
0
19 Sep 2023
Retrieval-Augmented Text-to-Audio Generation
Yiitan Yuan
Haohe Liu
Xubo Liu
Qiushi Huang
Mark D. Plumbley
Wenwu Wang
RALM
12
24
0
14 Sep 2023
DCTTS: Discrete Diffusion Model with Contrastive Learning for Text-to-speech Generation
Zhichao Wu
Qiulin Li
Sixing Liu
Qun Yang
17
3
0
13 Sep 2023
V2A-Mapper: A Lightweight Solution for Vision-to-Audio Generation by Connecting Foundation Models
Heng Wang
Jianbo Ma
Santiago Pascual
Richard Cartwright
Weidong (Tom) Cai
VGen
16
37
0
18 Aug 2023
AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining
Haohe Liu
Yiitan Yuan
Xubo Liu
Xinhao Mei
Qiuqiang Kong
Qiao Tian
Yuping Wang
Wenwu Wang
Yuxuan Wang
Mark D. Plumbley
DiffM
17
220
0
10 Aug 2023
Diff-Foley: Synchronized Video-to-Audio Synthesis with Latent Diffusion Models
Simian Luo
Chuanhao Yan
Chenxu Hu
Hang Zhao
DiffM
13
79
0
29 Jun 2023
Text-Driven Foley Sound Generation With Latent Diffusion Model
Yiitan Yuan
Haohe Liu
Xubo Liu
Xiyuan Kang
Peipei Wu
Mark D.Plumbley
Wenwu Wang
DiffM
33
10
0
17 Jun 2023
CLIPSonic: Text-to-Audio Synthesis with Unlabeled Videos and Pretrained Language-Vision Models
Hao-Wen Dong
Xiaoyu Liu
Jordi Pons
Gautam Bhattacharya
Santiago Pascual
Joan Serra
Taylor Berg-Kirkpatrick
Julian McAuley
DiffM
13
19
0
16 Jun 2023
The ObjectFolder Benchmark: Multisensory Learning with Neural and Real Objects
Ruohan Gao
Yiming Dou
Hao Li
Tanmay Agarwal
Jeannette Bohg
Yunzhu Li
Li Fei-Fei
Jiajun Wu
11
29
0
01 Jun 2023
AudioToken: Adaptation of Text-Conditioned Diffusion Models for Audio-to-Image Generation
Guy Yariv
Itai Gat
Lior Wolf
Yossi Adi
Idan Schwartz
DiffM
20
20
0
22 May 2023
HiFi-Codec: Group-residual Vector quantization for High Fidelity Audio Codec
Dongchao Yang
Songxiang Liu
Rongjie Huang
Jinchuan Tian
Chao Weng
Yuexian Zou
138
118
0
04 May 2023
Diverse and Vivid Sound Generation from Text Descriptions
Guangwei Li
Xuenan Xu
Lingfeng Dai
Mengyue Wu
K. Yu
45
4
0
03 May 2023
Conditional Generation of Audio from Video via Foley Analogies
Yuexi Du
Ziyang Chen
Justin Salamon
Bryan C. Russell
Andrew Owens
VGen
17
37
0
17 Apr 2023
Sound to Visual Scene Generation by Audio-to-Visual Latent Alignment
Kim Sung-Bin
Arda Senocak
H. Ha
Andrew Owens
Tae-Hyun Oh
DiffM
VGen
25
35
0
30 Mar 2023
Sounding Video Generator: A Unified Framework for Text-guided Sounding Video Generation
Jiawei Liu
Weining Wang
Sihan Chen
Xinxin Zhu
J. Liu
DiffM
VGen
15
13
0
29 Mar 2023
A Survey on Audio Diffusion Models: Text To Speech Synthesis and Enhancement in Generative AI
Chenshuang Zhang
Chaoning Zhang
Sheng Zheng
Mengchun Zhang
Maryam Qamar
Sung-Ho Bae
In So Kweon
DiffM
MedIm
39
64
0
23 Mar 2023
Sound Localization from Motion: Jointly Learning Sound Direction and Camera Rotation
Ziyang Chen
Shengyi Qian
Andrew Owens
16
12
0
20 Mar 2023
Leveraging Pre-trained AudioLDM for Text to Sound Generation: A Benchmark Study
Yiitan Yuan
Haohe Liu
Jinhua Liang
Xubo Liu
Mark D. Plumbley
Wenwu Wang
11
0
0
07 Mar 2023
InstructTTS: Modelling Expressive TTS in Discrete Latent Space with Natural Language Style Prompt
Dongchao Yang
Songxiang Liu
Rongjie Huang
Chao Weng
H. Meng
DiffM
VLM
31
84
0
31 Jan 2023
Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models
Rongjie Huang
Jia-Bin Huang
Dongchao Yang
Yi Ren
Luping Liu
Mingze Li
Zhenhui Ye
Jinglin Liu
Xiaoyue Yin
Zhou Zhao
DiffM
140
315
0
30 Jan 2023
VarietySound: Timbre-Controllable Video to Sound Generation via Unsupervised Information Disentanglement
Chenye Cui
Yi Ren
Jinglin Liu
Rongjie Huang
Zhou Zhao
VGen
30
14
0
19 Nov 2022
I Hear Your True Colors: Image Guided Audio Generation
Roy Sheffer
Yossi Adi
VLM
8
73
0
06 Nov 2022
Learning in Audio-visual Context: A Review, Analysis, and New Perspective
Yake Wei
Di Hu
Yapeng Tian
Xuelong Li
44
54
0
20 Aug 2022
A Proposal for Foley Sound Synthesis Challenge
Keunwoo Choi
Sangshin Oh
Minsung Kang
Brian McFee
11
11
0
21 Jul 2022
Diffsound: Discrete Diffusion Model for Text-to-sound Generation
Dongchao Yang
Jianwei Yu
Helin Wang
Wen Wang
Chao Weng
Yuexian Zou
Dong Yu
DiffM
25
295
0
20 Jul 2022
Learning Visual Styles from Audio-Visual Associations
Tingle Li
Yichen Liu
Andrew Owens
Hang Zhao
DiffM
23
20
0
10 May 2022
Quantized GAN for Complex Music Generation from Dance Videos
Ye Zhu
Kyle Olszewski
Yuehua Wu
Panos Achlioptas
Menglei Chai
Yan Yan
Sergey Tulyakov
MGen
17
44
0
01 Apr 2022
FoleyGAN: Visually Guided Generative Adversarial Network-Based Synchronous Sound Generation in Silent Videos
Sanchita Ghose
John J. Prevost
GAN
11
26
0
20 Jul 2021
VideoGPT: Video Generation using VQ-VAE and Transformers
Wilson Yan
Yunzhi Zhang
Pieter Abbeel
A. Srinivas
ViT
VGen
242
484
0
20 Apr 2021
Spectrogram Inpainting for Interactive Generation of Instrument Sounds
Théis Bazin
Gaëtan Hadjeres
P. Esling
M. Malt
21
11
0
15 Apr 2021
Zero-Shot Text-to-Image Generation
Aditya A. Ramesh
Mikhail Pavlov
Gabriel Goh
Scott Gray
Chelsea Voss
Alec Radford
Mark Chen
Ilya Sutskever
VLM
253
4,764
0
24 Feb 2021
High Fidelity Speech Synthesis with Adversarial Networks
Mikolaj Binkowski
Jeff Donahue
Sander Dieleman
Aidan Clark
Erich Elsen
Norman Casagrande
Luis C. Cobo
Karen Simonyan
213
239
0
25 Sep 2019
Image-to-Image Translation with Conditional Adversarial Networks
Phillip Isola
Jun-Yan Zhu
Tinghui Zhou
Alexei A. Efros
SSeg
212
19,387
0
21 Nov 2016
Conditional Image Synthesis With Auxiliary Classifier GANs
Augustus Odena
C. Olah
Jonathon Shlens
GAN
224
3,183
0
30 Oct 2016
Previous
1
2