ResearchTrend.AI
Sound to Visual Scene Generation by Audio-to-Visual Latent Alignment

30 March 2023
Kim Sung-Bin
Arda Senocak
H. Ha
Andrew Owens
Tae-Hyun Oh
    DiffM
    VGen
arXiv: 2303.17490

Papers citing "Sound to Visual Scene Generation by Audio-to-Visual Latent Alignment"

12 / 12 papers shown
Seeing Soundscapes: Audio-Visual Generation and Separation from Soundscapes Using Audio-Visual Separator
Minjae Kang
Martim Brandão
56
0
0
25 Apr 2025
Read, Watch and Scream! Sound Generation from Text and Video
Yujin Jeong
Yunji Kim
Sanghyuk Chun
Jiyoung Lee
VGen
DiffM
25
11
0
08 Jul 2024
Images that Sound: Composing Images and Sounds on a Single Canvas
Ziyang Chen
Daniel Geng
Andrew Owens
DiffM
48
9
0
20 May 2024
Sound Source Localization is All about Cross-Modal Alignment
Arda Senocak
H. Ryu
Junsik Kim
Tae-Hyun Oh
Hanspeter Pfister
Joon Son Chung
19
18
0
19 Sep 2023
UniBriVL: Robust Universal Representation and Generation of Audio Driven Diffusion Models
Sen Fang
Bowen Gao
Yangjian Wu
T. Teoh
DiffM
18
1
0
29 Jul 2023
Exploring Efficient-Tuned Learning Audio Representation Method from BriVL
Sen Fang
Yang Wu
Bowen Gao
Jingwen Cai
T. Teoh
DiffM
16
1
0
08 Mar 2023
Instance-Conditioned GAN
Arantxa Casanova
Marlene Careil
Jakob Verbeek
M. Drozdzal
Adriana Romero Soriano
GAN
199
132
0
10 Sep 2021
VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text
Hassan Akbari
Liangzhe Yuan
Rui Qian
Wei-Hong Chuang
Shih-Fu Chang
Yin Cui
Boqing Gong
ViT
240
577
0
22 Apr 2021
Distilling Audio-Visual Knowledge by Compositional Contrastive Learning
Yanbei Chen
Yongqin Xian
A. Sophia Koepke
Ying Shan
Zeynep Akata
78
80
0
22 Apr 2021
Zero-Shot Text-to-Image Generation
Aditya A. Ramesh
Mikhail Pavlov
Gabriel Goh
Scott Gray
Chelsea Voss
Alec Radford
Mark Chen
Ilya Sutskever
VLM
253
4,764
0
24 Feb 2021
Sound2Sight: Generating Visual Dynamics from Sound and Context
A. Cherian
Moitreya Chatterjee
N. Ahuja
VGen
69
35
0
23 Jul 2020
Image Generation from Scene Graphs
Justin Johnson
Agrim Gupta
Li Fei-Fei
GNN
221
812
0
04 Apr 2018