arXiv: 1704.08292
Deep Cross-Modal Audio-Visual Generation

26 April 2017
Lele Chen
Sudhanshu Srivastava
Z. Duan
Chenliang Xu

Papers citing "Deep Cross-Modal Audio-Visual Generation"

45 / 45 papers shown
Title
Seeing Soundscapes: Audio-Visual Generation and Separation from Soundscapes Using Audio-Visual Separator
Minjae Kang
Martim Brandão
64
0
0
25 Apr 2025
MM-NeRF: Multimodal-Guided 3D Multi-Style Transfer of Neural Radiance Field
Zijian Győző Yang
Zhongwei Qiu
Chang Xu
Dongmei Fu
50
2
0
28 Jan 2025
Gotta Hear Them All: Sound Source Aware Vision to Audio Generation
Wei Guo
Heng Wang
Jianbo Ma
Weidong Cai
DiffM
90
3
0
23 Nov 2024
X-Drive: Cross-modality consistent multi-sensor data synthesis for driving scenarios
Yichen Xie
Chenfeng Xu
C-T.John Peng
Shuqi Zhao
Nhat Ho
Alexander T. Pham
Mingyu Ding
Masayoshi Tomizuka
Weidong Zhan
DiffM
41
2
0
02 Nov 2024
Read, Watch and Scream! Sound Generation from Text and Video
Yujin Jeong
Yunji Kim
Sanghyuk Chun
Jiyoung Lee
VGen
DiffM
31
12
0
08 Jul 2024
MOSA: Music Motion with Semantic Annotation Dataset for Cross-Modal Music Processing
Yu-Fen Huang
Nikki Moran
Simon Coleman
Jon Kelly
Shun-Hwa Wei
...
Chih-Hsuan Li
Da-Yu Huang
Hsuan-Kai Kao
Ting-Wei Lin
Li Su
41
1
0
10 Jun 2024
Complete Cross-triplet Loss in Label Space for Audio-visual Cross-modal Retrieval
Donghuo Zeng
Yanan Wang
Jianming Wu
K. Ikeda
27
4
0
07 Nov 2022
Multimodal Transformer for Parallel Concatenated Variational Autoencoders
Stephen D. Liang
J. Mendel
ViT
27
5
0
28 Oct 2022
Robust Sound-Guided Image Manipulation
Seung Hyun Lee
Gyeongrok Oh
Wonmin Byeon
Sang Ho Yoon
Jinkyu Kim
Sangpil Kim
DiffM
26
7
0
30 Aug 2022
Auto-regressive Image Synthesis with Integrated Quantization
Fangneng Zhan
Yingchen Yu
Rongliang Wu
Jiahui Zhang
Kai Cui
Changgong Zhang
Shijian Lu
38
10
0
21 Jul 2022
Cross-Modal Contrastive Representation Learning for Audio-to-Image Generation
Haechun Chung
JooYong Shim
Jong-Kook Kim
27
3
0
20 Jul 2022
ERNIE-ViLG: Unified Generative Pre-training for Bidirectional Vision-Language Generation
Han Zhang
Weichong Yin
Yewei Fang
Lanxin Li
Boqiang Duan
Zhihua Wu
Yu Sun
Hao Tian
Hua Wu
Haifeng Wang
27
58
0
31 Dec 2021
Multimodal Image Synthesis and Editing: The Generative AI Era
Fangneng Zhan
Yingchen Yu
Rongliang Wu
Jiahui Zhang
Shijian Lu
Lingjie Liu
Adam Kortylewski
Christian Theobalt
Eric Xing
EGVM
29
48
0
27 Dec 2021
Automated Side Channel Analysis of Media Software with Manifold Learning
Yuanyuan Yuan
Qi Pang
Shuai Wang
AAML
40
18
0
09 Dec 2021
Sound-Guided Semantic Image Manipulation
Seung Hyun Lee
Wonseok Roh
Wonmin Byeon
Sang Ho Yoon
Chanyoung Kim
Jinkyu Kim
Sangpil Kim
DiffM
27
43
0
30 Nov 2021
Learning Signal-Agnostic Manifolds of Neural Fields
Yilun Du
Katherine M. Collins
J. Tenenbaum
Vincent Sitzmann
MedIm
29
47
0
11 Nov 2021
Taming Visually Guided Sound Generation
Vladimir E. Iashin
Esa Rahtu
VLM
32
122
0
17 Oct 2021
Cross-Modal Virtual Sensing for Combustion Instability Monitoring
Tryambak Gangopadhyay
V. Ramanan
S. Chakravarthy
S. Sarkar
21
1
0
04 Oct 2021
Audio-to-Image Cross-Modal Generation
Maciej Żelaszczyk
Jacek Mańdziuk
DiffM
53
15
0
27 Sep 2021
Cross-modal Spectrum Transformation Network For Acoustic Scene classification
Yang Liu
A. Neophytou
Sunando Sengupta
Eric Sommerlade
21
9
0
13 Aug 2021
FoleyGAN: Visually Guided Generative Adversarial Network-Based Synchronous Sound Generation in Silent Videos
Sanchita Ghose
John J. Prevost
GAN
27
26
0
20 Jul 2021
End-to-End Video-To-Speech Synthesis using Generative Adversarial Networks
Rodrigo Mira
Konstantinos Vougioukas
Pingchuan Ma
Stavros Petridis
Björn W. Schuller
M. Pantic
29
43
0
27 Apr 2021
Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation
Hang Zhou
Yasheng Sun
Wayne Wu
Chen Change Loy
Xiaogang Wang
Ziwei Liu
CVBM
28
360
0
22 Apr 2021
Can audio-visual integration strengthen robustness under multimodal attacks?
Yapeng Tian
Chenliang Xu
AAML
36
37
0
05 Apr 2021
Sim-to-Real for Robotic Tactile Sensing via Physics-Based Simulation and Learned Latent Projections
Yashraj S. Narang
Balakumar Sundaralingam
Miles Macklin
Arsalan Mousavian
Dieter Fox
30
58
0
31 Mar 2021
Learning Audio-Visual Correlations from Variational Cross-Modal Generation
Ye Zhu
Yu Wu
Hugo Latapie
Yi Yang
Yan Yan
SSL
35
20
0
05 Feb 2021
Sound Synthesis, Propagation, and Rendering: A Survey
Shiguang Liu
Tianyi Zhou
27
26
0
11 Nov 2020
Video Generative Adversarial Networks: A Review
Nuha Aldausari
Arcot Sowmya
Nadine Marcus
Gelareh Mohammadi
EGVM
21
102
0
04 Nov 2020
Temporally Guided Music-to-Body-Movement Generation
Hsuan-Kai Kao
Li Su
44
42
0
17 Sep 2020
Generating Visually Aligned Sound from Videos
Peihao Chen
Yang Zhang
Mingkui Tan
Hongdong Xiao
Deng Huang
Chuang Gan
VGen
16
95
0
14 Jul 2020
A Systematic Survey on Deep Generative Models for Graph Generation
Xiaojie Guo
Liang Zhao
MedIm
44
147
0
13 Jul 2020
Survey on Deep Multi-modal Data Analytics: Collaboration, Rivalry and Fusion
Yang Wang
33
195
0
15 Jun 2020
Direct Speech-to-image Translation
Jiguo Li
Xinfeng Zhang
Chuanmin Jia
Jizheng Xu
Li Zhang
Y. Wang
Siwei Ma
Wen Gao
36
29
0
07 Apr 2020
Deep Audio-Visual Learning: A Survey
Hao Zhu
Mandi Luo
Rui Wang
A. Zheng
Ran He
31
156
0
14 Jan 2020
Vision-Infused Deep Audio Inpainting
Hang Zhou
Ziwei Liu
Lingfeng Guo
Ping Luo
Dahua Lin
29
88
0
24 Oct 2019
Translating Visual Art into Music
Max Müller-Eberstein
Nanne van Noord
DRL
24
7
0
03 Sep 2019
Realistic Speech-Driven Facial Animation with GANs
Konstantinos Vougioukas
Stavros Petridis
M. Pantic
39
289
0
14 Jun 2019
Co-Separating Sounds of Visual Objects
Ruohan Gao
Kristen Grauman
33
206
0
16 Apr 2019
2.5D Visual Sound
Ruohan Gao
Kristen Grauman
VGen
11
130
0
11 Dec 2018
Talking Face Generation by Conditional Recurrent Adversarial Network
Yang Song
Jingwen Zhu
Dawei Li
Xiaolong Wang
Hairong Qi
CVBM
27
192
0
13 Apr 2018
Lip Movements Generation at a Glance
Lele Chen
Zhiheng Li
R. Maddox
Z. Duan
Chenliang Xu
25
259
0
28 Mar 2018
Audio-Visual Event Localization in Unconstrained Videos
Yapeng Tian
Jing Shi
Bochen Li
Zhiyao Duan
Chenliang Xu
36
425
0
23 Mar 2018
Adversarial Audio Synthesis
Chris Donahue
Julian McAuley
M. Puckette
GAN
27
602
0
12 Feb 2018
Creating A Multi-track Classical Musical Performance Dataset for Multimodal Music Analysis: Challenges, Insights, and Applications
Bochen Li
Xinzhao Liu
K. Dinesh
Z. Duan
Gaurav Sharma
23
148
0
27 Dec 2016
Learning Deep Representations of Fine-grained Visual Descriptions
Scott E. Reed
Zeynep Akata
Bernt Schiele
Honglak Lee
OCL
VLM
170
840
0
17 May 2016