Learning Audio-Visual Correlations from Variational Cross-Modal Generation

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
5 February 2021
Ye Zhu
Yu Wu
Hugo Latapie
Yi Yang
Yan Yan
    SSL

Papers citing "Learning Audio-Visual Correlations from Variational Cross-Modal Generation"

32 / 32 papers shown
A Survey of Recent Advances and Challenges in Deep Audio-Visual Correlation Learning
ACM Computing Surveys (ACM CSUR), 2024
Luís Vilaça
Yi Yu
Paula Viana
537
3
0
24 Nov 2024
Data Augmentation with GAN increases the Performance of Arrhythmia Classification for an Unbalanced Dataset
Okan Düzyel
M. Kuntalp
320
7
0
24 Feb 2023
Complete Cross-triplet Loss in Label Space for Audio-visual Cross-modal Retrieval
IEEE International Symposium on Multimedia (ISM), 2022
Donghuo Zeng
Yanan Wang
Jianming Wu
K. Ikeda
252
7
0
07 Nov 2022
Vision+X: A Survey on Multimodal Learning in the Light of Data
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Ye Zhu
Yuehua Wu
Andrii Zadaianchuk
Yan Yan
487
42
0
05 Oct 2022
A Comprehensive Survey on Video Saliency Detection with Auditory Information: the Audio-visual Consistency Perceptual is the Key!
Chenglizhao Chen
Mengke Song
Wenfeng Song
Li Guo
Muwei Jian
264
39
0
20 Jun 2022
Discrete Contrastive Diffusion for Cross-Modal Music and Image Generation
International Conference on Learning Representations (ICLR), 2022
Ye Zhu
Yuehua Wu
Kyle Olszewski
Jian Ren
Sergey Tulyakov
Yan Yan
DiffM
417
61
0
15 Jun 2022
Quantized GAN for Complex Music Generation from Dance Videos
European Conference on Computer Vision (ECCV), 2022
Ye Zhu
Kyle Olszewski
Yuehua Wu
Panos Achlioptas
Menglei Chai
Yan Yan
Sergey Tulyakov
MGen
277
61
0
01 Apr 2022
Recent Advances and Challenges in Deep Audio-Visual Correlation Learning
Luís Vilaça
Yi Yu
Paula Viana
340
11
0
28 Feb 2022
Saying the Unseen: Video Descriptions via Dialog Agents
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2021
Ye Zhu
Yu Wu
Yi Yang
Yan Yan
252
8
0
26 Jun 2021
Cross-Modal Discrete Representation Learning
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
Alexander H. Liu
SouYoung Jin
Cheng-I Jeff Lai
Andrew Rouditchenko
A. Oliva
James R. Glass
SSL
173
55
0
10 Jun 2021
Visually Informed Binaural Audio Generation without Binaural Audios
Computer Vision and Pattern Recognition (CVPR), 2021
Xudong Xu
Hang Zhou
Ziwei Liu
Bo Dai
Xiaogang Wang
Dahua Lin
DiffM
230
74
0
13 Apr 2021
Foley Music: Learning to Generate Music from Videos
Chuang Gan
Deng Huang
Peihao Chen
J. Tenenbaum
Antonio Torralba
VGen
201
156
0
21 Jul 2020
Music Gesture for Visual Sound Separation
Computer Vision and Pattern Recognition (CVPR), 2020
Chuang Gan
Deng Huang
Hang Zhao
J. Tenenbaum
Antonio Torralba
312
216
0
20 Apr 2020
Towards Generating Ambisonics Using Audio-Visual Cue for Virtual Reality
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019
A. Rana
C. Ozcinar
A. Smolic
201
31
0
16 Aug 2019
Self-Supervised Audio-Visual Co-Segmentation
Andrew Rouditchenko
Hang Zhao
Chuang Gan
Josh H. McDermott
Antonio Torralba
VLM
SSL
173
107
0
18 Apr 2019
Latent Translation: Crossing Modalities by Bridging Generative Models
Yingtao Tian
Jesse Engel
DRL
210
18
0
21 Feb 2019
Dual-modality seq2seq network for audio-visual event localization
Yan-Bo Lin
Yu-Jhe Li
Y. Wang
260
156
0
20 Feb 2019
Latent Alignment and Variational Attention
Yuntian Deng
Yoon Kim
Justin T. Chiu
Demi Guo
Alexander M. Rush
BDL
240
118
0
10 Jul 2018
Cooperative Learning of Audio and Video Models from Self-Supervised Synchronization
Neural Information Processing Systems (NeurIPS), 2018
Bruno Korbar
Du Tran
Lorenzo Torresani
501
509
0
30 Jun 2018
Audio-Visual Scene Analysis with Self-Supervised Multisensory Features
Andrew Owens
Alexei A. Efros
SSL
780
806
0
10 Apr 2018
The Sound of Pixels
Hang Zhao
Chuang Gan
Andrew Rouditchenko
Carl Vondrick
Josh H. McDermott
Antonio Torralba
VLM
585
587
0
09 Apr 2018
Learning to Separate Object Sounds by Watching Unlabeled Video
Ruohan Gao
Rogerio Feris
Kristen Grauman
SSL
366
297
0
05 Apr 2018
Cross-modal Deep Variational Hand Pose Estimation
Adrian Spurr
Mingli Song
Seonwook Park
Otmar Hilliges
3DH
293
305
0
30 Mar 2018
Audio-Visual Event Localization in Unconstrained Videos
Yapeng Tian
Jing Shi
Bochen Li
Zhiyao Duan
Chenliang Xu
572
565
0
23 Mar 2018
Learning to Localize Sound Source in Visual Scenes
Arda Senocak
Tae-Hyun Oh
Junsik Kim
Ming-Hsuan Yang
In So Kweon
SSL
381
375
0
10 Mar 2018
Degeneration in VAE: in the Light of Fisher Information Loss
Huangjie Zheng
Jiangchao Yao
Ya Zhang
Ivor W. Tsang
DRL
211
18
0
19 Feb 2018
Objects that Sound
Relja Arandjelović
Andrew Zisserman
ObjD
VOS
460
561
0
18 Dec 2017
Wasserstein Auto-Encoders
Ilya O. Tolstikhin
Olivier Bousquet
Sylvain Gelly
B. Schölkopf
DRL
831
1,138
0
05 Nov 2017
Look, Listen and Learn
Relja Arandjelović
Andrew Zisserman
SSL
552
1,015
0
23 May 2017
Deep Cross-Modal Audio-Visual Generation
Lele Chen
Sudhanshu Srivastava
Z. Duan
Chenliang Xu
386
233
0
26 Apr 2017
SoundNet: Learning Sound Representations from Unlabeled Video
Y. Aytar
Carl Vondrick
Antonio Torralba
SSL
436
1,097
0
27 Oct 2016
Auto-Encoding Variational Bayes
International Conference on Learning Representations (ICLR), 2013
Diederik P. Kingma
Max Welling
BDL
1.7K
17,040
0
20 Dec 2013