Audio-Visual Scene Analysis with Self-Supervised Multisensory Features

10 April 2018

Papers citing "Audio-Visual Scene Analysis with Self-Supervised Multisensory Features"

50 / 491 papers shown

Visually Informed Binaural Audio Generation without Binaural Audios

Computer Vision and Pattern Recognition (CVPR), 2021

13 Apr 2021

Object Priors for Classifying and Localizing Unseen Actions

International Journal of Computer Vision (IJCV), 2021

Pascal Mettes

William Thong

Cees G. M. Snoek

10 Apr 2021

Towards Fine-grained Visual Representations by Combining Contrastive Learning with Image Reconstruction and Attention-weighted Pooling

Jonas Dippel

Steffen Vogler

Johannes Höhne

09 Apr 2021

MPN: Multimodal Parallel Network for Audio-Visual Event Localization

IEEE International Conference on Multimedia and Expo (ICME), 2021

Jiashuo Yu

Ying Cheng

Rui Feng

07 Apr 2021

Contrastive Learning of Global-Local Video Representations

07 Apr 2021

Localizing Visual Sounds the Hard Way

Computer Vision and Pattern Recognition (CVPR), 2021

Honglie Chen

Weidi Xie

Triantafyllos Afouras

Arsha Nagrani

Andrea Vedaldi

Andrew Zisserman

ObjD

06 Apr 2021

Cyclic Co-Learning of Sounding Object Visual Grounding and Sound Separation

Computer Vision and Pattern Recognition (CVPR), 2021

05 Apr 2021

Can audio-visual integration strengthen robustness under multimodal attacks?

Computer Vision and Pattern Recognition (CVPR), 2021

Yapeng Tian

Chenliang Xu

AAML

05 Apr 2021

Cross-Modal learning for Audio-Visual Video Parsing

Interspeech, 2021

Ganesh Ramakrishnan

03 Apr 2021

Touch-based Curiosity for Sparse-Reward Tasks

David Vazquez

01 Apr 2021

Unsupervised Sound Localization via Iterative Contrastive Learning

Computer Vision and Image Understanding (CVIU), 2021

Yan-Bo Lin

Hung-Yu Tseng

Hsin-Ying Lee

Yen-Yu Lin

Ming-Hsuan Yang

SSL

01 Apr 2021

Collaborative Learning to Generate Audio-Video Jointly

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021

01 Apr 2021

Broaden Your Views for Self-Supervised Video Learning

IEEE International Conference on Computer Vision (ICCV), 2021

Adrià Recasens

Pauline Luc

Jean-Baptiste Alayrac

...

30 Mar 2021

Robust Audio-Visual Instance Discrimination

Computer Vision and Pattern Recognition (CVPR), 2021

29 Mar 2021

Discriminative Semantic Transitive Consistency for Cross-Modal Learning

Computer Vision and Image Understanding (CVIU), 2021

Kranti K. Parida

Gaurav Sharma

25 Mar 2021

Looking into Your Speech: Learning Cross-modal Affinity for Audio-visual Speech Separation

Computer Vision and Pattern Recognition (CVPR), 2021

25 Mar 2021

Weakly-supervised Audio-visual Sound Source Detection and Separation

IEEE International Conference on Multimedia and Expo (ICME), 2021

Tanzila Rahman

Leonid Sigal

25 Mar 2021

Space-Time Crop & Attend: Improving Cross-modal Video Representation Learning

IEEE International Conference on Computer Vision (ICCV), 2021

Joao Henriques

Andrea Vedaldi

AI4TS

18 Mar 2021

Beyond Image to Depth: Improving Depth Prediction using Echoes

Computer Vision and Pattern Recognition (CVPR), 2021

15 Mar 2021

Multi-Format Contrastive Learning of Audio Representations

Luyu Wang

Aaron van den Oord

11 Mar 2021

Audio-Visual Speech Separation Using Cross-Modal Correspondence Loss

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021

02 Mar 2021

There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge

Computer Vision and Pattern Recognition (CVPR), 2021

Francisco Rivera Valverde

Juana Valeria Hurtado

Abhinav Valada

01 Mar 2021

Audiovisual Highlight Detection in Videos

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021

11 Feb 2021

Template-Free Try-on Image Synthesis via Semantic-guided Optimization

IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2021

06 Feb 2021

Learning Audio-Visual Correlations from Variational Cross-Modal Generation

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021

Ye Zhu

Yu Wu

Hugo Latapie

Yi Yang

Yan Yan

SSL

05 Feb 2021

Collaboration among Image and Object Level Features for Image Colourisation

Rita Pucci

C. Micheloni

N. Martinel

19 Jan 2021

MAAS: Multi-modal Assignation for Active Speaker Detection

IEEE International Conference on Computer Vision (ICCV), 2021

Juan Carlos León Alcázar

Fabian Caba Heilbron

Ali K. Thabet

Guohao Li

11 Jan 2021

VisualVoice: Audio-Visual Speech Separation with Cross-Modal Consistency

Computer Vision and Pattern Recognition (CVPR), 2021

Ruohan Gao

Kristen Grauman

CVBM

08 Jan 2021

Human Action Recognition from Various Data Modalities: A Review

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2020

Zehua Sun

Jun Liu

22 Dec 2020

Semantic Audio-Visual Navigation

Computer Vision and Pattern Recognition (CVPR), 2020

Changan Chen

Ziad Al-Halah

Kristen Grauman

21 Dec 2020

Visual Speech Enhancement Without A Real Visual Stream

IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2020

Sindhu B. Hegde

Prajwal K R

Rudrabha Mukhopadhyay

Vinay P. Namboodiri

C. V. Jawahar

DiffM

20 Dec 2020

ViNet: Pushing the limits of Visual Modality for Audio-Visual Saliency Prediction

IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2020

Subramanian Ramanathan

Vineet Gandhi

ViT

11 Dec 2020

Parameter Efficient Multimodal Transformers for Video Representation Learning

08 Dec 2020

Rethinking movie genre classification with fine-grained semantic clustering

04 Dec 2020

Multi-modal Fusion for Single-Stage Continuous Gesture Recognition

10 Nov 2020

Multi-Modal Learning of Keypoint Predictive Models for Visual Object Manipulation

Sarah Bechtle

Neha Das

Franziska Meier

SSL

08 Nov 2020

Learning Representations from Audio-Visual Spatial Alignment

03 Nov 2020

A Two-Stage Approach to Device-Robust Acoustic Scene Classification

...

Sabato Marco Siniscalchi

Yannan Wang

Jun Du

Chin-Hui Lee

03 Nov 2020

Into the Wild with AudioScope: Unsupervised Audio-Visual Separation of On-Screen Sounds

International Conference on Learning Representations (ICLR), 2020

02 Nov 2020

Pretext-Contrastive Learning: Toward Good Practices in Self-supervised Video Representation Leaning

29 Oct 2020

Remixing Music with Visual Conditioning

IEEE International Symposium on Multimedia (ISM), 2020

Li-Chia Yang

Alexander Lerch

27 Oct 2020

Listening to Sounds of Silence for Speech Denoising

Carl Vondrick

22 Oct 2020

Contrastive Learning of General-Purpose Audio Representations

21 Oct 2020

LT-GAN: Self-Supervised GAN with Latent Transformation Detection

IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2020

19 Oct 2020

i-Mix: A Domain-Agnostic Strategy for Contrastive Representation Learning

Chun-Liang Li

17 Oct 2020

Muse: Multi-modal target speaker extraction with visual cues

Zexu Pan

Ruijie Tao

Chenglin Xu

Haizhou Li

15 Oct 2020

MS²L: Multi-Task Self-Supervised Learning for Skeleton Based Action Recognition

Lilang Lin

12 Oct 2020

Discriminative Sounding Objects Localization via Self-supervised Audiovisual Matching

Xiao Tan

Errui Ding

12 Oct 2020

SEMI: Self-supervised Exploration via Multisensory Incongruity

IEEE International Conference on Robotics and Automation (ICRA), 2020

Jianren Wang

Ziwen Zhuang

Hang Zhao

SSL

26 Sep 2020

Active Contrastive Learning of Audio-Visual Video Representations

31 Aug 2020