Audio-Visual Scene Analysis with Self-Supervised Multisensory Features

10 April 2018

Papers citing "Audio-Visual Scene Analysis with Self-Supervised Multisensory Features"

50 / 492 papers shown

How to Listen? Rethinking Visual Sound Localization

Interspeech (Interspeech), 2022

150

11 Apr 2022

Probabilistic Representations for Video Contrastive Learning

Computer Vision and Pattern Recognition (CVPR), 2022

314

08 Apr 2022

ECLIPSE: Efficient Long-range Video Retrieval using Sight and Sound

European Conference on Computer Vision (ECCV), 2022

Yan-Bo Lin

Jie Lei

Joey Tianyi Zhou

Gedas Bertasius

394

06 Apr 2022

ObjectFolder 2.0: A Multisensory Object Dataset for Sim2Real Transfer

Computer Vision and Pattern Recognition (CVPR), 2022

Li Fei-Fei

Jiajun Wu

178

106

05 Apr 2022

VocaLiST: An Audio-Visual Synchronisation Model for Lips and Voices

Interspeech (Interspeech), 2022

V. S. Kadandale

Juan F. Montesinos

G. Haro

230

05 Apr 2022

MultiMAE: Multi-modal Multi-task Masked Autoencoders

European Conference on Computer Vision (ECCV), 2022

427

349

04 Apr 2022

Quantized GAN for Complex Music Generation from Dance Videos

European Conference on Computer Vision (ECCV), 2022

Yan Yan

233

01 Apr 2022

Audio-Visual Speech Codecs: Rethinking Audio-Visual Speech Enhancement by Re-Synthesis

Computer Vision and Pattern Recognition (CVPR), 2022

175

31 Mar 2022

Speaker Extraction with Co-Speech Gestures Cue

IEEE Signal Processing Letters (SPL), 2022

Zexu Pan

Xinyuan Qian

Haizhou Li

SLR

180

31 Mar 2022

The Sound of Bounding-Boxes

International Conference on Pattern Recognition (ICPR), 2022

Takashi Oya

Shohei Iwase

Shigeo Morishima

125

30 Mar 2022

Using Active Speaker Faces for Diarization in TV shows

Rahul Sharma

Shrikanth Narayanan

CVBM

186

30 Mar 2022

Balanced Multimodal Learning via On-the-fly Gradient Modulation

Computer Vision and Pattern Recognition (CVPR), 2022

317

343

29 Mar 2022

Single-Stream Multi-Level Alignment for Vision-Language Pretraining

European Conference on Computer Vision (ECCV), 2022

356

27 Mar 2022

Self-Supervised Predictive Learning: A Negative-Free Method for Sound Source Localization in Visual Scenes

182

25 Mar 2022

The Challenges of Continuous Self-Supervised Learning

European Conference on Computer Vision (ECCV), 2022

240

23 Mar 2022

Drive&Segment: Unsupervised Semantic Segmentation of Urban Scenes via Cross-modal Distillation

European Conference on Computer Vision (ECCV), 2022

280

21 Mar 2022

Localizing Visual Sounds the Easy Way

European Conference on Computer Vision (ECCV), 2022

Shentong Mo

Pedro Morgado

307

17 Mar 2022

Object discovery and representation networks

European Conference on Computer Vision (ECCV), 2022

425

16 Mar 2022

Audio-visual Generalised Zero-shot Learning with Cross-modal Attention and Language

Computer Vision and Pattern Recognition (CVPR), 2022

Otniel-Bogdan Mercea

Lukas Riesch

A. Sophia Koepke

Zeynep Akata

182

07 Mar 2022

Learnable Irrelevant Modality Dropout for Multimodal Action Recognition on Modality-Specific Annotated Videos

Computer Vision and Pattern Recognition (CVPR), 2022

294

06 Mar 2022

Look&Listen: Multi-Modal Correlation Learning for Active Speaker Detection and Speech Enhancement

IEEE Transactions on Multimedia (IEEE TMM), 2022

Jun Xiong

Can Ma

Peng Zhang

Lei Xie

Wei Huang

Yufei Zha

199

04 Mar 2022

Audio Self-supervised Learning: A Survey

Patterns (Patterns), 2022

Shuo Liu

Adria Mallol-Ragolta

Emilia Parada-Cabeleiro

Kun Qian

Bjoern W. Schuller

241

130

02 Mar 2022

CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding

Computer Vision and Pattern Recognition (CVPR), 2022

Kanchana Thilakarathna

Ranga Rodrigo

3DPC

335

319

01 Mar 2022

COMPASS: Contrastive Multimodal Pretraining for Autonomous Systems

IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2022

191

20 Feb 2022

Learning Contextually Fused Audio-visual Representations for Audio-visual Speech Recognition

International Conference on Information Photonics (ICIP), 2022

274

15 Feb 2022

Visual Acoustic Matching

Computer Vision and Pattern Recognition (CVPR), 2022

302

14 Feb 2022

Visual Sound Localization in the Wild by Cross-Modal Interference Erasing

AAAI Conference on Artificial Intelligence (AAAI), 2022

Ziwei Liu

184

13 Feb 2022

Audio-Visual Fusion Layers for Event Type Aware Video Recognition

In So Kweon

148

12 Feb 2022

Learning Sound Localization Better From Semantically Similar Samples

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

In So Kweon

174

07 Feb 2022

Active Audio-Visual Separation of Dynamic Sound Sources

European Conference on Computer Vision (ECCV), 2022

Sagnik Majumder

Kristen Grauman

319

02 Feb 2022

New Insights on Target Speaker Extraction

Mohamed Elminshawi

Wolfgang Mack

Srikanth Raj Chetupalli

Soumitro Chakrabarty

Emanuel Habets

265

01 Feb 2022

Self-Supervised Moving Vehicle Detection from Audio-Visual Cues

IEEE Robotics and Automation Letters (RA-L), 2022

Jannik Zürn

Wolfram Burgard

SSL

278

30 Jan 2022

Omnivore: A Single Model for Many Visual Modalities

Computer Vision and Pattern Recognition (CVPR), 2022

Rohit Girdhar

Mannat Singh

Nikhil Ravi

Laurens van der Maaten

Armand Joulin

Ishan Misra

610

287

20 Jan 2022

Egocentric Deep Multi-Channel Audio-Visual Active Speaker Localization

Computer Vision and Pattern Recognition (CVPR), 2022

239

06 Jan 2022

Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction

International Conference on Learning Representations (ICLR), 2022

370

420

05 Jan 2022

Sound and Visual Representation Learning with Multiple Pretraining Tasks

Computer Vision and Pattern Recognition (CVPR), 2022

A. Vasudevan

Dengxin Dai

Luc Van Gool

SSL

220

04 Jan 2022

Bilingual Speech Recognition by Estimating Speaker Geometry from Video Data

International Conference on Computer Analysis of Images and Patterns (CAIP), 2021

Sylvia Celedón-Pattichis

Carlos López Leiva

130

26 Dec 2021

Fine-grained Multi-Modal Self-Supervised Learning

British Machine Vision Conference (BMVC), 2021

Duo Wang

S. Karout

SSL

117

22 Dec 2021

Class-aware Sounding Objects Localization via Audiovisual Correspondence

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2021

184

22 Dec 2021

Decompose the Sounds and Pixels, Recompose the Events

AAAI Conference on Artificial Intelligence (AAAI), 2021

135

21 Dec 2021

Denoised Labels for Financial Time-Series Data via Self-Supervised Learning

International Conference on AI in Finance (ICAF), 2021

143

19 Dec 2021

Audio-Visual Synchronisation in the wild

Honglie Chen

Weidi Xie

Triantafyllos Afouras

Arsha Nagrani

Andrea Vedaldi

Andrew Zisserman

205

08 Dec 2021

Cross-modal Manifold Cutmix for Self-supervised Video Representation Learning

Srijan Das

Michael S. Ryoo

SSL

292

07 Dec 2021

ViewCLR: Learning Self-supervised Video Representation for Unseen Viewpoints

Srijan Das

Michael S. Ryoo

SSL

214

07 Dec 2021

Suppressing Static Visual Cues via Normalizing Flows for Self-Supervised Video Representation Learning

Manlin Zhang

Jinpeng Wang

A. J. Ma

173

07 Dec 2021

PoseKernelLifter: Metric Lifting of 3D Human Pose using Sound

315

01 Dec 2021

ContIG: Self-supervised Multimodal Contrastive Learning for Medical Imaging with Genetics

Computer Vision and Pattern Recognition (CVPR), 2021

606

26 Nov 2021

MM-Pyramid: Multimodal Pyramid Attentional Network for Audio-Visual Event Localization and Video Parsing

Rui Feng

215

24 Nov 2021

Geometry-Aware Multi-Task Learning for Binaural Audio Generation from Video

British Machine Vision Conference (BMVC), 2021

Rishabh Garg

Ruohan Gao

Kristen Grauman

176

21 Nov 2021

Beyond Mono to Binaural: Generating Binaural Audio from Mono Audio with Depth and Cross Modal Attention

IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2021

202

15 Nov 2021