Audio-Visual Scene Analysis with Self-Supervised Multisensory Features

10 April 2018
Andrew Owens
Alexei A. Efros
    SSL
arXiv:1804.03641

Papers citing "Audio-Visual Scene Analysis with Self-Supervised Multisensory Features"

50 / 492 papers shown
How to Listen? Rethinking Visual Sound Localization
Interspeech (Interspeech), 2022
Ho-Hsiang Wu
Magdalena Fuentes
Prem Seetharaman
J. P. Bello
ObjD
150
5
0
11 Apr 2022
Probabilistic Representations for Video Contrastive Learning
Computer Vision and Pattern Recognition (CVPR), 2022
Jungin Park
Jiyoung Lee
Ig-Jae Kim
Kwanghoon Sohn
SSL
314
53
0
08 Apr 2022
ECLIPSE: Efficient Long-range Video Retrieval using Sight and Sound
European Conference on Computer Vision (ECCV), 2022
Yan-Bo Lin
Jie Lei
Joey Tianyi Zhou
Gedas Bertasius
394
53
0
06 Apr 2022
ObjectFolder 2.0: A Multisensory Object Dataset for Sim2Real Transfer
Computer Vision and Pattern Recognition (CVPR), 2022
Ruohan Gao
Zilin Si
Yen-Yu Chang
Samuel Clarke
Jeannette Bohg
Li Fei-Fei
Wenzhen Yuan
Jiajun Wu
178
106
0
05 Apr 2022
VocaLiST: An Audio-Visual Synchronisation Model for Lips and Voices
Interspeech (Interspeech), 2022
V. S. Kadandale
Juan F. Montesinos
G. Haro
230
30
0
05 Apr 2022
MultiMAE: Multi-modal Multi-task Masked Autoencoders
European Conference on Computer Vision (ECCV), 2022
Roman Bachmann
David Mizrahi
Andrei Atanov
Amir Zamir
427
349
0
04 Apr 2022
Quantized GAN for Complex Music Generation from Dance Videos
European Conference on Computer Vision (ECCV), 2022
Ye Zhu
Kyle Olszewski
Yuehua Wu
Panos Achlioptas
Menglei Chai
Yan Yan
Sergey Tulyakov
MGen
233
57
0
01 Apr 2022
Audio-Visual Speech Codecs: Rethinking Audio-Visual Speech Enhancement by Re-Synthesis
Computer Vision and Pattern Recognition (CVPR), 2022
Karren D. Yang
Dejan Marković
Steven Krenn
Vasu Agrawal
Alexander Richard
VGen
175
45
0
31 Mar 2022
Speaker Extraction with Co-Speech Gestures Cue
IEEE Signal Processing Letters (SPL), 2022
Zexu Pan
Xinyuan Qian
Haizhou Li
SLR
180
34
0
31 Mar 2022
The Sound of Bounding-Boxes
International Conference on Pattern Recognition (ICPR), 2022
Takashi Oya
Shohei Iwase
Shigeo Morishima
125
2
0
30 Mar 2022
Using Active Speaker Faces for Diarization in TV shows
Rahul Sharma
Shrikanth Narayanan
CVBM
186
10
0
30 Mar 2022
Balanced Multimodal Learning via On-the-fly Gradient Modulation
Computer Vision and Pattern Recognition (CVPR), 2022
Xiaokang Peng
Yake Wei
Andong Deng
Dong Wang
Di Hu
317
343
0
29 Mar 2022
Single-Stream Multi-Level Alignment for Vision-Language Pretraining
European Conference on Computer Vision (ECCV), 2022
Zaid Khan
B. Vijaykumar
Xiang Yu
S. Schulter
Manmohan Chandraker
Y. Fu
CLIP
VLM
356
22
0
27 Mar 2022
Self-Supervised Predictive Learning: A Negative-Free Method for Sound Source Localization in Visual Scenes
Zengjie Song
Yuxi Wang
Junsong Fan
Tieniu Tan
Zhaoxiang Zhang
SSL
182
47
0
25 Mar 2022
The Challenges of Continuous Self-Supervised Learning
European Conference on Computer Vision (ECCV), 2022
Senthil Purushwalkam
Pedro Morgado
Abhinav Gupta
CLL
240
55
0
23 Mar 2022
Drive&Segment: Unsupervised Semantic Segmentation of Urban Scenes via Cross-modal Distillation
European Conference on Computer Vision (ECCV), 2022
Antonín Vobecký
David Hurych
Oriane Siméoni
Spyros Gidaris
Andrei Bursuc
Patrick Pérez
Josef Sivic
3DPC
280
29
0
21 Mar 2022
Localizing Visual Sounds the Easy Way
European Conference on Computer Vision (ECCV), 2022
Shentong Mo
Pedro Morgado
307
99
0
17 Mar 2022
Object discovery and representation networks
European Conference on Computer Vision (ECCV), 2022
Olivier J. Hénaff
Skanda Koppula
Evan Shelhamer
Daniel Zoran
Andrew Jaegle
Andrew Zisserman
João Carreira
Relja Arandjelović
425
95
0
16 Mar 2022
Audio-visual Generalised Zero-shot Learning with Cross-modal Attention and Language
Computer Vision and Pattern Recognition (CVPR), 2022
Otniel-Bogdan Mercea
Lukas Riesch
A. Sophia Koepke
Zeynep Akata
182
56
0
07 Mar 2022
Learnable Irrelevant Modality Dropout for Multimodal Action Recognition on Modality-Specific Annotated Videos
Computer Vision and Pattern Recognition (CVPR), 2022
Saghir Alfasly
Jian Lu
C. Xu
Yuru Zou
294
26
0
06 Mar 2022
Look&Listen: Multi-Modal Correlation Learning for Active Speaker Detection and Speech Enhancement
IEEE Transactions on Multimedia (IEEE TMM), 2022
Jun Xiong
Can Ma
Peng Zhang
Lei Xie
Wei Huang
Yufei Zha
199
37
0
04 Mar 2022
Audio Self-supervised Learning: A Survey
Patterns (Patterns), 2022
Shuo Liu
Adria Mallol-Ragolta
Emilia Parada-Cabaleiro
Kun Qian
Xingshuo Jing
Alexander Kathan
Bin Hu
Bjoern W. Schuller
SSL
241
130
0
02 Mar 2022
CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding
Computer Vision and Pattern Recognition (CVPR), 2022
Mohamed Afham
Isuru Dissanayake
Dinithi Dissanayake
Amaya Dharmasiri
Kanchana Thilakarathna
Ranga Rodrigo
3DPC
335
319
0
01 Mar 2022
COMPASS: Contrastive Multimodal Pretraining for Autonomous Systems
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2022
Shuang Ma
Sai H. Vemprala
Wenshan Wang
Jayesh K. Gupta
Yale Song
Daniel J. McDuff
Ashish Kapoor
SSL
191
12
0
20 Feb 2022
Learning Contextually Fused Audio-visual Representations for Audio-visual Speech Recognition
IEEE International Conference on Image Processing (ICIP), 2022
Zitian Zhang
Jie Zhang
Jian-Shu Zhang
Ming Wu
Xin Fang
Lirong Dai
SSL
274
12
0
15 Feb 2022
Visual Acoustic Matching
Computer Vision and Pattern Recognition (CVPR), 2022
Changan Chen
Ruohan Gao
P. Calamia
Kristen Grauman
302
65
0
14 Feb 2022
Visual Sound Localization in the Wild by Cross-Modal Interference Erasing
AAAI Conference on Artificial Intelligence (AAAI), 2022
Xian Liu
Rui Qian
Hang Zhou
Di Hu
Weiyao Lin
Ziwei Liu
Bolei Zhou
Xiaowei Zhou
184
31
0
13 Feb 2022
Audio-Visual Fusion Layers for Event Type Aware Video Recognition
Arda Senocak
Junsik Kim
Tae-Hyun Oh
H. Ryu
Dingzeyu Li
In So Kweon
148
1
0
12 Feb 2022
Learning Sound Localization Better From Semantically Similar Samples
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Arda Senocak
H. Ryu
Junsik Kim
In So Kweon
SSL
174
37
0
07 Feb 2022
Active Audio-Visual Separation of Dynamic Sound Sources
European Conference on Computer Vision (ECCV), 2022
Sagnik Majumder
Kristen Grauman
319
23
0
02 Feb 2022
New Insights on Target Speaker Extraction
Mohamed Elminshawi
Wolfgang Mack
Srikanth Raj Chetupalli
Soumitro Chakrabarty
Emanuel Habets
265
24
0
01 Feb 2022
Self-Supervised Moving Vehicle Detection from Audio-Visual Cues
IEEE Robotics and Automation Letters (RA-L), 2022
Jannik Zürn
Wolfram Burgard
SSL
278
10
0
30 Jan 2022
Omnivore: A Single Model for Many Visual Modalities
Computer Vision and Pattern Recognition (CVPR), 2022
Rohit Girdhar
Mannat Singh
Nikhil Ravi
Laurens van der Maaten
Armand Joulin
Ishan Misra
610
287
0
20 Jan 2022
Egocentric Deep Multi-Channel Audio-Visual Active Speaker Localization
Computer Vision and Pattern Recognition (CVPR), 2022
Hao Jiang
Calvin Murdock
V. Ithapu
EgoV
239
47
0
06 Jan 2022
Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction
International Conference on Learning Representations (ICLR), 2022
Bowen Shi
Wei-Ning Hsu
Kushal Lakhotia
Abdel-rahman Mohamed
SSL
370
420
0
05 Jan 2022
Sound and Visual Representation Learning with Multiple Pretraining Tasks
Computer Vision and Pattern Recognition (CVPR), 2022
A. Vasudevan
Dengxin Dai
Luc Van Gool
SSL
220
7
0
04 Jan 2022
Bilingual Speech Recognition by Estimating Speaker Geometry from Video Data
International Conference on Computer Analysis of Images and Patterns (CAIP), 2021
Luis Sanchez Tapia
Antonio Gomez
Mario Esparza
Venkatesh Jatla
Marios S. Pattichis
Sylvia Celedón-Pattichis
Carlos López Leiva
130
5
0
26 Dec 2021
Fine-grained Multi-Modal Self-Supervised Learning
British Machine Vision Conference (BMVC), 2021
Duo Wang
S. Karout
SSL
117
7
0
22 Dec 2021
Class-aware Sounding Objects Localization via Audiovisual Correspondence
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2021
Di Hu
Yake Wei
Rui Qian
Weiyao Lin
Ruihua Song
Ji-Rong Wen
184
47
0
22 Dec 2021
Decompose the Sounds and Pixels, Recompose the Events
AAAI Conference on Artificial Intelligence (AAAI), 2021
Varshanth R. Rao
Md Ibrahim Khalil
Haoda Li
Peng Dai
Juwei Lu
135
5
0
21 Dec 2021
Denoised Labels for Financial Time-Series Data via Self-Supervised Learning
International Conference on AI in Finance (ICAF), 2021
Yanqing Ma
Carmine Ventre
M. Polukarov
NoLa
143
10
0
19 Dec 2021
Audio-Visual Synchronisation in the wild
Honglie Chen
Weidi Xie
Triantafyllos Afouras
Arsha Nagrani
Andrea Vedaldi
Andrew Zisserman
205
49
0
08 Dec 2021
Cross-modal Manifold Cutmix for Self-supervised Video Representation Learning
Srijan Das
Michael S. Ryoo
SSL
292
1
0
07 Dec 2021
ViewCLR: Learning Self-supervised Video Representation for Unseen Viewpoints
Srijan Das
Michael S. Ryoo
SSL
214
30
0
07 Dec 2021
Suppressing Static Visual Cues via Normalizing Flows for Self-Supervised Video Representation Learning
Manlin Zhang
Jinpeng Wang
A. J. Ma
173
9
0
07 Dec 2021
PoseKernelLifter: Metric Lifting of 3D Human Pose using Sound
Zhijian Yang
Xiaoran Fan
Volkan Isler
H. Park
3DH
315
9
0
01 Dec 2021
ContIG: Self-supervised Multimodal Contrastive Learning for Medical Imaging with Genetics
Computer Vision and Pattern Recognition (CVPR), 2021
Aiham Taleb
Matthias Kirchler
Remo Monti
Christoph Lippert
SSL
MedIm
606
69
0
26 Nov 2021
MM-Pyramid: Multimodal Pyramid Attentional Network for Audio-Visual Event Localization and Video Parsing
Jiashuo Yu
Ying Cheng
Ruiwei Zhao
Rui Feng
Yuejie Zhang
215
81
0
24 Nov 2021
Geometry-Aware Multi-Task Learning for Binaural Audio Generation from Video
British Machine Vision Conference (BMVC), 2021
Rishabh Garg
Ruohan Gao
Kristen Grauman
176
31
0
21 Nov 2021
Beyond Mono to Binaural: Generating Binaural Audio from Mono Audio with Depth and Cross Modal Attention
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2021
Kranti K. Parida
Siddharth Srivastava
Gaurav Sharma
MDE
202
28
0
15 Nov 2021