Audio-Visual Scene Analysis with Self-Supervised Multisensory Features
10 April 2018
Andrew Owens
Alexei A. Efros
    SSL

Papers citing "Audio-Visual Scene Analysis with Self-Supervised Multisensory Features"

50 / 491 papers shown
Title
PMR: Prototypical Modal Rebalance for Multimodal Learning
Computer Vision and Pattern Recognition (CVPR), 2022
Yunfeng Fan
Wenchao Xu
Yining Qi
Junxiao Wang
Song Guo
1.5K
143
0
14 Nov 2022
SceneFake: An Initial Dataset and Benchmarks for Scene Fake Audio Detection
Pattern Recognition (Pattern Recogn.), 2022
Jiangyan Yi
Chenglong Wang
Jianhua Tao
Chu Yuan Zhang
Cunhang Fan
Zhengkun Tian
Haoxin Ma
Ruibo Fu
192
26
0
11 Nov 2022
Hear The Flow: Optical Flow-Based Self-Supervised Visual Sound Source Localization
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022
Dennis Fedorishin
D. Mohan
Bhavin Jawade
S. Setlur
V. Govindaraju
VGen
163
14
0
06 Nov 2022
MarginNCE: Robust Sound Localization with a Negative Margin
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Sooyoung Park
Arda Senocak
Joon Son Chung
SSL
116
16
0
03 Nov 2022
Learning Audio-Visual Dynamics Using Scene Graphs for Audio Source Separation
Neural Information Processing Systems (NeurIPS), 2022
Moitreya Chatterjee
Narendra Ahuja
A. Cherian
180
14
0
29 Oct 2022
Multimodal Transformer Distillation for Audio-Visual Synchronization
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Xuan-Bo Chen
Haibin Wu
Chung-Che Wang
Hung-yi Lee
J. Jang
132
6
0
27 Oct 2022
Anticipative Feature Fusion Transformer for Multi-Modal Action Anticipation
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022
Zeyun Zhong
David Schneider
Michael Voit
Rainer Stiefelhagen
Jürgen Beyerer
157
59
0
23 Oct 2022
Multimodal Neural Network For Demand Forecasting
International Conference on Neural Information Processing (ICONIP), 2022
Nitesh Kumar
K. Dheenadayalan
Suprabath Reddy
Sumant Kulkarni
AI4TS
110
8
0
20 Oct 2022
Sparse in Space and Time: Audio-visual Synchronisation with Trainable Selectors
British Machine Vision Conference (BMVC), 2022
Vladimir E. Iashin
Weidi Xie
Esa Rahtu
Andrew Zisserman
138
31
0
13 Oct 2022
Leveraging the Video-level Semantic Consistency of Event for Audio-visual Event Localization
IEEE Transactions on Multimedia (IEEE TMM), 2022
Yuanyuan Jiang
Jianqin Yin
Yonghao Dang
106
14
0
11 Oct 2022
Contrastive Audio-Visual Masked Autoencoder
International Conference on Learning Representations (ICLR), 2022
Yuan Gong
Andrew Rouditchenko
Alexander H. Liu
David Harwath
Leonid Karlinsky
Hilde Kuehne
James R. Glass
308
165
0
02 Oct 2022
TVLT: Textless Vision-Language Transformer
Neural Information Processing Systems (NeurIPS), 2022
Zineng Tang
Jaemin Cho
Yixin Nie
Joey Tianyi Zhou
VLM
304
36
0
28 Sep 2022
Learning State-Aware Visual Representations from Audible Interactions
Neural Information Processing Systems (NeurIPS), 2022
Himangi Mittal
Pedro Morgado
Unnat Jain
Abhinav Gupta
186
28
0
27 Sep 2022
Unsupervised active speaker detection in media content using cross-modal information
Rahul Sharma
Shrikanth Narayanan
201
3
0
24 Sep 2022
A Closer Look at Weakly-Supervised Audio-Visual Source Localization
Neural Information Processing Systems (NeurIPS), 2022
Shentong Mo
Pedro Morgado
229
79
0
30 Aug 2022
Semi-Supervised Disentanglement of Tactile Contact Geometry from Sliding-Induced Shear
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2022
A. Gupta
Alex Church
Nathan Lepora
222
2
0
26 Aug 2022
Semi-Supervised and Unsupervised Deep Visual Learning: A Survey
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Yanbei Chen
Goran Frehse
Xiatian Zhu
Zeynep Akata
291
160
0
24 Aug 2022
Learning in Audio-visual Context: A Review, Analysis, and New Perspective
Yake Wei
Di Hu
Yapeng Tian
Xuelong Li
252
66
0
20 Aug 2022
Impact Makes a Sound and Sound Makes an Impact: Sound Guides Representations and Explorations
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2022
Xufeng Zhao
C. Weber
Muhammad Burhan Hafez
S. Wermter
140
10
0
04 Aug 2022
Benchmarking Visual-Inertial Deep Multimodal Fusion for Relative Pose Regression and Odometry-aided Absolute Pose Regression
Felix Ott
Nisha Lakshmana Raichur
David Rügamer
Tobias Feigl
Heiko Neumann
B. Bischl
Christopher Mutschler
308
3
0
01 Aug 2022
AudioScopeV2: Audio-Visual Attention Architectures for Calibrated Open-Domain On-Screen Sound Separation
European Conference on Computer Vision (ECCV), 2022
Efthymios Tzinis
Scott Wisdom
Tal Remez
J. Hershey
271
33
0
20 Jul 2022
Is an Object-Centric Video Representation Beneficial for Transfer?
Asian Conference on Computer Vision (ACCV), 2022
Chuhan Zhang
Ankush Gupta
Andrew Zisserman
ViT
318
30
0
20 Jul 2022
Temporal and cross-modal attention for audio-visual zero-shot learning
European Conference on Computer Vision (ECCV), 2022
Otniel-Bogdan Mercea
Thomas Hummel
A. Sophia Koepke
Zeynep Akata
183
32
0
20 Jul 2022
SVGraph: Learning Semantic Graphs from Instructional Videos
IEEE International Conference on Multimedia Big Data (ICMBD), 2022
Madeline Chantry Schiappa
Yogesh S Rawat
197
5
0
16 Jul 2022
Modality-Aware Contrastive Instance Learning with Self-Distillation for Weakly-Supervised Audio-Visual Violence Detection
ACM Multimedia (ACM MM), 2022
Jiashuo Yu
Jin-Yuan Liu
Ying Cheng
Rui Feng
Yuejie Zhang
209
49
0
12 Jul 2022
Audio-Visual Segmentation
European Conference on Computer Vision (ECCV), 2022
Jinxing Zhou
Jianyuan Wang
Jing Zhang
Weixuan Sun
Jing Zhang
Stan Birchfield
Dan Guo
Lingpeng Kong
Meng Wang
Yiran Zhong
VOS
251
161
0
11 Jul 2022
Towards Proper Contrastive Self-supervised Learning Strategies For Music Audio Representation
IEEE International Conference on Multimedia and Expo (ICME), 2022
Jeong-Eun Choi
Seongwon Jang
Hyunsouk Cho
Sehee Chung
SSL
143
11
0
10 Jul 2022
Finding Fallen Objects Via Asynchronous Audio-Visual Integration
Computer Vision and Pattern Recognition (CVPR), 2022
Chuang Gan
Yi Gu
Siyuan Zhou
Jeremy Schwartz
S. Alter
James Traer
Dan Gutfreund
J. Tenenbaum
Josh H. McDermott
Antonio Torralba
259
20
0
07 Jul 2022
Learning Music-Dance Representations through Explicit-Implicit Rhythm Synchronization
IEEE Transactions on Multimedia (IEEE TMM), 2022
Jiashuo Yu
Junfu Pu
Ying Cheng
Rui Feng
Ying Shan
230
7
0
07 Jul 2022
Beyond Visual Field of View: Perceiving 3D Environment with Echoes and Vision
Xiangjie Sui
Esa Rahtu
Hang Zhao
MDE
270
7
0
03 Jul 2022
Exploiting Transformation Invariance and Equivariance for Self-supervised Sound Localisation
ACM Multimedia (ACM MM), 2022
Jinxian Liu
Chen Ju
Weidi Xie
Ya Zhang
212
47
0
26 Jun 2022
Rethinking Audio-visual Synchronization for Active Speaker Detection
International Workshop on Machine Learning for Signal Processing (MLSP), 2022
Abudukelimu Wuerkaixi
You Zhang
Z. Duan
Changshui Zhang
156
18
0
21 Jun 2022
Probing Visual-Audio Representation for Video Highlight Detection via Hard-Pairs Guided Contrastive Learning
British Machine Vision Conference (BMVC), 2022
Shuaicheng Li
Feng Zhang
Kunlin Yang
Lin-Na Liu
Shinan Liu
Jun Hou
Shuai Yi
177
10
0
21 Jun 2022
A Comprehensive Survey on Video Saliency Detection with Auditory Information: the Audio-visual Consistency Perceptual is the Key!
Chenglizhao Chen
Mengke Song
Wenfeng Song
Li Guo
Muwei Jian
204
34
0
20 Jun 2022
GaLeNet: Multimodal Learning for Disaster Prediction, Management and Relief
Rohit Saha
Meng Fang
Angeline Yasodhara
Kyryl Truskovskyi
Azin Asgarian
D. Homola
Raahil Shah
Frederik Dieleman
Jack Weatheritt
Thomas Rogers
139
3
0
18 Jun 2022
Self-Supervised Learning for Videos: A Survey
ACM Computing Surveys (ACM CSUR), 2022
Madeline Chantry Schiappa
Yogesh S Rawat
M. Shah
SSL
422
165
0
18 Jun 2022
OmniMAE: Single Model Masked Pretraining on Images and Videos
Computer Vision and Pattern Recognition (CVPR), 2022
Rohit Girdhar
Alaaeldin El-Nouby
Mannat Singh
Kalyan Vasudev Alwala
Armand Joulin
Ishan Misra
ViT
251
116
0
16 Jun 2022
SoundSpaces 2.0: A Simulation Platform for Visual-Acoustic Learning
Neural Information Processing Systems (NeurIPS), 2022
Changan Chen
Carl Schissler
Sanchit Garg
Philip Kobernik
Alexander Clegg
P. Calamia
Dhruv Batra
Philip Robinson
Kristen Grauman
3DGS
260
114
0
16 Jun 2022
Discrete Contrastive Diffusion for Cross-Modal Music and Image Generation
International Conference on Learning Representations (ICLR), 2022
Ye Zhu
Yuehua Wu
Kyle Olszewski
Jian Ren
Sergey Tulyakov
Yan Yan
DiffM
358
56
0
15 Jun 2022
Multimodal Learning with Transformers: A Survey
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Peng Xu
Xiatian Zhu
David Clifton
ViT
475
809
0
13 Jun 2022
Few-Shot Audio-Visual Learning of Environment Acoustics
Neural Information Processing Systems (NeurIPS), 2022
Sagnik Majumder
Changan Chen
Ziad Al-Halah
Kristen Grauman
212
67
0
08 Jun 2022
Beyond Just Vision: A Review on Self-Supervised Representation Learning on Multimodal and Temporal Data
Shohreh Deldari
Hao Xue
Aaqib Saeed
Jiayuan He
Daniel V. Smith
Flora D. Salim
AI4TS
207
43
0
06 Jun 2022
Self-supervised Learning of Audio Representations from Audio-Visual Data using Spatial Alignment
IEEE Journal on Selected Topics in Signal Processing (IEEE JSTSP), 2022
Shanshan Wang
Archontis Politis
A. Mesaros
Maria Sandsten
SSL
109
10
0
02 Jun 2022
Deep Learning for Visual Speech Analysis: A Survey
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Changchong Sheng
Gangyao Kuang
L. Bai
Chen Hou
Yike Guo
Xin Xu
M. Pietikäinen
Tianpeng Liu
VLM
263
52
0
22 May 2022
SoK: The Impact of Unlabelled Data in Cyberthreat Detection
European Symposium on Security and Privacy (Euro S&P), 2022
Giovanni Apruzzese
Pavel Laskov
A.T. Tastemirova
214
41
0
18 May 2022
Learning Visual Styles from Audio-Visual Associations
European Conference on Computer Vision (ECCV), 2022
Tingle Li
Yichen Liu
Andrew Owens
Hang Zhao
DiffM
164
26
0
10 May 2022
On Negative Sampling for Audio-Visual Contrastive Learning from Movies
Mahdi M. Kalayeh
Shervin Ardeshir
Lingyi Liu
Nagendra Kamath
Ashok Chandrashekar
SSL
131
3
0
29 Apr 2022
Sound Localization by Self-Supervised Time Delay Estimation
European Conference on Computer Vision (ECCV), 2022
Ziyang Chen
David Fouhey
Andrew Owens
SSL
206
23
0
26 Apr 2022
Learning to Listen: Modeling Non-Deterministic Dyadic Facial Motion
Computer Vision and Pattern Recognition (CVPR), 2022
Evonne Ng
Hanbyul Joo
Liwen Hu
Hao Li
Trevor Darrell
Angjoo Kanazawa
Shiry Ginosar
VGen
138
123
0
18 Apr 2022
How to Listen? Rethinking Visual Sound Localization
Interspeech (Interspeech), 2022
Ho-Hsiang Wu
Magdalena Fuentes
Prem Seetharaman
J. P. Bello
ObjD
94
5
0
11 Apr 2022