1804.03641
Audio-Visual Scene Analysis with Self-Supervised Multisensory Features
10 April 2018
Andrew Owens
Alexei A. Efros
SSL
Papers citing
"Audio-Visual Scene Analysis with Self-Supervised Multisensory Features"
50 / 491 papers shown
Title
Segmenting Collision Sound Sources in Egocentric Videos
Kranti Parida
Omar Emara
Hazel Doughty
Dima Damen
VOS
210
0
0
17 Nov 2025
CAVER: Curious Audiovisual Exploring Robot
Luca Macesanu
Boueny Folefack
Samik Singh
Ruchira Ray
Ben Abbatematteo
R. M. Martin
92
0
0
10 Nov 2025
Mixup Helps Understanding Multimodal Video Better
Xiaoyu Ma
Ding Ding
Hao Chen
100
0
0
13 Oct 2025
Disentanglement of Variations with Multimodal Generative Modeling
Yijie Zhang
Yiyang Shen
Weiran Wang
72
0
0
28 Sep 2025
High-Quality Sound Separation Across Diverse Categories via Visually-Guided Generative Modeling
Chao Huang
Susan Liang
Yapeng Tian
Anurag Kumar
Chenliang Xu
DiffM
107
0
0
26 Sep 2025
Temporal vs. Spatial: Comparing DINOv3 and V-JEPA2 Feature Representations for Video Action Analysis
Sai Varun Kodathala
Rakesh Vunnam
80
0
0
25 Sep 2025
Self-Supervised Cross-Modal Learning for Image-to-Point Cloud Registration
Xingmei Wang
Xiaoyu Hu
Chengkai Huang
Ziyan Zeng
Guohao Nie
Quan Z. Sheng
L. Yao
3DPC
72
0
0
19 Sep 2025
Learning from Silence and Noise for Visual Sound Source Localization
Xavier Juanola
G. Morais
Magdalena Fuentes
Gloria Haro
SSL
144
0
0
29 Aug 2025
AIM: Adaptive Intra-Network Modulation for Balanced Multimodal Learning
Shu Shen
Chao Chen
Tong Zhang
196
0
0
27 Aug 2025
Social-MAE: A Transformer-Based Multimodal Autoencoder for Face and Voice
IEEE International Conference on Automatic Face & Gesture Recognition (FG), 2024
Hugo Bohy
M. Tran
Kevin El Haddad
Thierry Dutoit
M. Soleymani
112
2
0
24 Aug 2025
VGGSounder: Audio-Visual Evaluations for Foundation Models
Daniil Zverev
Thaddäus Wiedemer
Christian Schroeder de Witt
Matthias Bethge
Wieland Brendel
A. Sophia Koepke
AuLLM
215
3
0
11 Aug 2025
ASAudio: A Survey of Advanced Spatial Audio Research
Zhiyuan Zhu
Yu Zhang
Wenxiang Guo
Changhao Pan
Zhou Zhao
141
1
0
08 Aug 2025
Real-Time Audio-Visual Speech Enhancement Using Pre-trained Visual Representations
Teng
Sile Yin
Li-Chia Yang
Shuo Zhang
120
1
0
29 Jul 2025
Attention-Driven Multimodal Alignment for Long-term Action Quality Assessment
Applied Soft Computing (ASC), 2025
Xin Wang
Peng-Jie Li
Yuan-Yuan Shen
123
0
0
29 Jul 2025
Mask-Free Audio-driven Talking Face Generation for Enhanced Visual Quality and Identity Preservation
Dogucan Yaman
Fevziye Irem Eyiokur
Leonard Barmann
H. K. Ekenel
Alexander H. Waibel
CVBM
166
0
0
28 Jul 2025
ESG-Net: Event-Aware Semantic Guided Network for Dense Audio-Visual Event Localization
Huilai Li
Yonghao Dang
Ying Xing
Yiming Wang
Jianqin Yin
143
0
0
14 Jul 2025
Step-by-Step Video-to-Audio Synthesis via Negative Audio Guidance
Akio Hayakawa
Masato Ishii
Takashi Shibuya
Yuki Mitsufuji
DiffM
VGen
233
1
0
26 Jun 2025
A Survey on World Models Grounded in Acoustic Physical Information
Xiaoliang Chen
Le Chang
Xin Yu
Yunhe Huang
Xianling Tu
SyDa
AI4CE
165
1
0
16 Jun 2025
Improving Multimodal Learning Balance and Sufficiency through Data Remixing
Xiaoyu Ma
Hao Chen
Yongjian Deng
212
4
0
13 Jun 2025
Hearing Hands: Generating Sounds from Physical Interactions in 3D Scenes
Computer Vision and Pattern Recognition (CVPR), 2025
Yiming Dou
Wonseok Oh
Yuqing Luo
Antonio Loquercio
Andrew Owens
162
0
0
11 Jun 2025
Average Calibration Losses for Reliable Uncertainty in Medical Image Segmentation
Theodore Barfoot
Luis C. Garcia-Peraza-Herrera
Samet Akcay
Ben Glocker
Tom Vercauteren
UQCV
369
1
0
04 Jun 2025
Learning to Highlight Audio by Watching Movies
Computer Vision and Pattern Recognition (CVPR), 2025
Chao Huang
Ruohan Gao
J. M. F. Tsang
Jan Kurcius
Cagdas Bilen
Chenliang Xu
Anurag Kumar
Sanjeel Parekh
VGen
221
3
0
17 May 2025
CAV-MAE Sync: Improving Contrastive Audio-Visual Mask Autoencoders via Fine-Grained Alignment
Computer Vision and Pattern Recognition (CVPR), 2025
Edson Araujo
Andrew Rouditchenko
Yuan Gong
Saurabhchand Bhati
Samuel Thomas
Brian Kingsbury
Leonid Karlinsky
Rogerio Feris
James Glass
Hilde Kuehne
314
1
0
02 May 2025
SoundVista: Novel-View Ambient Sound Synthesis via Visual-Acoustic Binding
Computer Vision and Pattern Recognition (CVPR), 2025
Mingfei Chen
I. D. Gebru
Ishwarya Ananthabhotla
Christian Richardt
Dejan Marković
Jake Sandakly
Steven Krenn
Todd Keebler
Eli Shlizerman
Alexander Richard
248
2
0
08 Apr 2025
Dynamic Derivation and Elimination: Audio Visual Segmentation with Enhanced Audio Semantics
Computer Vision and Pattern Recognition (CVPR), 2025
Chen Liu
Liying Yang
Peike Li
Dadong Wang
Lincheng Li
Xin Yu
VOS
281
3
0
17 Mar 2025
Reading to Listen at the Cocktail Party: Multi-Modal Speech Separation
Computer Vision and Pattern Recognition (CVPR), 2022
Akam Rahimi
Triantafyllos Afouras
Andrew Zisserman
299
33
0
02 Jan 2025
Scaling 4D Representations
João Carreira
Dilara Gokay
Michael King
Chuhan Zhang
Ignacio Rocco
...
Viorica Patraucean
Dima Damen
Pauline Luc
Mehdi S. M. Sajjadi
Andrew Zisserman
387
18
0
19 Dec 2024
Learning Self-Supervised Audio-Visual Representations for Sound Recommendations
International Symposium on Visual Computing (ISVC), 2024
Sudha Krishnamurthy
SSL
190
1
0
10 Dec 2024
Sound2Vision: Generating Diverse Visuals from Audio through Cross-Modal Latent Alignment
Kim Sung-Bin
Arda Senocak
Hyunwoo Ha
Tae-Hyun Oh
DiffM
373
3
0
09 Dec 2024
The Sound of Water: Inferring Physical Properties from Pouring Liquids
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Piyush Bagad
Makarand Tapaswi
Cees G. M. Snoek
Andrew Zisserman
404
4
0
18 Nov 2024
Continual Audio-Visual Sound Separation
Neural Information Processing Systems (NeurIPS), 2024
Weiguo Pian
Yiyang Nan
Shijian Deng
Shentong Mo
Yunhui Guo
Yapeng Tian
VLM
CLL
332
3
0
05 Nov 2024
USpeech: Ultrasound-Enhanced Speech with Minimal Human Effort via Cross-Modal Synthesis
Proceedings of the ACM on Interactive Mobile Wearable and Ubiquitous Technologies (IMWUT), 2024
Luca Jiang-Tao Yu
Running Zhao
Sijie Ji
Edith C.H. Ngai
Chenshu Wu
178
3
0
29 Oct 2024
Idempotent Unsupervised Representation Learning for Skeleton-Based Action Recognition
European Conference on Computer Vision (ECCV), 2024
Lilang Lin
Lehong Wu
Jiahang Zhang
Jiaying Liu
282
6
0
27 Oct 2024
ANAVI: Audio Noise Awareness using Visuals of Indoor environments for NAVIgation
Vidhi Jain
Rishi Veerapaneni
Yonatan Bisk
140
0
0
24 Oct 2024
A Critical Assessment of Visual Sound Source Localization Models Including Negative Audio
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Xavier Juanola
Gloria Haro
Magdalena Fuentes
344
4
0
01 Oct 2024
Self-Supervised Audio-Visual Soundscape Stylization
European Conference on Computer Vision (ECCV), 2024
Tingle Li
Renhao Wang
Po-Yao Huang
Andrew Owens
Gopala Anumanchipalli
DiffM
SSL
223
7
0
22 Sep 2024
Interpretable Convolutional SyncNet
Sungjoon Park
Jaesub Yun
Donggeon Lee
Minsik Park
259
1
0
02 Sep 2024
Enhancing Sound Source Localization via False Negative Elimination
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024
Zengjie Song
Jiangshe Zhang
Yuxi Wang
Junsong Fan
Zhaoxiang Zhang
253
3
0
29 Aug 2024
BrewCLIP: A Bifurcated Representation Learning Framework for Audio-Visual Retrieval
Zhenyu Lu
Lakshay Sethi
164
0
0
19 Aug 2024
Spherical World-Locking for Audio-Visual Localization in Egocentric Videos
European Conference on Computer Vision (ECCV), 2024
Heeseung Yun
Ruohan Gao
Ishwarya Ananthabhotla
Anurag Kumar
Jacob Donley
Chao Li
Gunhee Kim
V. Ithapu
Calvin Murdock
184
6
0
09 Aug 2024
Stepping Stones: A Progressive Training Strategy for Audio-Visual Semantic Segmentation
Juncheng Ma
Peiwen Sun
Yaoting Wang
Di Hu
VOS
295
21
0
16 Jul 2024
Disentangled Acoustic Fields For Multimodal Physical Scene Understanding
Jie Yin
Andrew F. Luo
Yilun Du
A. Cherian
Tim K. Marks
Jonathan Le Roux
Chuang Gan
238
1
0
16 Jul 2024
Sequential Contrastive Audio-Visual Learning
Ioannis Tsiamas
Santiago Pascual
Chunghsin Yeh
Joan Serrà
272
5
0
08 Jul 2024
CLHOP: Combined Audio-Video Learning for Horse 3D Pose and Shape Estimation
Ci Li
Elin Hernlund
Hedvig Kjellström
Silvia Zuffi
3DH
190
3
0
01 Jul 2024
Listen and Move: Improving GANs Coherency in Agnostic Sound-to-Video Generation
Rafael Redondo
145
0
0
23 Jun 2024
FlowAVSE: Efficient Audio-Visual Speech Enhancement with Conditional Flow Matching
Chaeyoung Jung
Suyeon Lee
Ji-Hoon Kim
Joon Son Chung
DiffM
210
18
0
13 Jun 2024
Translating speech with just images
Dan Oneaţă
Herman Kamper
VLM
112
1
0
11 Jun 2024
MA-AVT: Modality Alignment for Parameter-Efficient Audio-Visual Transformers
Tanvir Mahmud
Shentong Mo
Yapeng Tian
Diana Marculescu
150
7
0
07 Jun 2024
SEE-2-SOUND: Zero-Shot Spatial Environment-to-Spatial Sound
Rishit Dagli
Shivesh Prakash
Robert Wu
H. Khosravani
321
14
0
06 Jun 2024
Progressive Confident Masking Attention Network for Audio-Visual Segmentation
Yuxuan Wang
Feng Dong
Jinchao Zhu
Shuyue Zhu
VOS
349
1
0
04 Jun 2024