1804.03641
Cited By
v1
v2 (latest)
Audio-Visual Scene Analysis with Self-Supervised Multisensory Features
10 April 2018
Andrew Owens
Alexei A. Efros
SSL
Papers citing
"Audio-Visual Scene Analysis with Self-Supervised Multisensory Features"
50 / 491 papers shown
Audio-driven Talking Face Generation with Stabilized Synchronization Loss
European Conference on Computer Vision (ECCV), 2023
Dogucan Yaman
Fevziye Irem Eyiokur
Leonard Barmann
H. K. Ekenel
Alexander Waibel
CVBM
408
11
0
18 Jul 2023
Learning Spatial Features from Audio-Visual Correspondence in Egocentric Videos
Computer Vision and Pattern Recognition (CVPR), 2023
Sagnik Majumder
Ziad Al-Halah
Kristen Grauman
SSL
EgoV
359
8
0
10 Jul 2023
Multimodal Imbalance-Aware Gradient Modulation for Weakly-supervised Audio-Visual Video Parsing
Jie Fu
Junyu Gao
Changsheng Xu
248
17
0
05 Jul 2023
Visually-Guided Sound Source Separation with Audio-Visual Predictive Coding
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2023
Zengjie Song
Zhaoxiang Zhang
168
5
0
19 Jun 2023
STARSS23: An Audio-Visual Dataset of Spatial Recordings of Real Scenes with Spatiotemporal Annotations of Sound Events
Neural Information Processing Systems (NeurIPS), 2023
Kazuki Shimada
Archontis Politis
Parthasaarathy Sudarsanam
D. Krause
Kengo Uchida
...
Yuichiro Koyama
Naoya Takahashi
Shusuke Takahashi
Maria Sandsten
Yuki Mitsufuji
267
86
0
15 Jun 2023
Video-to-Music Recommendation using Temporal Alignment of Segments
IEEE Transactions on Multimedia (IEEE TMM), 2023
Laure Prétet
G. Richard
Clement Souchier
Geoffroy Peeters
AI4TS
142
19
0
12 Jun 2023
Learning Fine-grained View-Invariant Representations from Unpaired Ego-Exo Videos via Temporal Alignment
Neural Information Processing Systems (NeurIPS), 2023
Zihui Xue
Kristen Grauman
EgoV
282
47
0
08 Jun 2023
The ObjectFolder Benchmark: Multisensory Learning with Neural and Real Objects
Computer Vision and Pattern Recognition (CVPR), 2023
Ruohan Gao
Yiming Dou
Hao Li
Tanmay Agarwal
Jeannette Bohg
Yunzhu Li
Li Fei-Fei
Jiajun Wu
151
51
0
01 Jun 2023
Sonicverse: A Multisensory Simulation Platform for Embodied Household Agents that See and Hear
IEEE International Conference on Robotics and Automation (ICRA), 2023
Ruohan Gao
Hao Li
Gokul Dharan
Zhuzhu Wang
Chengshu Li
Fei Xia
Silvio Savarese
Li Fei-Fei
Jiajun Wu
325
14
0
01 Jun 2023
Modality-Independent Teachers Meet Weakly-Supervised Audio-Visual Event Parser
Neural Information Processing Systems (NeurIPS), 2023
Yun-hsuan Lai
Yen-Chun Chen
Y. Wang
221
23
0
27 May 2023
Real-Time Idling Vehicles Detection using Combined Audio-Visual Deep Learning
Xiwen Li
Tristalee Mangin
Surojit Saha
Evan K. Blanchard
Di Tang
Henry Poppe
Nathan Searle
Ouk Choi
Kerry E Kelly
Ross T. Whitaker
166
9
0
23 May 2023
Annotation-free Audio-Visual Segmentation
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Jinxian Liu
Yu Wang
Chen Ju
Chaofan Ma
Ya Zhang
Weidi Xie
VOS
VLM
392
46
0
18 May 2023
How does Contrastive Learning Organize Images?
Yunzhe Zhang
Yao Lu
Qi Xuan
SSL
163
2
0
17 May 2023
ImageBind: One Embedding Space To Bind Them All
Computer Vision and Pattern Recognition (CVPR), 2023
Rohit Girdhar
Alaaeldin El-Nouby
Zhuang Liu
Mannat Singh
Kalyan Vasudev Alwala
Armand Joulin
Ishan Misra
VLM
552
1,303
0
09 May 2023
Listen to Look into the Future: Audio-Visual Egocentric Gaze Anticipation
European Conference on Computer Vision (ECCV), 2023
Bolin Lai
Fiona Ryan
Wenqi Jia
Miao Liu
James M. Rehg
EgoV
371
17
0
06 May 2023
Cross-Stream Contrastive Learning for Self-Supervised Skeleton-Based Action Recognition
Image and Vision Computing (IVC), 2023
Sergio Valcarcel Macua
Yongqiang Tang
Zhizhong Zhang
Wensheng Zhang
243
15
0
03 May 2023
Conditional Generation of Audio from Video via Foley Analogies
Computer Vision and Pattern Recognition (CVPR), 2023
Yuexi Du
Ziyang Chen
Justin Salamon
Bryan C. Russell
Andrew Owens
VGen
205
59
0
17 Apr 2023
Looking Similar, Sounding Different: Leveraging Counterfactual Cross-Modal Pairs for Audiovisual Representation Learning
Computer Vision and Pattern Recognition (CVPR), 2023
Nikhil Singh
Chih-Wei Wu
Iroro Orife
Mahdi M. Kalayeh
394
3
0
12 Apr 2023
Self-Supervised Multimodal Learning: A Survey
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Yongshuo Zong
Oisin Mac Aodha
Timothy M. Hospedales
SSL
319
89
0
31 Mar 2023
Sound to Visual Scene Generation by Audio-to-Visual Latent Alignment
Computer Vision and Pattern Recognition (CVPR), 2023
Kim Sung-Bin
Arda Senocak
H. Ha
Andrew Owens
Tae-Hyun Oh
DiffM
VGen
216
53
0
30 Mar 2023
Egocentric Auditory Attention Localization in Conversations
Computer Vision and Pattern Recognition (CVPR), 2023
Fiona Ryan
Hao Jiang
Abhinav Shukla
James M. Rehg
V. Ithapu
EgoV
224
23
0
28 Mar 2023
Joint fMRI Decoding and Encoding with Latent Embedding Alignment
Xuelin Qian
Yikai Wang
Yanwei Fu
Xinwei Sun
Xiangyang Xue
Jianfeng Feng
197
8
0
26 Mar 2023
ViPFormer: Efficient Vision-and-Pointcloud Transformer for Unsupervised Pointcloud Understanding
IEEE International Conference on Robotics and Automation (ICRA), 2023
Hongyu Sun
Yongcai Wang
Xudong Cai
Xuewei Bai
Deying Li
ViT
3DPC
287
8
0
25 Mar 2023
Egocentric Audio-Visual Object Localization
Computer Vision and Pattern Recognition (CVPR), 2023
Chao Huang
Yapeng Tian
Anurag Kumar
Chenliang Xu
EgoV
210
45
0
23 Mar 2023
Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline
Computer Vision and Pattern Recognition (CVPR), 2023
Tiantian Geng
Teng Wang
Yanfu Zhang
Runmin Cong
Feng Zheng
192
61
0
22 Mar 2023
Machine Learning for Brain Disorders: Transformers and Visual Transformers
Robin Courant
Maika Edberg
Nicolas Dufour
Vicky Kalogeiton
MedIm
ViT
167
2
0
21 Mar 2023
Transformers in Speech Processing: A Survey
S. Latif
Aun Zaidi
Heriberto Cuayáhuitl
Fahad Shamshad
Moazzam Shoukat
Muhammad Usama
Junaid Qadir
448
68
0
21 Mar 2023
Sound Localization from Motion: Jointly Learning Sound Direction and Camera Rotation
IEEE International Conference on Computer Vision (ICCV), 2023
Ziyang Chen
Shengyi Qian
Andrew Owens
236
19
0
20 Mar 2023
A Light Weight Model for Active Speaker Detection
Computer Vision and Pattern Recognition (CVPR), 2023
Junhua Liao
Haihan Duan
Kanghui Feng
Wanbing Zhao
Yanbing Yang
Liangyin Chen
209
62
0
08 Mar 2023
Audio-Visual Contrastive Learning with Temporal Self-Supervision
AAAI Conference on Artificial Intelligence (AAAI), 2023
Simon Jenni
Alexander Black
John Collomosse
SSL
190
24
0
15 Feb 2023
AV-NeRF: Learning Neural Fields for Real-World Audio-Visual Scene Synthesis
Neural Information Processing Systems (NeurIPS), 2023
Susan Liang
Chao Huang
Yapeng Tian
Anurag Kumar
Chenliang Xu
VGen
364
59
0
04 Feb 2023
Neural Target Speech Extraction: An Overview
IEEE Signal Processing Magazine (IEEE Signal Process. Mag.), 2023
Kateřina Žmolíková
Marc Delcroix
Tsubasa Ochiai
K. Kinoshita
Jan "Honza" Černocký
Dong Yu
192
134
0
31 Jan 2023
Audio-Visual Segmentation with Semantics
International Journal of Computer Vision (IJCV), 2023
Jinxing Zhou
Xuyang Shen
Jianyuan Wang
Jiayi Zhang
Weixuan Sun
...
Stan Birchfield
Dan Guo
Lingpeng Kong
Meng Wang
Yiran Zhong
VOS
172
73
0
30 Jan 2023
Skeleton-based Action Recognition through Contrasting Two-Stream Spatial-Temporal Networks
IEEE Transactions on Multimedia (IEEE TMM), 2023
Chen Pang
Xuequan Lu
Lei Lyu
249
33
0
27 Jan 2023
Zorro: the masked multimodal transformer
Adrià Recasens
Jason Lin
João Carreira
Drew Jaegle
Luyu Wang
...
Pauline Luc
Antoine Miech
Lucas Smaira
Ross Hemsley
Andrew Zisserman
229
23
0
23 Jan 2023
Novel-View Acoustic Synthesis
Computer Vision and Pattern Recognition (CVPR), 2023
Changan Chen
Alexander Richard
Roman Shapovalov
V. Ithapu
Natalia Neverova
Kristen Grauman
Andrea Vedaldi
213
45
0
20 Jan 2023
LoCoNet: Long-Short Context Network for Active Speaker Detection
Computer Vision and Pattern Recognition (CVPR), 2023
Xizi Wang
Feng Cheng
Gedas Bertasius
David J. Crandall
236
28
0
19 Jan 2023
EXIF as Language: Learning Cross-Modal Associations Between Images and Camera Metadata
Computer Vision and Pattern Recognition (CVPR), 2023
Chenhao Zheng
Ayush Shrivastava
Andrew Owens
VLM
346
23
0
11 Jan 2023
Self-Supervised Video Forensics by Audio-Visual Anomaly Detection
Computer Vision and Pattern Recognition (CVPR), 2023
Chao Feng
Ziyang Chen
Andrew Owens
272
112
0
04 Jan 2023
MAViL: Masked Audio-Video Learners
Neural Information Processing Systems (NeurIPS), 2022
Po-Yao (Bernie) Huang
Vasu Sharma
Hu Xu
Chaitanya K. Ryali
Haoqi Fan
Yanghao Li
Shang-Wen Li
Gargi Ghosh
Jitendra Malik
Christoph Feichtenhofer
322
73
0
15 Dec 2022
Vision Transformers are Parameter-Efficient Audio-Visual Learners
Computer Vision and Pattern Recognition (CVPR), 2022
Yan-Bo Lin
Yi-Lin Sung
Jie Lei
Joey Tianyi Zhou
Gedas Bertasius
320
108
0
15 Dec 2022
CLIPSep: Learning Text-queried Sound Separation with Noisy Unlabeled Videos
International Conference on Learning Representations (ICLR), 2022
Hao-Wen Dong
Naoya Takahashi
Yuki Mitsufuji
Julian McAuley
Taylor Berg-Kirkpatrick
VLM
CLIP
261
36
0
14 Dec 2022
Audiovisual Masked Autoencoders
IEEE International Conference on Computer Vision (ICCV), 2022
Mariana-Iuliana Georgescu
Eduardo Fonseca
Radu Tudor Ionescu
Mario Lucic
Cordelia Schmid
Anurag Arnab
SSL
309
56
0
09 Dec 2022
Motion and Context-Aware Audio-Visual Conditioned Video Prediction
British Machine Vision Conference (BMVC), 2022
Yating Xu
Conghui Hu
G. Lee
VGen
382
1
0
09 Dec 2022
See, Hear, and Feel: Smart Sensory Fusion for Robotic Manipulation
Conference on Robot Learning (CoRL), 2022
Hao Li
Yizhi Zhang
Junzhe Zhu
Shaoxiong Wang
Michelle A. Lee
Huazhe Xu
Edward H. Adelson
Li Fei-Fei
Ruohan Gao
Jiajun Wu
207
88
0
07 Dec 2022
iQuery: Instruments as Queries for Audio-Visual Sound Separation
Computer Vision and Pattern Recognition (CVPR), 2022
Jiaben Chen
Renrui Zhang
Dongze Lian
Jiaqi Yang
Ziyao Zeng
Jianbo Shi
279
39
0
07 Dec 2022
Audio-Visual Activity Guided Cross-Modal Identity Association for Active Speaker Detection
IEEE Open Journal of Signal Processing (JOSP), 2022
Rahul Sharma
Shrikanth Narayanan
210
11
0
01 Dec 2022
Mix and Localize: Localizing Sound Sources in Mixtures
Computer Vision and Pattern Recognition (CVPR), 2022
Xixi Hu
Ziyang Chen
Andrew Owens
213
65
0
28 Nov 2022
Touch and Go: Learning from Human-Collected Vision and Touch
Neural Information Processing Systems (NeurIPS), 2022
Fengyu Yang
Chenyang Ma
Jiacheng Zhang
Jing Zhu
Wenzhen Yuan
Andrew Owens
254
91
0
22 Nov 2022
Unifying Tracking and Image-Video Object Detection
Peirong Liu
Rui Wang
Pengchuan Zhang
Omid Poursaeed
Yipin Zhou
Xuefei Cao
Sreya Dutta Roy
Ashish Shah
Ser-Nam Lim
189
0
0
20 Nov 2022