1804.03641
Audio-Visual Scene Analysis with Self-Supervised Multisensory Features
10 April 2018
Andrew Owens
Alexei A. Efros
SSL
Papers citing
"Audio-Visual Scene Analysis with Self-Supervised Multisensory Features"
50 / 491 papers shown
Title
Audio-driven Talking Face Generation with Stabilized Synchronization Loss
European Conference on Computer Vision (ECCV), 2023
Dogucan Yaman
Fevziye Irem Eyiokur
Leonard Barmann
H. K. Ekenel
Alexander Waibel
CVBM
364
10
0
18 Jul 2023
Learning Spatial Features from Audio-Visual Correspondence in Egocentric Videos
Computer Vision and Pattern Recognition (CVPR), 2023
Sagnik Majumder
Ziad Al-Halah
Kristen Grauman
SSL
EgoV
305
8
0
10 Jul 2023
Multimodal Imbalance-Aware Gradient Modulation for Weakly-supervised Audio-Visual Video Parsing
Jie Fu
Junyu Gao
Changsheng Xu
231
17
0
05 Jul 2023
Visually-Guided Sound Source Separation with Audio-Visual Predictive Coding
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2023
Zengjie Song
Zhaoxiang Zhang
159
5
0
19 Jun 2023
STARSS23: An Audio-Visual Dataset of Spatial Recordings of Real Scenes with Spatiotemporal Annotations of Sound Events
Neural Information Processing Systems (NeurIPS), 2023
Kazuki Shimada
Archontis Politis
Parthasaarathy Sudarsanam
D. Krause
Kengo Uchida
...
Yuichiro Koyama
Naoya Takahashi
Shusuke Takahashi
Maria Sandsten
Yuki Mitsufuji
234
84
0
15 Jun 2023
Video-to-Music Recommendation using Temporal Alignment of Segments
IEEE Transactions on Multimedia (IEEE TMM), 2023
Laure Prétet
G. Richard
Clement Souchier
Geoffroy Peeters
AI4TS
133
19
0
12 Jun 2023
Learning Fine-grained View-Invariant Representations from Unpaired Ego-Exo Videos via Temporal Alignment
Neural Information Processing Systems (NeurIPS), 2023
Zihui Xue
Kristen Grauman
EgoV
234
47
0
08 Jun 2023
The ObjectFolder Benchmark: Multisensory Learning with Neural and Real Objects
Computer Vision and Pattern Recognition (CVPR), 2023
Ruohan Gao
Yiming Dou
Hao Li
Tanmay Agarwal
Jeannette Bohg
Yunzhu Li
Li Fei-Fei
Jiajun Wu
138
50
0
01 Jun 2023
Sonicverse: A Multisensory Simulation Platform for Embodied Household Agents that See and Hear
IEEE International Conference on Robotics and Automation (ICRA), 2023
Ruohan Gao
Hao Li
Gokul Dharan
Zhuzhu Wang
Chengshu Li
Fei Xia
Silvio Savarese
Li Fei-Fei
Jiajun Wu
301
14
0
01 Jun 2023
Modality-Independent Teachers Meet Weakly-Supervised Audio-Visual Event Parser
Neural Information Processing Systems (NeurIPS), 2023
Yun-hsuan Lai
Yen-Chun Chen
Y. Wang
212
22
0
27 May 2023
Real-Time Idling Vehicles Detection using Combined Audio-Visual Deep Learning
Xiwen Li
Tristalee Mangin
Surojit Saha
Evan K. Blanchard
Di Tang
Henry Poppe
Nathan Searle
Ouk Choi
Kerry E Kelly
Ross T. Whitaker
138
9
0
23 May 2023
Annotation-free Audio-Visual Segmentation
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Jinxian Liu
Yu Wang
Chen Ju
Chaofan Ma
Ya Zhang
Weidi Xie
VOS
VLM
348
46
0
18 May 2023
How does Contrastive Learning Organize Images?
Yunzhe Zhang
Yao Lu
Qi Xuan
SSL
140
2
0
17 May 2023
ImageBind: One Embedding Space To Bind Them All
Computer Vision and Pattern Recognition (CVPR), 2023
Rohit Girdhar
Alaaeldin El-Nouby
Zhuang Liu
Mannat Singh
Kalyan Vasudev Alwala
Armand Joulin
Ishan Misra
VLM
512
1,278
0
09 May 2023
Listen to Look into the Future: Audio-Visual Egocentric Gaze Anticipation
European Conference on Computer Vision (ECCV), 2023
Bolin Lai
Fiona Ryan
Wenqi Jia
Miao Liu
James M. Rehg
EgoV
323
16
0
06 May 2023
Cross-Stream Contrastive Learning for Self-Supervised Skeleton-Based Action Recognition
Image and Vision Computing (IVC), 2023
Sergio Valcarcel Macua
Yongqiang Tang
Zhizhong Zhang
Wensheng Zhang
227
14
0
03 May 2023
Conditional Generation of Audio from Video via Foley Analogies
Computer Vision and Pattern Recognition (CVPR), 2023
Yuexi Du
Ziyang Chen
Justin Salamon
Bryan C. Russell
Andrew Owens
VGen
177
58
0
17 Apr 2023
Looking Similar, Sounding Different: Leveraging Counterfactual Cross-Modal Pairs for Audiovisual Representation Learning
Computer Vision and Pattern Recognition (CVPR), 2023
Nikhil Singh
Chih-Wei Wu
Iroro Orife
Mahdi M. Kalayeh
370
3
0
12 Apr 2023
Self-Supervised Multimodal Learning: A Survey
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Yongshuo Zong
Oisin Mac Aodha
Timothy M. Hospedales
SSL
299
82
0
31 Mar 2023
Sound to Visual Scene Generation by Audio-to-Visual Latent Alignment
Computer Vision and Pattern Recognition (CVPR), 2023
Kim Sung-Bin
Arda Senocak
H. Ha
Andrew Owens
Tae-Hyun Oh
DiffM
VGen
184
52
0
30 Mar 2023
Egocentric Auditory Attention Localization in Conversations
Computer Vision and Pattern Recognition (CVPR), 2023
Fiona Ryan
Hao Jiang
Abhinav Shukla
James M. Rehg
V. Ithapu
EgoV
219
23
0
28 Mar 2023
Joint fMRI Decoding and Encoding with Latent Embedding Alignment
Xuelin Qian
Yikai Wang
Yanwei Fu
Xinwei Sun
Xiangyang Xue
Jianfeng Feng
160
8
0
26 Mar 2023
ViPFormer: Efficient Vision-and-Pointcloud Transformer for Unsupervised Pointcloud Understanding
IEEE International Conference on Robotics and Automation (ICRA), 2023
Hongyu Sun
Yongcai Wang
Xudong Cai
Xuewei Bai
Deying Li
ViT
3DPC
259
8
0
25 Mar 2023
Egocentric Audio-Visual Object Localization
Computer Vision and Pattern Recognition (CVPR), 2023
Chao Huang
Yapeng Tian
Anurag Kumar
Chenliang Xu
EgoV
171
44
0
23 Mar 2023
Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline
Computer Vision and Pattern Recognition (CVPR), 2023
Tiantian Geng
Teng Wang
Yanfu Zhang
Runmin Cong
Feng Zheng
181
59
0
22 Mar 2023
Machine Learning for Brain Disorders: Transformers and Visual Transformers
Robin Courant
Maika Edberg
Nicolas Dufour
Vicky Kalogeiton
MedIm
ViT
127
2
0
21 Mar 2023
Transformers in Speech Processing: A Survey
S. Latif
Aun Zaidi
Heriberto Cuayáhuitl
Fahad Shamshad
Moazzam Shoukat
Muhammad Usama
Junaid Qadir
412
66
0
21 Mar 2023
Sound Localization from Motion: Jointly Learning Sound Direction and Camera Rotation
IEEE International Conference on Computer Vision (ICCV), 2023
Ziyang Chen
Shengyi Qian
Andrew Owens
224
18
0
20 Mar 2023
A Light Weight Model for Active Speaker Detection
Computer Vision and Pattern Recognition (CVPR), 2023
Junhua Liao
Haihan Duan
Kanghui Feng
Wanbing Zhao
Yanbing Yang
Liangyin Chen
200
61
0
08 Mar 2023
Audio-Visual Contrastive Learning with Temporal Self-Supervision
AAAI Conference on Artificial Intelligence (AAAI), 2023
Simon Jenni
Alexander Black
John Collomosse
SSL
182
24
0
15 Feb 2023
AV-NeRF: Learning Neural Fields for Real-World Audio-Visual Scene Synthesis
Neural Information Processing Systems (NeurIPS), 2023
Susan Liang
Chao Huang
Yapeng Tian
Anurag Kumar
Chenliang Xu
VGen
329
58
0
04 Feb 2023
Neural Target Speech Extraction: An Overview
IEEE Signal Processing Magazine (IEEE Signal Process. Mag.), 2023
Kateřina Žmolíková
Marc Delcroix
Tsubasa Ochiai
K. Kinoshita
Jan "Honza" Černocký
Dong Yu
180
133
0
31 Jan 2023
Audio-Visual Segmentation with Semantics
International Journal of Computer Vision (IJCV), 2023
Jinxing Zhou
Xuyang Shen
Jianyuan Wang
Jiayi Zhang
Weixuan Sun
...
Stan Birchfield
Dan Guo
Lingpeng Kong
Meng Wang
Yiran Zhong
VOS
164
72
0
30 Jan 2023
Skeleton-based Action Recognition through Contrasting Two-Stream Spatial-Temporal Networks
IEEE Transactions on Multimedia (IEEE TMM), 2023
Chen Pang
Xuequan Lu
Lei Lyu
233
32
0
27 Jan 2023
Zorro: the masked multimodal transformer
Adrià Recasens
Jason Lin
João Carreira
Drew Jaegle
Luyu Wang
...
Pauline Luc
Antoine Miech
Lucas Smaira
Ross Hemsley
Andrew Zisserman
207
23
0
23 Jan 2023
Novel-View Acoustic Synthesis
Computer Vision and Pattern Recognition (CVPR), 2023
Changan Chen
Alexander Richard
Roman Shapovalov
V. Ithapu
Natalia Neverova
Kristen Grauman
Andrea Vedaldi
198
45
0
20 Jan 2023
LoCoNet: Long-Short Context Network for Active Speaker Detection
Computer Vision and Pattern Recognition (CVPR), 2023
Xizi Wang
Feng Cheng
Gedas Bertasius
David J. Crandall
200
28
0
19 Jan 2023
EXIF as Language: Learning Cross-Modal Associations Between Images and Camera Metadata
Computer Vision and Pattern Recognition (CVPR), 2023
Chenhao Zheng
Ayush Shrivastava
Andrew Owens
VLM
327
22
0
11 Jan 2023
Self-Supervised Video Forensics by Audio-Visual Anomaly Detection
Computer Vision and Pattern Recognition (CVPR), 2023
Chao Feng
Ziyang Chen
Andrew Owens
224
108
0
04 Jan 2023
MAViL: Masked Audio-Video Learners
Neural Information Processing Systems (NeurIPS), 2022
Po-Yao (Bernie) Huang
Vasu Sharma
Hu Xu
Chaitanya K. Ryali
Haoqi Fan
Yanghao Li
Shang-Wen Li
Gargi Ghosh
Jitendra Malik
Christoph Feichtenhofer
286
73
0
15 Dec 2022
Vision Transformers are Parameter-Efficient Audio-Visual Learners
Computer Vision and Pattern Recognition (CVPR), 2022
Yan-Bo Lin
Yi-Lin Sung
Jie Lei
Joey Tianyi Zhou
Gedas Bertasius
288
106
0
15 Dec 2022
CLIPSep: Learning Text-queried Sound Separation with Noisy Unlabeled Videos
International Conference on Learning Representations (ICLR), 2022
Hao-Wen Dong
Naoya Takahashi
Yuki Mitsufuji
Julian McAuley
Taylor Berg-Kirkpatrick
VLM
CLIP
245
35
0
14 Dec 2022
Audiovisual Masked Autoencoders
IEEE International Conference on Computer Vision (ICCV), 2022
Mariana-Iuliana Georgescu
Eduardo Fonseca
Radu Tudor Ionescu
Mario Lucic
Cordelia Schmid
Anurag Arnab
SSL
278
56
0
09 Dec 2022
Motion and Context-Aware Audio-Visual Conditioned Video Prediction
British Machine Vision Conference (BMVC), 2022
Yating Xu
Conghui Hu
G. Lee
VGen
345
1
0
09 Dec 2022
See, Hear, and Feel: Smart Sensory Fusion for Robotic Manipulation
Conference on Robot Learning (CoRL), 2022
Hao Li
Yizhi Zhang
Junzhe Zhu
Shaoxiong Wang
Michelle A. Lee
Huazhe Xu
Edward H. Adelson
Li Fei-Fei
Ruohan Gao
Jiajun Wu
175
88
0
07 Dec 2022
iQuery: Instruments as Queries for Audio-Visual Sound Separation
Computer Vision and Pattern Recognition (CVPR), 2022
Jiaben Chen
Renrui Zhang
Dongze Lian
Jiaqi Yang
Ziyao Zeng
Jianbo Shi
267
38
0
07 Dec 2022
Audio-Visual Activity Guided Cross-Modal Identity Association for Active Speaker Detection
IEEE Open Journal of Signal Processing (JOSP), 2022
Rahul Sharma
Shrikanth Narayanan
197
11
0
01 Dec 2022
Mix and Localize: Localizing Sound Sources in Mixtures
Computer Vision and Pattern Recognition (CVPR), 2022
Xixi Hu
Ziyang Chen
Andrew Owens
181
65
0
28 Nov 2022
Touch and Go: Learning from Human-Collected Vision and Touch
Neural Information Processing Systems (NeurIPS), 2022
Fengyu Yang
Chenyang Ma
Jiacheng Zhang
Jing Zhu
Wenzhen Yuan
Andrew Owens
212
89
0
22 Nov 2022
Unifying Tracking and Image-Video Object Detection
Peirong Liu
Rui Wang
Pengchuan Zhang
Omid Poursaeed
Yipin Zhou
Xuefei Cao
Sreya Dutta Roy
Ashish Shah
Ser-Nam Lim
169
0
0
20 Nov 2022