1804.03641
Cited By
Audio-Visual Scene Analysis with Self-Supervised Multisensory Features
10 April 2018
Andrew Owens
Alexei A. Efros
SSL
Papers citing
"Audio-Visual Scene Analysis with Self-Supervised Multisensory Features"
50 / 491 papers shown
Visually Informed Binaural Audio Generation without Binaural Audios
Computer Vision and Pattern Recognition (CVPR), 2021
Xudong Xu
Hang Zhou
Ziwei Liu
Bo Dai
Xiaogang Wang
Dahua Lin
DiffM
145
67
0
13 Apr 2021
Object Priors for Classifying and Localizing Unseen Actions
International Journal of Computer Vision (IJCV), 2021
Pascal Mettes
William Thong
Cees G. M. Snoek
245
21
0
10 Apr 2021
Towards Fine-grained Visual Representations by Combining Contrastive Learning with Image Reconstruction and Attention-weighted Pooling
Jonas Dippel
Steffen Vogler
Johannes Höhne
198
23
0
09 Apr 2021
MPN: Multimodal Parallel Network for Audio-Visual Event Localization
IEEE International Conference on Multimedia and Expo (ICME), 2021
Jiashuo Yu
Ying Cheng
Rui Feng
214
21
0
07 Apr 2021
Contrastive Learning of Global-Local Video Representations
Shuang Ma
Zhaoyang Zeng
Daniel J. McDuff
Yale Song
SSL
188
7
0
07 Apr 2021
Localizing Visual Sounds the Hard Way
Computer Vision and Pattern Recognition (CVPR), 2021
Honglie Chen
Weidi Xie
Triantafyllos Afouras
Arsha Nagrani
Andrea Vedaldi
Andrew Zisserman
ObjD
210
225
0
06 Apr 2021
Cyclic Co-Learning of Sounding Object Visual Grounding and Sound Separation
Computer Vision and Pattern Recognition (CVPR), 2021
Yapeng Tian
Di Hu
Chenliang Xu
ObjD
194
92
0
05 Apr 2021
Can audio-visual integration strengthen robustness under multimodal attacks?
Computer Vision and Pattern Recognition (CVPR), 2021
Yapeng Tian
Chenliang Xu
AAML
304
40
0
05 Apr 2021
Cross-Modal learning for Audio-Visual Video Parsing
Interspeech (Interspeech), 2021
Jatin Lamba
Abhishek
Jayaprakash Akula
Rishabh Dabral
Preethi Jyothi
Ganesh Ramakrishnan
240
9
0
03 Apr 2021
Touch-based Curiosity for Sparse-Reward Tasks
Sai Rajeswar
Cyril Ibrahim
Nitin Surya
Florian Golemo
David Vazquez
Rameswar Panda
Pedro H. O. Pinheiro
139
6
0
01 Apr 2021
Unsupervised Sound Localization via Iterative Contrastive Learning
Computer Vision and Image Understanding (CVIU), 2021
Yan-Bo Lin
Hung-Yu Tseng
Hsin-Ying Lee
Yen-Yu Lin
Ming-Hsuan Yang
SSL
184
40
0
01 Apr 2021
Collaborative Learning to Generate Audio-Video Jointly
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
V. Kurmi
Vipul Bajaj
Badri N. Patro
K. Venkatesh
Vinay P. Namboodiri
Preethi Jyothi
VGen
154
11
0
01 Apr 2021
Broaden Your Views for Self-Supervised Video Learning
IEEE International Conference on Computer Vision (ICCV), 2021
Adrià Recasens
Pauline Luc
Jean-Baptiste Alayrac
Luyu Wang
Ross Hemsley
...
Florent Altché
M. Valko
Jean-Bastien Grill
Aaron van den Oord
Andrew Zisserman
SSL
AI4TS
296
138
0
30 Mar 2021
Robust Audio-Visual Instance Discrimination
Computer Vision and Pattern Recognition (CVPR), 2021
Pedro Morgado
Ishan Misra
Nuno Vasconcelos
SSL
246
117
0
29 Mar 2021
Discriminative Semantic Transitive Consistency for Cross-Modal Learning
Computer Vision and Image Understanding (CVIU), 2021
Kranti K. Parida
Gaurav Sharma
205
1
0
25 Mar 2021
Looking into Your Speech: Learning Cross-modal Affinity for Audio-visual Speech Separation
Computer Vision and Pattern Recognition (CVPR), 2021
Jiyoung Lee
Soo-Whan Chung
Sunok Kim
Hong-Goo Kang
Kwanghoon Sohn
170
59
0
25 Mar 2021
Weakly-supervised Audio-visual Sound Source Detection and Separation
IEEE International Conference on Multimedia and Expo (ICME), 2021
Tanzila Rahman
Leonid Sigal
109
8
0
25 Mar 2021
Space-Time Crop & Attend: Improving Cross-modal Video Representation Learning
IEEE International Conference on Computer Vision (ICCV), 2021
Mandela Patrick
Yuki M. Asano
Bernie Huang
Ishan Misra
Florian Metze
Joao Henriques
Andrea Vedaldi
AI4TS
271
36
0
18 Mar 2021
Beyond Image to Depth: Improving Depth Prediction using Echoes
Computer Vision and Pattern Recognition (CVPR), 2021
Kranti K. Parida
Siddharth Srivastava
Gaurav Sharma
MDE
288
42
0
15 Mar 2021
Multi-Format Contrastive Learning of Audio Representations
Luyu Wang
Aaron van den Oord
172
63
0
11 Mar 2021
Audio-Visual Speech Separation Using Cross-Modal Correspondence Loss
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
Naoki Makishima
Mana Ihori
Akihiko Takashima
Tomohiro Tanaka
Shota Orihashi
Ryo Masumura
191
8
0
02 Mar 2021
There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge
Computer Vision and Pattern Recognition (CVPR), 2021
Francisco Rivera Valverde
Juana Valeria Hurtado
Abhinav Valada
223
84
0
01 Mar 2021
Audiovisual Highlight Detection in Videos
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
Karel Mundnich
Alexandra Fenster
Aparna Khare
Shiva Sundaram
108
6
0
11 Feb 2021
Template-Free Try-on Image Synthesis via Semantic-guided Optimization
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2021
Chien-Lung Chou
Chieh-Yun Chen
Chia-Wei Hsieh
Hong-Han Shuai
Jiaying Liu
Wen-Huang Cheng
3DH
113
17
0
06 Feb 2021
Learning Audio-Visual Correlations from Variational Cross-Modal Generation
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
Ye Zhu
Yu Wu
Hugo Latapie
Yi Yang
Yan Yan
SSL
266
20
0
05 Feb 2021
Collaboration among Image and Object Level Features for Image Colourisation
Rita Pucci
C. Micheloni
N. Martinel
126
1
0
19 Jan 2021
MAAS: Multi-modal Assignation for Active Speaker Detection
IEEE International Conference on Computer Vision (ICCV), 2021
Juan Carlos León Alcázar
Fabian Caba Heilbron
Ali K. Thabet
Guohao Li
342
63
0
11 Jan 2021
VisualVoice: Audio-Visual Speech Separation with Cross-Modal Consistency
Computer Vision and Pattern Recognition (CVPR), 2021
Ruohan Gao
Kristen Grauman
CVBM
450
239
0
08 Jan 2021
Human Action Recognition from Various Data Modalities: A Review
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2020
Zehua Sun
Qiuhong Ke
Hossein Rahmani
Mohammed Bennamoun
Gang Wang
Jun Liu
MU
584
699
0
22 Dec 2020
Semantic Audio-Visual Navigation
Computer Vision and Pattern Recognition (CVPR), 2020
Changan Chen
Ziad Al-Halah
Kristen Grauman
292
117
0
21 Dec 2020
Visual Speech Enhancement Without A Real Visual Stream
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2020
Sindhu B. Hegde
Prajwal K R
Rudrabha Mukhopadhyay
Vinay P. Namboodiri
C. V. Jawahar
DiffM
135
21
0
20 Dec 2020
ViNet: Pushing the limits of Visual Modality for Audio-Visual Saliency Prediction
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2020
Samyak Jain
P. Yarlagadda
Shreyank Jyoti
Shyamgopal Karthik
Subramanian Ramanathan
Vineet Gandhi
ViT
308
82
0
11 Dec 2020
Parameter Efficient Multimodal Transformers for Video Representation Learning
Sangho Lee
Youngjae Yu
Gunhee Kim
Thomas Breuel
Jan Kautz
Yale Song
ViT
275
89
0
08 Dec 2020
Rethinking movie genre classification with fine-grained semantic clustering
Edward Fish
Jon Weinbren
Andrew Gilbert
VLM
167
10
0
04 Dec 2020
Multi-modal Fusion for Single-Stage Continuous Gesture Recognition
Harshala Gammulle
Akila Pemasiri
Sridha Sridharan
Clinton Fookes
SLR
335
37
0
10 Nov 2020
Multi-Modal Learning of Keypoint Predictive Models for Visual Object Manipulation
Sarah Bechtle
Neha Das
Franziska Meier
SSL
159
6
0
08 Nov 2020
Learning Representations from Audio-Visual Spatial Alignment
Pedro Morgado
Yi Li
Nuno Vasconcelos
SSL
178
138
0
03 Nov 2020
A Two-Stage Approach to Device-Robust Acoustic Scene Classification
Hu Hu
Chao-Han Huck Yang
Xianjun Xia
Xue Bai
Xin Tang
...
Yuanjun Zhao
Sabato Marco Siniscalchi
Yannan Wang
Jun Du
Chin-Hui Lee
133
34
0
03 Nov 2020
Into the Wild with AudioScope: Unsupervised Audio-Visual Separation of On-Screen Sounds
International Conference on Learning Representations (ICLR), 2020
Efthymios Tzinis
Scott Wisdom
A. Jansen
Shawn Hershey
Tal Remez
D. Ellis
J. Hershey
352
78
0
02 Nov 2020
Pretext-Contrastive Learning: Toward Good Practices in Self-supervised Video Representation Leaning
L. Tao
Xueting Wang
T. Yamasaki
VLM
SSL
251
14
0
29 Oct 2020
Remixing Music with Visual Conditioning
IEEE International Symposium on Multimedia (ISM), 2020
Li-Chia Yang
Alexander Lerch
112
4
0
27 Oct 2020
Listening to Sounds of Silence for Speech Denoising
Ruilin Xu
Rundi Wu
Y. Ishiwaka
Carl Vondrick
Changxi Zheng
203
37
0
22 Oct 2020
Contrastive Learning of General-Purpose Audio Representations
Aaqib Saeed
David Grangier
Neil Zeghidour
VLM
SSL
253
311
0
21 Oct 2020
LT-GAN: Self-Supervised GAN with Latent Transformation Detection
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2020
Parth Patel
Nupur Kumari
M. Singh
Balaji Krishnamurthy
151
20
0
19 Oct 2020
i-Mix: A Domain-Agnostic Strategy for Contrastive Representation Learning
Kibok Lee
Yian Zhu
Kihyuk Sohn
Chun-Liang Li
Jinwoo Shin
Honglak Lee
SSL
207
26
0
17 Oct 2020
Muse: Multi-modal target speaker extraction with visual cues
Zexu Pan
Ruijie Tao
Chenglin Xu
Haizhou Li
310
63
0
15 Oct 2020
MS²L: Multi-Task Self-Supervised Learning for Skeleton Based Action Recognition
Lilang Lin
Sijie Song
Wenhan Yang
Jiaying Liu
SSL
219
231
0
12 Oct 2020
Discriminative Sounding Objects Localization via Self-supervised Audiovisual Matching
Di Hu
Rui Qian
Minyue Jiang
Xiao Tan
Shilei Wen
Errui Ding
Weiyao Lin
Dejing Dou
200
149
0
12 Oct 2020
SEMI: Self-supervised Exploration via Multisensory Incongruity
IEEE International Conference on Robotics and Automation (ICRA), 2020
Jianren Wang
Ziwen Zhuang
Hang Zhao
SSL
167
1
0
26 Sep 2020
Active Contrastive Learning of Audio-Visual Video Representations
Shuang Ma
Zhaoyang Zeng
Daniel J. McDuff
Yale Song
VLM
SSL
168
9
0
31 Aug 2020