1804.03641
Audio-Visual Scene Analysis with Self-Supervised Multisensory Features
10 April 2018
Andrew Owens
Alexei A. Efros
SSL
Papers citing
"Audio-Visual Scene Analysis with Self-Supervised Multisensory Features"
50 / 491 papers shown
Title
Self-supervised Video Representation Learning by Uncovering Spatio-temporal Statistics
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2020
Jiangliu Wang
Jianbo Jiao
Linchao Bao
Shengfeng He
Wei Liu
Yunhui Liu
SSL
AI4TS
192
58
0
31 Aug 2020
Describing Unseen Videos via Multi-Modal Cooperative Dialog Agents
Ye Zhu
Yu Wu
Yi Yang
Yan Yan
213
13
0
18 Aug 2020
Self-supervised Contrastive Video-Speech Representation Learning for Ultrasound
Jianbo Jiao
Yifan Cai
M. Alsharid
L. Drukker
A. Papageorghiou
J. A. Noble
215
42
0
14 Aug 2020
Audio-Visual Event Localization via Recursive Fusion by Joint Co-Attention
Bin Duan
Hao Tang
Wei Wang
Ziliang Zong
Guowei Yang
Yan Yan
137
72
0
14 Aug 2020
Self-supervised Video Representation Learning by Pace Prediction
Jiangliu Wang
Jianbo Jiao
Yunhui Liu
SSL
AI4TS
215
251
0
13 Aug 2020
Look, Listen, and Attend: Co-Attention Network for Self-Supervised Audio-Visual Representation Learning
Ying Cheng
Ruize Wang
Zhihao Pan
Rui Feng
Yuejie Zhang
SSL
249
116
0
13 Aug 2020
Self-Supervised Learning of Audio-Visual Objects from Video
European Conference on Computer Vision (ECCV), 2020
Triantafyllos Afouras
Andrew Owens
Joon Son Chung
Andrew Zisserman
SSL
211
277
0
10 Aug 2020
Learning Video Representations from Textual Web Supervision
Jonathan C. Stroud
Zhichao Lu
Chen Sun
Gaowen Liu
Rahul Sukthankar
Cordelia Schmid
David A. Ross
SSL
222
50
0
29 Jul 2020
Self-supervised Neural Audio-Visual Sound Source Localization via Probabilistic Spatial Modeling
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2020
Yoshiki Masuyama
Yoshiaki Bando
Kohei Yatabe
Y. Sasaki
Masaki Onishi
Yasuhiro Oikawa
SSL
163
14
0
28 Jul 2020
Noisy Agents: Self-supervised Exploration by Predicting Auditory Events
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2020
Chuang Gan
Xiaoyu Chen
Phillip Isola
Antonio Torralba
J. Tenenbaum
136
7
0
27 Jul 2020
Federated Self-Supervised Learning of Multi-Sensor Representations for Embedded Intelligence
IEEE Internet of Things Journal (IEEE IoT J.), 2020
Aaqib Saeed
Flora D. Salim
T. Ozcelebi
J. Lukkien
FedML
SSL
225
116
0
25 Jul 2020
Self-Supervised Learning Across Domains
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2020
S. Bucci
A. D'Innocente
Yujun Liao
Fabio Maria Carlucci
Barbara Caputo
Tatiana Tommasi
SSL
184
90
0
24 Jul 2020
Sound2Sight: Generating Visual Dynamics from Sound and Context
European Conference on Computer Vision (ECCV), 2020
A. Cherian
Moitreya Chatterjee
Narendra Ahuja
VGen
235
40
0
23 Jul 2020
Foley Music: Learning to Generate Music from Videos
Chuang Gan
Deng Huang
Peihao Chen
J. Tenenbaum
Antonio Torralba
VGen
119
152
0
21 Jul 2020
Video Representation Learning by Recognizing Temporal Transformations
Simon Jenni
Givi Meishvili
Paolo Favaro
381
142
0
21 Jul 2020
CSLNSpeech: solving extended speech separation problem with the help of Chinese sign language
Jiasong Wu
Xuan Li
Taotao Li
Fanman Meng
Youyong Kong
Guanyu Yang
L. Senhadji
Huazhong Shu
CVBM
174
0
0
21 Jul 2020
Unified Multisensory Perception: Weakly-Supervised Audio-Visual Video Parsing
Yapeng Tian
Dingzeyu Li
Chenliang Xu
240
207
0
21 Jul 2020
Sep-Stereo: Visually Guided Stereophonic Audio Generation by Associating Source Separation
European Conference on Computer Vision (ECCV), 2020
Hang Zhou
Xudong Xu
Dahua Lin
Xiaogang Wang
Ziwei Liu
DiffM
191
94
0
20 Jul 2020
Leveraging Category Information for Single-Frame Visual Sound Source Separation
Xiangjie Sui
Esa Rahtu
129
9
0
15 Jul 2020
Multiple Sound Sources Localization from Coarse to Fine
European Conference on Computer Vision (ECCV), 2020
Rui Qian
Di Hu
Heinrich Dinkel
Mengyue Wu
N. Xu
Weiyao Lin
241
179
0
13 Jul 2020
OtoWorld: Towards Learning to Separate by Learning to Move
Omkar Ranadive
Grant Gasser
David Terpay
Prem Seetharaman
125
1
0
12 Jul 2020
Do We Need Sound for Sound Source Localization?
Asian Conference on Computer Vision (ACCV), 2020
Takashi Oya
Shohei Iwase
Ryota Natsume
Takahiro Itazuri
Shugo Yamaguchi
Shigeo Morishima
131
25
0
11 Jul 2020
See, Hear, Explore: Curiosity via Audio-Visual Association
Neural Information Processing Systems (NeurIPS), 2020
Victoria Dean
Shubham Tulsiani
Abhinav Gupta
228
64
0
07 Jul 2020
Self-Supervised MultiModal Versatile Networks
Jean-Baptiste Alayrac
Adrià Recasens
R. Schneider
Relja Arandjelović
Jason Ramapuram
J. Fauw
Lucas Smaira
Sander Dieleman
Andrew Zisserman
SSL
369
396
0
29 Jun 2020
Space-Time Correspondence as a Contrastive Random Walk
Allan Jabri
Andrew Owens
Alexei A. Efros
SSL
OT
308
331
0
25 Jun 2020
Labelling unlabelled videos from scratch with multi-modal self-supervision
Neural Information Processing Systems (NeurIPS), 2020
Yuki M. Asano
Mandela Patrick
Christian Rupprecht
Andrea Vedaldi
SSL
257
161
0
24 Jun 2020
Self-Supervised Graph Transformer on Large-Scale Molecular Data
Yu Rong
Yatao Bian
Qifeng Bai
Wei-yang Xie
Ying Wei
Wenbing Huang
Junzhou Huang
AI4CE
256
28
0
18 Jun 2020
AVLnet: Learning Audio-Visual Language Representations from Instructional Videos
Andrew Rouditchenko
Angie Boggust
David Harwath
Brian Chen
D. Joshi
...
Rogerio Feris
Brian Kingsbury
M. Picheny
Antonio Torralba
James R. Glass
SSL
210
146
0
16 Jun 2020
Solos: A Dataset for Audio-Visual Music Analysis
IEEE International Workshop on Multimedia Signal Processing (MMSP), 2020
Juan F. Montesinos
Olga Slizovskaia
G. Haro
109
15
0
14 Jun 2020
Video Understanding as Machine Translation
Bruno Korbar
Fabio Petroni
Rohit Girdhar
Lorenzo Torresani
SSL
195
29
0
12 Jun 2020
Telling Left from Right: Learning Spatial Correspondence of Sight and Sound
Computer Vision and Pattern Recognition (CVPR), 2020
Karren D. Yang
Bryan C. Russell
Justin Salamon
SSL
190
87
0
11 Jun 2020
Kalman Filter Based Multiple Person Head Tracking
M. Ullah
M. Mahmud
Habib Ullah
Kashif Ahmad
Ali Shariq Imran
F. A. Cheikh
77
0
0
11 Jun 2020
C-SL: Contrastive Sound Localization with Inertial-Acoustic Sensors
Majid Mirbagheri
Bardia Doosti
104
2
0
09 Jun 2020
Visually Guided Sound Source Separation using Cascaded Opponent Filter Network
Xiangjie Sui
Esa Rahtu
195
24
0
04 Jun 2020
In the Eye of the Beholder: Gaze and Actions in First Person Video
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2020
Yin Li
Miao Liu
James M. Rehg
EgoV
262
90
0
31 May 2020
Large Scale Audiovisual Learning of Sounds with Weakly Labeled Data
International Joint Conference on Artificial Intelligence (IJCAI), 2020
Haytham M. Fayek
Anurag Kumar
196
37
0
29 May 2020
AVGZSLNet: Audio-Visual Generalized Zero-Shot Learning by Reconstructing Label Features from Multi-Modal Embeddings
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2020
Pratik Mazumder
Pravendra Singh
Kranti K. Parida
Vinay P. Namboodiri
214
38
0
27 May 2020
Active Speakers in Context
Juan Carlos León Alcázar
Fabian Caba Heilbron
Long Mai
Federico Perazzi
Joon-Young Lee
Pablo Arbelaez
Guohao Li
122
72
0
20 May 2020
End-to-End Lip Synchronisation Based on Pattern Classification
You Jin Kim
Hee-Soo Heo
Soo-Whan Chung
Bong-Jin Lee
CVBM
162
0
0
18 May 2020
Cross-Task Transfer for Geotagged Audiovisual Aerial Scene Recognition
Di Hu
Xuhong Li
Lichao Mou
P. Jin
Dong Chen
L. Jing
Xiaoxiang Zhu
Dejing Dou
159
6
0
18 May 2020
A Better Use of Audio-Visual Cues: Dense Video Captioning with Bi-modal Transformer
Vladimir E. Iashin
Esa Rahtu
219
128
0
17 May 2020
FaceFilter: Audio-visual speech separation using still images
Soo-Whan Chung
Soyeon Choe
Joon Son Chung
Hong-Goo Kang
CVBM
158
74
0
14 May 2020
VisualEchoes: Spatial Image Representation Learning through Echolocation
European Conference on Computer Vision (ECCV), 2020
Ruohan Gao
Changan Chen
Ziad Al-Halah
Carl Schissler
Kristen Grauman
MDE
SSL
425
90
0
04 May 2020
Does Visual Self-Supervision Improve Learning of Speech Representations for Emotion Recognition?
IEEE Transactions on Affective Computing (IEEE TAC), 2020
Abhinav Shukla
Stavros Petridis
Maja Pantic
SSL
393
33
0
04 May 2020
Teaching Cameras to Feel: Estimating Tactile Physical Properties of Surfaces From Images
European Conference on Computer Vision (ECCV), 2020
Matthew Purri
Kristin J. Dana
118
20
0
29 Apr 2020
Seeing voices and hearing voices: learning discriminative embeddings using cross-modal self-supervision
Interspeech, 2020
Soo-Whan Chung
Hong-Goo Kang
Joon Son Chung
SSL
156
43
0
29 Apr 2020
Audio-Visual Instance Discrimination with Cross-Modal Agreement
Computer Vision and Pattern Recognition (CVPR), 2020
Pedro Morgado
Nuno Vasconcelos
Ishan Misra
SSL
286
294
0
27 Apr 2020
On the Role of Visual Cues in Audiovisual Speech Enhancement
Zakaria Aldeneh
Anushree Prasanna Kumar
B. Theobald
Erik Marchi
S. Kajarekar
Devang Naik
Ahmed Hussen Abdelaziz
264
7
0
25 Apr 2020
Music Gesture for Visual Sound Separation
Computer Vision and Pattern Recognition (CVPR), 2020
Chuang Gan
Deng Huang
Hang Zhao
J. Tenenbaum
Antonio Torralba
222
214
0
20 Apr 2020
Stochastic batch size for adaptive regularization in deep network optimization
Pattern Recognition (Pattern Recognit.), 2020
Kensuke Nakamura
Stefano Soatto
Byung-Woo Hong
ODL
147
7
0
14 Apr 2020