1804.03641
Cited By
v1
v2 (latest)
Audio-Visual Scene Analysis with Self-Supervised Multisensory Features
10 April 2018
Andrew Owens
Alexei A. Efros
SSL
Papers citing
"Audio-Visual Scene Analysis with Self-Supervised Multisensory Features"
50 / 491 papers shown
Conditioned Source Separation for Music Instrument Performances
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2020
Olga Slizovskaia
G. Haro
E. Gómez
244
43
0
08 Apr 2020
Deep Multimodal Feature Encoding for Video Ordering
Vivek Sharma
Makarand Tapaswi
Rainer Stiefelhagen
171
11
0
05 Apr 2020
Speech2Action: Cross-modal Supervision for Action Recognition
Computer Vision and Pattern Recognition (CVPR), 2020
Arsha Nagrani
Chen Sun
David A. Ross
Rahul Sukthankar
Cordelia Schmid
Andrew Zisserman
163
59
0
30 Mar 2020
A Metric Learning Reality Check
European Conference on Computer Vision (ECCV), 2020
Kevin Musgrave
Serge J. Belongie
Ser-Nam Lim
435
504
0
18 Mar 2020
Watching the World Go By: Representation Learning from Unlabeled Videos
Daniel Gordon
Kiana Ehsani
Dieter Fox
Ali Farhadi
SSL
AI4TS
183
92
0
18 Mar 2020
Cross modal video representations for weakly supervised active speaker localization
IEEE Transactions on Multimedia (TMM), 2020
Rahul Sharma
Krishna Somandepalli
Shrikanth Narayanan
175
8
0
09 Mar 2020
On Compositions of Transformations in Contrastive Self-Supervised Learning
IEEE International Conference on Computer Vision (ICCV), 2020
Mandela Patrick
Yuki M. Asano
Polina Kuznetsova
Ruth C. Fong
João F. Henriques
Geoffrey Zweig
Andrea Vedaldi
236
53
0
09 Mar 2020
Semantic Object Prediction and Spatial Sound Super-Resolution with Binaural Sounds
European Conference on Computer Vision (ECCV), 2020
A. Vasudevan
Dengxin Dai
Luc Van Gool
ObjD
206
50
0
09 Mar 2020
Evolving Losses for Unsupervised Video Representation Learning
Computer Vision and Pattern Recognition (CVPR), 2020
A. Piergiovanni
A. Angelova
Michael S. Ryoo
SSL
217
145
0
26 Feb 2020
AutoFoley: Artificial Synthesis of Synchronized Sound Tracks for Silent Videos with Deep Learning
IEEE Transactions on Multimedia (TMM), 2020
Sanchita Ghose
John J. Prevost
VGen
171
50
0
21 Feb 2020
AlignNet: A Unifying Approach to Audio-Visual Alignment
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2020
Jianren Wang
Zhaoyuan Fang
Hang Zhao
148
42
0
12 Feb 2020
Self-Supervised Joint Encoding of Motion and Appearance for First Person Action Recognition
M. Planamente
A. Bottino
Barbara Caputo
EgoV
139
3
0
10 Feb 2020
Multi-Modal Domain Adaptation for Fine-Grained Action Recognition
Computer Vision and Pattern Recognition (CVPR), 2020
Jonathan Munro
Dima Damen
EgoV
267
227
0
27 Jan 2020
Curriculum Audiovisual Learning
Di Hu
Zechuan Wang
Haoyi Xiong
Dong Wang
Feiping Nie
Dejing Dou
SSL
129
33
0
26 Jan 2020
Audiovisual SlowFast Networks for Video Recognition
Fanyi Xiao
Yong Jae Lee
Kristen Grauman
Jitendra Malik
Christoph Feichtenhofer
593
230
0
23 Jan 2020
Deep Audio-Visual Learning: A Survey
International Journal of Automation and Computing (IJAC), 2020
Hao Zhu
Mandi Luo
Rui Wang
A. Zheng
Ran He
204
177
0
14 Jan 2020
Unsupervised Audiovisual Synthesis via Exemplar Autoencoders
International Conference on Learning Representations (ICLR), 2020
Kangle Deng
Aayush Bansal
Deva Ramanan
SSL
VGen
163
17
0
13 Jan 2020
Visually Guided Self Supervised Learning of Speech Representations
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020
Abhinav Shukla
Konstantinos Vougioukas
Pingchuan Ma
Stavros Petridis
Maja Pantic
SSL
166
30
0
13 Jan 2020
STAViS: Spatio-Temporal AudioVisual Saliency Network
Computer Vision and Pattern Recognition (CVPR), 2020
A. Tsiami
Petros Koutras
Petros Maragos
229
81
0
09 Jan 2020
Look, Listen, and Act: Towards Audio-Visual Embodied Navigation
IEEE International Conference on Robotics and Automation (ICRA), 2019
Chuang Gan
Yiwei Zhang
Jiajun Wu
Boqing Gong
J. Tenenbaum
214
150
0
25 Dec 2019
SoundSpaces: Audio-Visual Navigation in 3D Environments
Changan Chen
Unnat Jain
Carl Schissler
S. V. A. Garí
Ziad Al-Halah
V. Ithapu
Philip Robinson
Kristen Grauman
268
28
0
24 Dec 2019
Multimodal Self-Supervised Learning for Medical Image Analysis
Information Processing in Medical Imaging (IPMI), 2019
Aiham Taleb
Christoph Lippert
T. Klein
Moin Nabi
SSL
344
122
0
11 Dec 2019
Listen to Look: Action Recognition by Previewing Audio
Computer Vision and Pattern Recognition (CVPR), 2019
Ruohan Gao
Tae-Hyun Oh
Kristen Grauman
Lorenzo Torresani
VLM
322
284
0
10 Dec 2019
Self-Supervised Learning of Pretext-Invariant Representations
Computer Vision and Pattern Recognition (CVPR), 2019
Ishan Misra
Laurens van der Maaten
SSL
VLM
343
1,561
0
04 Dec 2019
Self-Supervised Learning by Cross-Modal Audio-Video Clustering
Neural Information Processing Systems (NeurIPS), 2019
Humam Alwassel
D. Mahajan
Bruno Korbar
Lorenzo Torresani
Guohao Li
Du Tran
SSL
493
461
0
28 Nov 2019
Learning to Localize Sound Sources in Visual Scenes: Analysis and Applications
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2019
Arda Senocak
Tae-Hyun Oh
Junsik Kim
Ming-Hsuan Yang
In So Kweon
SSL
166
62
0
20 Nov 2019
MMTM: Multimodal Transfer Module for CNN Fusion
Computer Vision and Pattern Recognition (CVPR), 2019
Hamid Reza Vaezi Joze
Amirreza Shaban
Michael L. Iuzzolino
K. Koishida
400
344
0
20 Nov 2019
Dancing to Music
Neural Information Processing Systems (NeurIPS), 2019
Hsin-Ying Lee
Xiaodong Yang
Xuan Li
Ting-Chun Wang
Yu-Ding Lu
Ming-Hsuan Yang
Jan Kautz
195
15
0
05 Nov 2019
DEPA: Self-Supervised Audio Embedding for Depression Detection
ACM Multimedia (ACM MM), 2019
Pingyue Zhang
Mengyue Wu
Heinrich Dinkel
Kai Yu
209
74
0
29 Oct 2019
PRNet: Self-Supervised Learning for Partial-to-Partial Registration
Neural Information Processing Systems (NeurIPS), 2019
Yue Wang
Justin Solomon
SSL
3DPC
261
434
0
27 Oct 2019
Self-supervised Moving Vehicle Tracking with Stereo Sound
IEEE International Conference on Computer Vision (ICCV), 2019
Chuang Gan
Hang Zhao
Peihao Chen
David D. Cox
Antonio Torralba
165
156
0
25 Oct 2019
Controllable Attention for Structured Layered Video Decomposition
IEEE International Conference on Computer Vision (ICCV), 2019
Jean-Baptiste Alayrac
João Carreira
Relja Arandjelović
Andrew Zisserman
102
10
0
24 Oct 2019
Vision-Infused Deep Audio Inpainting
IEEE International Conference on Computer Vision (ICCV), 2019
Hang Zhou
Ziwei Liu
Lingfeng Guo
Ping Luo
Dahua Lin
298
92
0
24 Oct 2019
Coordinated Joint Multimodal Embeddings for Generalized Audio-Visual Zeroshot Classification and Retrieval of Videos
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2019
Kranti K. Parida
Neeraj Matiyali
T. Guha
Gaurav Sharma
VLM
151
48
0
19 Oct 2019
Learning to Generalize One Sample at a Time with Self-Supervision
A. D'Innocente
S. Bucci
Barbara Caputo
Tatiana Tommasi
SSL
OOD
199
4
0
09 Oct 2019
Learning to Have an Ear for Face Super-Resolution
Computer Vision and Pattern Recognition (CVPR), 2019
Givi Meishvili
Simon Jenni
Paolo Favaro
SupR
CVBM
206
24
0
27 Sep 2019
CochleaNet: A Robust Language-independent Audio-Visual Model for Speech Enhancement
Information Fusion (Inf. Fusion), 2019
M. Gogate
K. Dashtipour
Ahsan Adeel
Amir Hussain
150
58
0
23 Sep 2019
Watch, Listen and Tell: Multi-modal Weakly Supervised Dense Event Captioning
IEEE International Conference on Computer Vision (ICCV), 2019
Tanzila Rahman
Bicheng Xu
Leonid Sigal
193
86
0
22 Sep 2019
Recursive Visual Sound Separation Using Minus-Plus Net
IEEE International Conference on Computer Vision (ICCV), 2019
Xudong Xu
Bo Dai
Dahua Lin
245
93
0
30 Aug 2019
Deep Bayesian Unsupervised Source Separation Based on a Complex Gaussian Mixture Model
International Workshop on Machine Learning for Signal Processing (MLSP), 2019
Yoshiaki Bando
Y. Sasaki
Kazuyoshi Yoshii
BDL
106
9
0
29 Aug 2019
EPIC-Fusion: Audio-Visual Temporal Binding for Egocentric Action Recognition
IEEE International Conference on Computer Vision (ICCV), 2019
Evangelos Kazakos
Arsha Nagrani
Andrew Zisserman
Dima Damen
EgoV
181
379
0
22 Aug 2019
Towards Generating Ambisonics Using Audio-Visual Cue for Virtual Reality
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019
A. Rana
C. Ozcinar
A. Smolic
107
31
0
16 Aug 2019
Charting the Right Manifold: Manifold Mixup for Few-shot Learning
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2019
Puneet Mangla
M. Singh
Abhishek Sinha
Nupur Kumari
V. Balasubramanian
Balaji Krishnamurthy
SSL
377
363
0
28 Jul 2019
Making Sense of Vision and Touch: Learning Multimodal Representations for Contact-Rich Tasks
IEEE Transactions on Robotics (TRO), 2019
Michelle A. Lee
Yuke Zhu
Peter Zachares
Matthew Tan
K. Srinivasan
Silvio Savarese
Fei-Fei Li
Animesh Garg
Jeannette Bohg
SSL
246
247
0
28 Jul 2019
Multi-task Self-Supervised Learning for Human Activity Detection
Proceedings of the ACM on Interactive Mobile Wearable and Ubiquitous Technologies (IMWUT), 2019
Aaqib Saeed
T. Ozcelebi
J. Lukkien
SSL
297
312
0
27 Jul 2019
Adaptive Regularization via Residual Smoothing in Deep Learning Optimization
IEEE Access, 2019
Jung-Kyun Cho
Junseok Kwon
Byung-Woo Hong
218
1
0
23 Jul 2019
My lips are concealed: Audio-visual speech enhancement through obstructions
Interspeech, 2019
Triantafyllos Afouras
Joon Son Chung
Andrew Zisserman
167
98
0
11 Jul 2019
LPaintB: Learning to Paint from Self-Supervision
Pacific Conference on Computer Graphics and Applications (PG), 2019
Biao Jia
Jonathan Brandt
R. Měch
Byungmoon Kim
Tianyi Zhou
SSL
122
12
0
17 Jun 2019
What Makes Training Multi-Modal Classification Networks Hard?
Computer Vision and Pattern Recognition (CVPR), 2019
Weiyao Wang
Du Tran
Matt Feiszli
571
566
0
29 May 2019
Deep-Learning-Based Audio-Visual Speech Enhancement in Presence of Lombard Effect
Speech Communication (Speech Commun.), 2019
Daniel Michelsanti
Zheng-Hua Tan
S. Sigurðsson
Jesper Jensen
161
42
0
29 May 2019