1807.00230
Cooperative Learning of Audio and Video Models from Self-Supervised Synchronization
30 June 2018
Bruno Korbar
Du Tran
Lorenzo Torresani
ArXiv
Papers citing "Cooperative Learning of Audio and Video Models from Self-Supervised Synchronization"
37 / 137 papers shown
TSP: Temporally-Sensitive Pretraining of Video Encoders for Localization Tasks
Humam Alwassel
Silvio Giancola
Guohao Li
33
123
0
23 Nov 2020
Hierarchically Decoupled Spatial-Temporal Contrast for Self-supervised Video Representation Learning
Zehua Zhang
David J. Crandall
AI4TS
SSL
28
23
0
23 Nov 2020
ActBERT: Learning Global-Local Video-Text Representations
Linchao Zhu
Yi Yang
ViT
49
417
0
14 Nov 2020
Learning Representations from Audio-Visual Spatial Alignment
Pedro Morgado
Yi Li
Nuno Vasconcelos
SSL
27
121
0
03 Nov 2020
Into the Wild with AudioScope: Unsupervised Audio-Visual Separation of On-Screen Sounds
Efthymios Tzinis
Scott Wisdom
A. Jansen
Shawn Hershey
Tal Remez
D. Ellis
J. Hershey
39
69
0
02 Nov 2020
Pretext-Contrastive Learning: Toward Good Practices in Self-supervised Video Representation Leaning
L. Tao
Xueting Wang
T. Yamasaki
VLM
SSL
23
14
0
29 Oct 2020
Hard Negative Mixing for Contrastive Learning
Yannis Kalantidis
Mert Bulent Sariyildiz
Noé Pion
Philippe Weinzaepfel
Diane Larlus
SSL
53
628
0
02 Oct 2020
Sense and Learn: Self-Supervision for Omnipresent Sensors
Aaqib Saeed
Victor Ungureanu
Beat Gfeller
OOD
SSL
22
39
0
28 Sep 2020
Look, Listen, and Attend: Co-Attention Network for Self-Supervised Audio-Visual Representation Learning
Ying Cheng
Ruize Wang
Zhihao Pan
Rui Feng
Yuejie Zhang
SSL
36
106
0
13 Aug 2020
Spatiotemporal Contrastive Video Representation Learning
Rui Qian
Tianjian Meng
Boqing Gong
Ming-Hsuan Yang
Haoran Wang
Serge J. Belongie
Huayu Chen
SSL
AI4TS
41
492
0
09 Aug 2020
Learning Video Representations from Textual Web Supervision
Jonathan C. Stroud
Zhichao Lu
Chen Sun
Jia Deng
Rahul Sukthankar
Cordelia Schmid
David A. Ross
SSL
40
48
0
29 Jul 2020
Unified Multisensory Perception: Weakly-Supervised Audio-Visual Video Parsing
Yapeng Tian
Dingzeyu Li
Chenliang Xu
34
180
0
21 Jul 2020
Sep-Stereo: Visually Guided Stereophonic Audio Generation by Associating Source Separation
Hang Zhou
Xudong Xu
Dahua Lin
Xiaogang Wang
Ziwei Liu
DiffM
32
81
0
20 Jul 2020
Self-Supervised MultiModal Versatile Networks
Jean-Baptiste Alayrac
Adrià Recasens
R. Schneider
Relja Arandjelović
Jason Ramapuram
J. Fauw
Lucas Smaira
Sander Dieleman
Andrew Zisserman
SSL
40
372
0
29 Jun 2020
AVLnet: Learning Audio-Visual Language Representations from Instructional Videos
Andrew Rouditchenko
Angie Boggust
David Harwath
Brian Chen
D. Joshi
...
Rogerio Feris
Brian Kingsbury
M. Picheny
Antonio Torralba
James R. Glass
SSL
22
141
0
16 Jun 2020
Telling Left from Right: Learning Spatial Correspondence of Sight and Sound
Karren D. Yang
Bryan C. Russell
Justin Salamon
SSL
24
75
0
11 Jun 2020
Does Visual Self-Supervision Improve Learning of Speech Representations for Emotion Recognition?
Abhinav Shukla
Stavros Petridis
M. Pantic
SSL
32
28
0
04 May 2020
Conditioned Source Separation for Music Instrument Performances
Olga Slizovskaia
G. Haro
E. Gómez
30
38
0
08 Apr 2020
Speech2Action: Cross-modal Supervision for Action Recognition
Arsha Nagrani
Chen Sun
David A. Ross
Rahul Sukthankar
Cordelia Schmid
Andrew Zisserman
33
54
0
30 Mar 2020
Watching the World Go By: Representation Learning from Unlabeled Videos
Daniel Gordon
Kiana Ehsani
Dieter Fox
Ali Farhadi
SSL
AI4TS
29
87
0
18 Mar 2020
Cross-modal Learning for Multi-modal Video Categorization
Palash Goyal
Saurabh Sahu
Shalini Ghosh
Chul Lee
15
8
0
07 Mar 2020
Noise Estimation Using Density Estimation for Self-Supervised Multimodal Learning
Elad Amrani
Rami Ben-Ari
Daniel Rotman
A. Bronstein
17
121
0
06 Mar 2020
Evolving Losses for Unsupervised Video Representation Learning
A. Piergiovanni
A. Angelova
Michael S. Ryoo
SSL
27
138
0
26 Feb 2020
Disentangled Speech Embeddings using Cross-modal Self-supervision
Arsha Nagrani
Joon Son Chung
Samuel Albanie
Andrew Zisserman
SSL
21
88
0
20 Feb 2020
Audiovisual SlowFast Networks for Video Recognition
Fanyi Xiao
Yong Jae Lee
Kristen Grauman
Jitendra Malik
Christoph Feichtenhofer
197
207
0
23 Jan 2020
Deep Audio-Visual Learning: A Survey
Hao Zhu
Mandi Luo
Rui Wang
A. Zheng
Ran He
31
156
0
14 Jan 2020
STAViS: Spatio-Temporal AudioVisual Saliency Network
A. Tsiami
Petros Koutras
Petros Maragos
27
73
0
09 Jan 2020
Self-Supervised Learning by Cross-Modal Audio-Video Clustering
Humam Alwassel
D. Mahajan
Bruno Korbar
Lorenzo Torresani
Guohao Li
Du Tran
SSL
42
428
0
28 Nov 2019
Learning to Localize Sound Sources in Visual Scenes: Analysis and Applications
Arda Senocak
Tae-Hyun Oh
Junsik Kim
Ming-Hsuan Yang
In So Kweon
SSL
33
52
0
20 Nov 2019
Recursive Visual Sound Separation Using Minus-Plus Net
Xudong Xu
Bo Dai
Dahua Lin
35
91
0
30 Aug 2019
EPIC-Fusion: Audio-Visual Temporal Binding for Egocentric Action Recognition
Evangelos Kazakos
Arsha Nagrani
Andrew Zisserman
Dima Damen
EgoV
16
332
0
22 Aug 2019
Multi-task Self-Supervised Learning for Human Activity Detection
Aaqib Saeed
T. Ozcelebi
J. Lukkien
SSL
23
270
0
27 Jul 2019
Evolving Losses for Unlabeled Video Representation Learning
A. Piergiovanni
A. Angelova
Michael S. Ryoo
SSL
11
7
0
07 Jun 2019
What Makes Training Multi-Modal Classification Networks Hard?
Weiyao Wang
Du Tran
Matt Feiszli
31
442
0
29 May 2019
Self-supervised audio representation learning for mobile devices
Marco Tagliasacchi
Beat Gfeller
Félix de Chaumont Quitry
Dominik Roblek
SSL
AI4TS
6
46
0
24 May 2019
DynamoNet: Dynamic Action and Motion Network
Ali Diba
Vivek Sharma
Luc Van Gool
Rainer Stiefelhagen
30
110
0
25 Apr 2019
Revisiting Self-Supervised Visual Representation Learning
Alexander Kolesnikov
Xiaohua Zhai
Lucas Beyer
SSL
53
715
0
25 Jan 2019