1911.12667
Self-Supervised Learning by Cross-Modal Audio-Video Clustering
Neural Information Processing Systems (NeurIPS), 2019
28 November 2019
Humam Alwassel
D. Mahajan
Bruno Korbar
Lorenzo Torresani
Bernard Ghanem
Du Tran
SSL
Papers citing "Self-Supervised Learning by Cross-Modal Audio-Video Clustering"
50 / 280 papers shown
Self-supervised Video Representation Learning with Motion-Aware Masked Autoencoders
Haosen Yang
Deng Huang
Bin Wen
Jiannan Wu
Huanjin Yao
Yi Jiang
Xiatian Zhu
Zehuan Yuan
123
28
0
09 Oct 2022
Learning State-Aware Visual Representations from Audible Interactions
Neural Information Processing Systems (NeurIPS), 2022
Himangi Mittal
Pedro Morgado
Unnat Jain
Abhinav Gupta
182
28
0
27 Sep 2022
Non-Linguistic Supervision for Contrastive Learning of Sentence Embeddings
Neural Information Processing Systems (NeurIPS), 2022
Yiren Jian
Chongyang Gao
Soroush Vosoughi
SSL
221
16
0
20 Sep 2022
ImageArg: A Multi-modal Tweet Dataset for Image Persuasiveness Mining
Workshop on Argument Mining (ArgMining), 2022
Zhexiong Liu
M. Guo
Y. Dai
Diane Litman
119
19
0
14 Sep 2022
Modality Mixer for Multi-modal Action Recognition
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022
Sumin Lee
Sangmin Woo
Yeonju Park
Muhammad Adi Nugroho
Changick Kim
141
12
0
24 Aug 2022
Semi-Supervised and Unsupervised Deep Visual Learning: A Survey
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Yanbei Chen
Goran Frehse
Xiatian Zhu
Zeynep Akata
275
160
0
24 Aug 2022
Learning in Audio-visual Context: A Review, Analysis, and New Perspective
Yake Wei
Di Hu
Yapeng Tian
Xuelong Li
248
66
0
20 Aug 2022
ModSelect: Automatic Modality Selection for Synthetic-to-Real Domain Generalization
Zdravko Marinov
Alina Roitberg
David Schneider
Rainer Stiefelhagen
176
6
0
19 Aug 2022
COCOA: Cross Modality Contrastive Learning for Sensor Data
Proceedings of the ACM on Interactive Mobile Wearable and Ubiquitous Technologies (IMWUT), 2022
Shohreh Deldari
Hao Xue
Aaqib Saeed
Daniel V. Smith
Flora D. Salim
SSL
143
49
0
31 Jul 2022
LocVTP: Video-Text Pre-training for Temporal Localization
European Conference on Computer Vision (ECCV), 2022
Meng Cao
Tianyu Yang
Junwu Weng
Can Zhang
Jue Wang
Yuexian Zou
169
69
0
21 Jul 2022
GOCA: Guided Online Cluster Assignment for Self-Supervised Video Representation Learning
European Conference on Computer Vision (ECCV), 2022
Huseyin Coskun
Alireza Zareian
Joshua L. Moore
F. Tombari
Chen Wang
SSL
161
3
0
20 Jul 2022
Temporal and cross-modal attention for audio-visual zero-shot learning
European Conference on Computer Vision (ECCV), 2022
Otniel-Bogdan Mercea
Thomas Hummel
A. Sophia Koepke
Zeynep Akata
183
32
0
20 Jul 2022
SVGraph: Learning Semantic Graphs from Instructional Videos
IEEE International Conference on Multimedia Big Data (ICMBD), 2022
Madeline Chantry Schiappa
Yogesh S Rawat
189
5
0
16 Jul 2022
Visually-aware Acoustic Event Detection using Heterogeneous Graphs
Interspeech (Interspeech), 2022
A. Shirian
Krishna Somandepalli
Victor Sanchez
T. Guha
138
5
0
16 Jul 2022
Semi-Supervised Temporal Action Detection with Proposal-Free Masking
European Conference on Computer Vision (ECCV), 2022
Sauradip Nag
Xiatian Zhu
Yi-Zhe Song
Tao Xiang
119
20
0
14 Jul 2022
Dual Contrastive Learning for Spatio-temporal Representation
ACM Multimedia (ACM MM), 2022
Shuangrui Ding
Rui Qian
H. Xiong
AI4TS
SSL
114
25
0
12 Jul 2022
Learning Music-Dance Representations through Explicit-Implicit Rhythm Synchronization
IEEE Transactions on Multimedia (IEEE TMM), 2022
Jiashuo Yu
Junfu Pu
Ying Cheng
Rui Feng
Ying Shan
214
7
0
07 Jul 2022
SLIC: Self-Supervised Learning with Iterative Clustering for Human Action Videos
Computer Vision and Pattern Recognition (CVPR), 2022
S. H. Khorasgani
Yuxuan Chen
Florian Shkurti
SSL
166
29
0
25 Jun 2022
ProtoCLIP: Prototypical Contrastive Language Image Pretraining
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2022
Delong Chen
Zhao Wu
Fan Liu
Zaiquan Yang
Huaxi Huang
Ying Tan
Erjin Zhou
VLM
CLIP
202
28
0
22 Jun 2022
Bi-Calibration Networks for Weakly-Supervised Video Representation Learning
International Journal of Computer Vision (IJCV), 2022
Fuchen Long
Ting Yao
Zhaofan Qiu
Xinmei Tian
Jiebo Luo
Tao Mei
190
8
0
21 Jun 2022
A Comprehensive Survey on Video Saliency Detection with Auditory Information: the Audio-visual Consistency Perceptual is the Key!
Chenglizhao Chen
Mengke Song
Wenfeng Song
Li Guo
Muwei Jian
188
34
0
20 Jun 2022
Self-Supervised Learning for Videos: A Survey
ACM Computing Surveys (ACM CSUR), 2022
Madeline Chantry Schiappa
Yogesh S Rawat
M. Shah
SSL
394
163
0
18 Jun 2022
iBoot: Image-bootstrapped Self-Supervised Video Representation Learning
F. Saleh
Fuwen Tan
Adrian Bulat
Georgios Tzimiropoulos
Brais Martínez
SSL
215
1
0
16 Jun 2022
SoundSpaces 2.0: A Simulation Platform for Visual-Acoustic Learning
Neural Information Processing Systems (NeurIPS), 2022
Changan Chen
Carl Schissler
Sanchit Garg
Philip Kobernik
Alexander Clegg
P. Calamia
Dhruv Batra
Philip Robinson
Kristen Grauman
3DGS
260
113
0
16 Jun 2022
A Comprehensive Survey on Deep Clustering: Taxonomy, Challenges, and Future Directions
ACM Computing Surveys (ACM CSUR), 2022
Sheng Zhou
Hongjia Xu
Zhuonan Zheng
Jiawei Chen
Zhao Li
Jiajun Bu
Jia Wu
Xin Eric Wang
Wenwu Zhu
Martin Ester
204
145
0
15 Jun 2022
It's Time for Artistic Correspondence in Music and Video
Computer Vision and Pattern Recognition (CVPR), 2022
Dídac Surís
Carl Vondrick
Bryan C. Russell
Justin Salamon
122
42
0
14 Jun 2022
Look, Radiate, and Learn: Self-Supervised Localisation via Radio-Visual Correspondence
Computer Vision and Pattern Recognition (CVPR), 2022
Mohammed Alloulah
Maximilian Arnold
SSL
276
2
0
13 Jun 2022
Beyond Just Vision: A Review on Self-Supervised Representation Learning on Multimodal and Temporal Data
Shohreh Deldari
Hao Xue
Aaqib Saeed
Jiayuan He
Daniel V. Smith
Flora D. Salim
AI4TS
191
43
0
06 Jun 2022
Noise-Tolerant Learning for Audio-Visual Action Recognition
IEEE Transactions on Multimedia (IEEE TMM), 2022
Haocheng Han
Qinghua Zheng
Minnan Luo
Kaiyao Miao
Feng Tian
Yuanchun Chen
NoLa
254
15
0
16 May 2022
Multimodal Conversational AI: A Survey of Datasets and Approaches
Anirudh S. Sundar
Larry Heck
146
32
0
13 May 2022
AVCAffe: A Large Scale Audio-Visual Dataset of Cognitive Load and Affect for Remote Work
AAAI Conference on Artificial Intelligence (AAAI), 2022
Pritam Sarkar
A. Posen
Ali Etemad
210
15
0
13 May 2022
TransRank: Self-supervised Video Representation Learning via Ranking-based Transformation Recognition
Computer Vision and Pattern Recognition (CVPR), 2022
Haodong Duan
Nanxuan Zhao
Kai-xiang Chen
Dahua Lin
ViT
AI4TS
158
25
0
04 May 2022
On Negative Sampling for Audio-Visual Contrastive Learning from Movies
Mahdi M. Kalayeh
Shervin Ardeshir
Lingyi Liu
Nagendra Kamath
Ashok Chandrashekar
SSL
115
3
0
29 Apr 2022
Unsupervised Voice-Face Representation Learning by Cross-Modal Prototype Contrast
International Joint Conference on Artificial Intelligence (IJCAI), 2022
Boqing Zhu
Kele Xu
Changjian Wang
Zheng Qin
Tao Sun
Huaimin Wang
Yuxing Peng
SSL
164
22
0
28 Apr 2022
MILES: Visual BERT Pre-training with Injected Language Semantics for Video-text Retrieval
European Conference on Computer Vision (ECCV), 2022
Yuying Ge
Yixiao Ge
Xihui Liu
Alex Jinpeng Wang
Jianping Wu
Ying Shan
Xiaohu Qie
Ping Luo
VLM
133
47
0
26 Apr 2022
Contrastive Language-Action Pre-training for Temporal Localization
Mengmeng Xu
Erhan Gundogdu
Maksim
Guohao Li
M. Donoser
Loris Bazzani
150
26
0
26 Apr 2022
Adversarial Contrastive Learning by Permuting Cluster Assignments
Muntasir Wahed
Afrina Tabassum
Ismini Lourentzou
SSL
108
6
0
21 Apr 2022
A Survey of Video-based Action Quality Assessment
Shunli Wang
Dingkang Yang
Peng Zhai
Qing Yu
Tao Suo
Zhan Sun
Ka Li
Lihua Zhang
125
20
0
20 Apr 2022
Less than Few: Self-Shot Video Instance Segmentation
European Conference on Computer Vision (ECCV), 2022
Pengwan Yang
Yuki M. Asano
Pascal Mettes
Cees G. M. Snoek
SSL
140
2
0
19 Apr 2022
Rumor Detection with Self-supervised Learning on Texts and Social Graph
Yuan Gao
Xiang Wang
Xiangnan He
Huamin Feng
Yongdong Zhang
SSL
90
57
0
19 Apr 2022
SETTI: A Self-supervised Adversarial Malware Detection Architecture in an IoT Environment
Marjan Golmaryami
R. Taheri
Zahra Pooranian
Mohammad Shojafar
Pei Xiao
140
18
0
16 Apr 2022
How to Listen? Rethinking Visual Sound Localization
Interspeech (Interspeech), 2022
Ho-Hsiang Wu
Magdalena Fuentes
Prem Seetharaman
J. P. Bello
ObjD
90
5
0
11 Apr 2022
Frequency Selective Augmentation for Video Representation Learning
AAAI Conference on Artificial Intelligence (AAAI), 2022
Jinhyung Kim
Taeoh Kim
Minho Shim
Dongyoon Han
Dongyoon Wee
Junmo Kim
AI4TS
187
5
0
08 Apr 2022
ECLIPSE: Efficient Long-range Video Retrieval using Sight and Sound
European Conference on Computer Vision (ECCV), 2022
Yan-Bo Lin
Jie Lei
Joey Tianyi Zhou
Gedas Bertasius
262
52
0
06 Apr 2022
Controllable Augmentations for Video Representation Learning
Rui Qian
Weiyao Lin
John See
Dian Li
SSL
AI4TS
168
14
0
30 Mar 2022
Balanced Multimodal Learning via On-the-fly Gradient Modulation
Computer Vision and Pattern Recognition (CVPR), 2022
Xiaokang Peng
Yake Wei
Andong Deng
Dong Wang
Di Hu
233
322
0
29 Mar 2022
Single-Stream Multi-Level Alignment for Vision-Language Pretraining
European Conference on Computer Vision (ECCV), 2022
Zaid Khan
B. Vijaykumar
Xiang Yu
S. Schulter
Manmohan Chandraker
Y. Fu
CLIP
VLM
252
21
0
27 Mar 2022
How Severe is Benchmark-Sensitivity in Video Self-Supervised Learning?
European Conference on Computer Vision (ECCV), 2022
Fida Mohammad Thoker
Hazel Doughty
Piyush Bagad
Cees G. M. Snoek
SSL
170
21
0
27 Mar 2022
Self-Supervised Predictive Learning: A Negative-Free Method for Sound Source Localization in Visual Scenes
Zengjie Song
Yuxi Wang
Junsong Fan
Tieniu Tan
Zhaoxiang Zhang
SSL
143
47
0
25 Mar 2022
Reshaping Robot Trajectories Using Natural Language Commands: A Study of Multi-Modal Data Alignment Using Transformers
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2022
A. Bucker
Luis F. C. Figueredo
Sami Haddadin
Ashish Kapoor
Shuang Ma
Rogerio Bonatti
LM&Ro
197
58
0
25 Mar 2022