Self-Supervised Learning by Cross-Modal Audio-Video Clustering

Neural Information Processing Systems (NeurIPS), 2019
28 November 2019
Humam Alwassel
D. Mahajan
Bruno Korbar
Lorenzo Torresani
Guohao Li
Du Tran
    SSL

Papers citing "Self-Supervised Learning by Cross-Modal Audio-Video Clustering"

50 / 280 papers shown
Learning to Cut by Watching Movies
IEEE International Conference on Computer Vision (ICCV), 2021
Alejandro Pardo
Fabian Caba Heilbron
Juan Carlos León Alcázar
Ali K. Thabet
Guohao Li
VGen
285
23
0
09 Aug 2021
Video Contrastive Learning with Global Context
Haofei Kuang
Yi Zhu
Zhi-Li Zhang
Xinyu Li
Joseph Tighe
Sören Schwertfeger
C. Stachniss
Mu Li
SSL, AI4TS
203
64
0
05 Aug 2021
Self-supervised Learning with Local Attention-Aware Feature
T. Pham
R. Mina
Dias Issa
Chang D. Yoo
165
6
0
01 Aug 2021
Self-Supervised Multi-Modal Alignment for Whole Body Medical Imaging
International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2021
Rhydian Windsor
A. Jamaludin
T. Kadir
Andrew Zisserman
190
18
0
14 Jul 2021
Multi-level Feature Learning for Contrastive Multi-view Clustering
Computer Vision and Pattern Recognition (CVPR), 2021
Jie Xu
Huayi Tang
Yazhou Ren
Liang Peng
Xiao-lan Zhu
Lifang He
146
261
0
21 Jun 2021
Self-supervised Video Representation Learning with Cross-Stream Prototypical Contrasting
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2021
Martine Toering
Ioannis Gatopoulos
M. Stol
Vincent Tao Hu
SSL
216
12
0
18 Jun 2021
MaCLR: Motion-aware Contrastive Learning of Representations for Videos
European Conference on Computer Vision (ECCV), 2021
Fanyi Xiao
Joseph Tighe
Davide Modolo
SSL
147
18
0
17 Jun 2021
LiRA: Learning Visual Speech Representations from Audio through Self-supervision
Pingchuan Ma
Rodrigo Mira
Stavros Petridis
Björn W. Schuller
Maja Pantic
SSL
132
58
0
16 Jun 2021
Watching Too Much Television is Good: Self-Supervised Audio-Visual Representation Learning from Movies and TV Shows
Mahdi M. Kalayeh
Nagendra Kamath
Lingyi Liu
Ashok Chandrashekar
SSL
93
3
0
16 Jun 2021
Learning Audio-Visual Dereverberation
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
Changan Chen
Wei-Ju Sun
David Harwath
Kristen Grauman
171
35
0
14 Jun 2021
Cross-Modal Attention Consistency for Video-Audio Unsupervised Learning
Shaobo Min
Jingdong Sun
Hongtao Xie
Chuang Gan
Yongdong Zhang
Jingdong Wang
SSL
95
7
0
13 Jun 2021
Signal Transformer: Complex-valued Attention and Meta-Learning for Signal Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
Yihong Dong
Ying Peng
Muqiao Yang
Songtao Lu
Qingjiang Shi
335
12
0
05 Jun 2021
WiCluster: Passive Indoor 2D/3D Positioning using WiFi without Precise Labels
Global Communications Conference (GLOBECOM), 2021
I. Karmanov
F. G. Zanjani
S. Merlin
I. Kadampot
Daniel Dijkman
144
19
0
31 May 2021
Divide and Contrast: Self-supervised Learning from Uncurated Data
IEEE International Conference on Computer Vision (ICCV), 2021
Yonglong Tian
Olivier J. Hénaff
Aaron van den Oord
SSL
264
110
0
17 May 2021
Contrastive Learning of Image Representations with Cross-Video Cycle-Consistency
IEEE International Conference on Computer Vision (ICCV), 2021
Haiping Wu
Xiaolong Wang
SSL
116
32
0
13 May 2021
Motion-Augmented Self-Training for Video Recognition at Smaller Scale
IEEE International Conference on Computer Vision (ICCV), 2021
Kirill Gavrilyuk
Mihir Jain
I. Karmanov
Cees G. M. Snoek
123
24
0
04 May 2021
CoCon: Cooperative-Contrastive Learning
Nishant Rai
Ehsan Adeli
Kuan-Hui Lee
Adrien Gaidon
Juan Carlos Niebles
SSL
123
19
0
30 Apr 2021
A Large-Scale Study on Unsupervised Spatiotemporal Representation Learning
Computer Vision and Pattern Recognition (CVPR), 2021
Christoph Feichtenhofer
Haoqi Fan
Bo Xiong
Ross B. Girshick
Kaiming He
SSL, AI4TS
233
282
0
29 Apr 2021
Multimodal Contrastive Training for Visual Representation Learning
Computer Vision and Pattern Recognition (CVPR), 2021
Xin Yuan
Zhe Lin
Jason Kuen
Jianming Zhang
Yilin Wang
Michael Maire
Ajinkya Kale
Baldo Faieta
SSL
197
187
0
26 Apr 2021
Joint Representation Learning and Novel Category Discovery on Single- and Multi-modal Data
IEEE International Conference on Computer Vision (ICCV), 2021
Xu Jia
Kai Han
Yukun Zhu
Bradley Green
318
73
0
26 Apr 2021
Multimodal Clustering Networks for Self-supervised Learning from Unlabeled Videos
IEEE International Conference on Computer Vision (ICCV), 2021
Brian Chen
Andrew Rouditchenko
Kevin Duarte
Hilde Kuehne
Samuel Thomas
...
Rogerio Feris
David Harwath
James R. Glass
M. Picheny
Shih-Fu Chang
SSL
372
96
0
26 Apr 2021
VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text
Neural Information Processing Systems (NeurIPS), 2021
Hassan Akbari
Liangzhe Yuan
Rui Qian
Wei-Hong Chuang
Shih-Fu Chang
Huayu Chen
Boqing Gong
ViT
587
669
0
22 Apr 2021
Distilling Audio-Visual Knowledge by Compositional Contrastive Learning
Computer Vision and Pattern Recognition (CVPR), 2021
Yanbei Chen
Yongqin Xian
A. Sophia Koepke
Ying Shan
Zeynep Akata
228
95
0
22 Apr 2021
Self-supervised object detection from audio-visual correspondence
Computer Vision and Pattern Recognition (CVPR), 2021
Triantafyllos Afouras
Yuki M. Asano
Francois Fagan
Andrea Vedaldi
Florian Metze
SSL
287
52
0
13 Apr 2021
Visually Informed Binaural Audio Generation without Binaural Audios
Computer Vision and Pattern Recognition (CVPR), 2021
Xudong Xu
Hang Zhou
Ziwei Liu
Bo Dai
Xiaogang Wang
Dahua Lin
DiffM
83
67
0
13 Apr 2021
Contrastive Learning of Global-Local Video Representations
Shuang Ma
Zhaoyang Zeng
Daniel J. McDuff
Yale Song
SSL
144
7
0
07 Apr 2021
Cross-Modal learning for Audio-Visual Video Parsing
Interspeech (Interspeech), 2021
Jatin Lamba
Abhishek
Jayaprakash Akula
Rishabh Dabral
Preethi Jyothi
Ganesh Ramakrishnan
228
8
0
03 Apr 2021
Self-supervised Video Representation Learning by Context and Motion Decoupling
Computer Vision and Pattern Recognition (CVPR), 2021
Lianghua Huang
Yu Liu
Bin Wang
Pan Pan
Yinghui Xu
Rong Jin
SSL
204
54
0
02 Apr 2021
Multiview Pseudo-Labeling for Semi-supervised Learning from Video
IEEE International Conference on Computer Vision (ICCV), 2021
Bo Xiong
Haoqi Fan
Kristen Grauman
Christoph Feichtenhofer
SSL
177
54
0
01 Apr 2021
Composable Augmentation Encoding for Video Representation Learning
IEEE International Conference on Computer Vision (ICCV), 2021
Chen Sun
Arsha Nagrani
Yonglong Tian
Cordelia Schmid
SSL, AI4TS
204
19
0
01 Apr 2021
Unsupervised Sound Localization via Iterative Contrastive Learning
Computer Vision and Image Understanding (CVIU), 2021
Yan-Bo Lin
Hung-Yu Tseng
Hsin-Ying Lee
Yen-Yu Lin
Ming-Hsuan Yang
SSL
154
40
0
01 Apr 2021
CUPID: Adaptive Curation of Pre-training Data for Video-and-Language Representation Learning
Luowei Zhou
Jingjing Liu
Yu Cheng
Zhe Gan
Lei Zhang
161
7
0
01 Apr 2021
Broaden Your Views for Self-Supervised Video Learning
IEEE International Conference on Computer Vision (ICCV), 2021
Adrià Recasens
Pauline Luc
Jean-Baptiste Alayrac
Luyu Wang
Ross Hemsley
...
Florent Altché
M. Valko
Jean-Bastien Grill
Aaron van den Oord
Andrew Zisserman
SSL, AI4TS
266
138
0
30 Mar 2021
Robust Audio-Visual Instance Discrimination
Computer Vision and Pattern Recognition (CVPR), 2021
Pedro Morgado
Ishan Misra
Nuno Vasconcelos
SSL
212
117
0
29 Mar 2021
Low-Fidelity End-to-End Video Encoder Pre-training for Temporal Action Localization
Neural Information Processing Systems (NeurIPS), 2021
Mengmeng Xu
Juan-Manuel Perez-Rua
Xiatian Zhu
Guohao Li
Brais Martinez
252
30
0
28 Mar 2021
Space-Time Crop & Attend: Improving Cross-modal Video Representation Learning
IEEE International Conference on Computer Vision (ICCV), 2021
Mandela Patrick
Yuki M. Asano
Bernie Huang
Ishan Misra
Florian Metze
Joao Henriques
Andrea Vedaldi
AI4TS
251
35
0
18 Mar 2021
Multi-Format Contrastive Learning of Audio Representations
Luyu Wang
Aaron van den Oord
151
63
0
11 Mar 2021
ACAV100M: Automatic Curation of Large-Scale Datasets for Audio-Visual Video Representation Learning
IEEE International Conference on Computer Vision (ICCV), 2021
Sangho Lee
Jiwan Chung
Youngjae Yu
Gunhee Kim
Thomas Breuel
Gal Chechik
Yale Song
309
62
0
26 Jan 2021
TCLR: Temporal Contrastive Learning for Video Representation
Computer Vision and Image Understanding (CVIU), 2021
I. Dave
Rohit Gupta
Mamshad Nayeem Rizve
Mubarak Shah
SSL, AI4TS
320
203
0
20 Jan 2021
Cross-modal Learning for Domain Adaptation in 3D Semantic Segmentation
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2021
M. Jaritz
Tuan-Hung Vu
Raoul de Charette
É. Wirbel
P. Pérez
3DPC
158
66
0
18 Jan 2021
CLASTER: Clustering with Reinforcement Learning for Zero-Shot Action Recognition
European Conference on Computer Vision (ECCV), 2021
Shreyank N. Gowda
Laura Sevilla-Lara
Frank Keller
Marcus Rohrbach
VLM
224
27
0
18 Jan 2021
Learning from Weakly-labeled Web Videos via Exploring Sub-Concepts
AAAI Conference on Artificial Intelligence (AAAI), 2021
Kunpeng Li
Zizhao Zhang
Guanhang Wu
Xuehan Xiong
Chen-Yu Lee
Zhichao Lu
Y. Fu
Tomas Pfister
130
5
0
11 Jan 2021
Human Action Recognition from Various Data Modalities: A Review
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2020
Zehua Sun
Qiuhong Ke
Hossein Rahmani
Mohammed Bennamoun
Gang Wang
Jun Liu
MU
504
673
0
22 Dec 2020
A Comprehensive Study of Deep Video Action Recognition
Yi Zhu
Xinyu Li
Chunhui Liu
Mohammadreza Zolfaghari
Yuanjun Xiong
Chongruo Wu
Zhi-Li Zhang
Joseph Tighe
R. Manmatha
Mu Li
VLM, AI4TS
220
207
0
11 Dec 2020
Parameter Efficient Multimodal Transformers for Video Representation Learning
Sangho Lee
Youngjae Yu
Gunhee Kim
Thomas Breuel
Jan Kautz
Yale Song
ViT
198
87
0
08 Dec 2020
TSP: Temporally-Sensitive Pretraining of Video Encoders for Localization Tasks
Humam Alwassel
Silvio Giancola
Guohao Li
197
138
0
23 Nov 2020
Hierarchically Decoupled Spatial-Temporal Contrast for Self-supervised Video Representation Learning
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2020
Zehua Zhang
David J. Crandall
AI4TS, SSL
198
25
0
23 Nov 2020
Boundary-sensitive Pre-training for Temporal Localization in Videos
IEEE International Conference on Computer Vision (ICCV), 2020
Mengmeng Xu
Juan-Manuel Perez-Rua
Victor Escorcia
Brais Martínez
Xiatian Zhu
Li Zhang
Guohao Li
Tao Xiang
178
63
0
21 Nov 2020
Audio-Visual Event Recognition through the lens of Adversary
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020
Juncheng Li
Kaixin Ma
Shuhui Qu
Po-Yao (Bernie) Huang
Florian Metze
AAML
133
9
0
15 Nov 2020
Learning Representations from Audio-Visual Spatial Alignment
Pedro Morgado
Yi Li
Nuno Vasconcelos
SSL
133
138
0
03 Nov 2020