1911.12667
Self-Supervised Learning by Cross-Modal Audio-Video Clustering
Neural Information Processing Systems (NeurIPS), 2019
28 November 2019
Humam Alwassel
D. Mahajan
Bruno Korbar
Lorenzo Torresani
Bernard Ghanem
Du Tran
SSL
Papers citing "Self-Supervised Learning by Cross-Modal Audio-Video Clustering"
50 / 280 papers shown
AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Yuan Tseng
Layne Berry
Yi-Ting Chen
I-Hsiang Chiu
Hsuan-Hao Lin
...
Yu Tsao
Shinji Watanabe
Abdel-rahman Mohamed
Chi-Luen Feng
Hung-yi Lee
VLM
SSL
280
21
0
19 Sep 2023
Discovering Sounding Objects by Audio Queries for Audio Visual Segmentation
International Joint Conference on Artificial Intelligence (IJCAI), 2023
Shaofei Huang
Han Li
Yuqing Wang
Hongji Zhu
Jiao Dai
Jizhong Han
Wenge Rong
Si Liu
VOS
117
29
0
18 Sep 2023
Self-supervised Multi-view Clustering in Computer Vision: A Survey
IET Computer Vision (ICV), 2023
Jiatai Wang
Zhiwei Xu
Xuewen Yang
Hailong Li
Bo Li
Xuying Meng
192
5
0
18 Sep 2023
AV-MaskEnhancer: Enhancing Video Representations through Audio-Visual Masked Autoencoder
IEEE International Conference on Tools with Artificial Intelligence (ICTAI), 2023
Xingjian Diao
Ming Cheng
Shitong Cheng
VGen
220
11
0
15 Sep 2023
Text-to-feature diffusion for audio-visual few-shot learning
Otniel-Bogdan Mercea
Thomas Hummel
A. Sophia Koepke
Zeynep Akata
VLM
158
3
0
07 Sep 2023
Fine-Grained Spatiotemporal Motion Alignment for Contrastive Video Representation Learning
ACM Multimedia (ACM MM), 2023
Minghao Zhu
Xiao Lin
Ronghao Dang
Chengju Liu
Qi Chen
VGen
173
11
0
01 Sep 2023
Self-Supervised Representation Learning with Cross-Context Learning between Global and Hypercolumn Features
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Zheng Gao
Chen Feng
Ioannis Patras
SSL
195
6
0
25 Aug 2023
Preserving Modality Structure Improves Multi-Modal Learning
IEEE International Conference on Computer Vision (ICCV), 2023
Swetha Sirnam
Mamshad Nayeem Rizve
Nina Shvetsova
Hilde Kuehne
M. Shah
172
13
0
24 Aug 2023
Semantics Meets Temporal Correspondence: Self-supervised Object-centric Learning in Videos
IEEE International Conference on Computer Vision (ICCV), 2023
Rui Qian
Shuangrui Ding
Xian Liu
Dahua Lin
338
21
0
19 Aug 2023
Query-based Video Summarization with Pseudo Label Supervision
IEEE International Conference on Image Processing (ICIP), 2023
Jia-Hong Huang
L. Murn
M. Mrak
Marcel Worring
185
13
0
04 Jul 2023
A Dimensional Structure based Knowledge Distillation Method for Cross-Modal Learning
Hui Xiong
Hongwei Dong
Jingyao Wang
J. Yu
Wen-jie Zhai
Changwen Zheng
Jianwei Niu
Gang Hua
165
1
0
28 Jun 2023
Visually-Guided Sound Source Separation with Audio-Visual Predictive Coding
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2023
Zengjie Song
Zhaoxiang Zhang
147
5
0
19 Jun 2023
A Large-Scale Analysis on Self-Supervised Video Representation Learning
Akash Kumar
Ashlesha Kumar
Vibhav Vineet
Yogesh S Rawat
SSL
229
3
0
09 Jun 2023
HomE: Homography-Equivariant Video Representation Learning
Anirudh Sriram
Adrien Gaidon
Jiajun Wu
Juan Carlos Niebles
L. Fei-Fei
Ehsan Adeli
SSL
AI4TS
135
2
0
02 Jun 2023
Revisit Weakly-Supervised Audio-Visual Video Parsing from the Language Perspective
Neural Information Processing Systems (NeurIPS), 2023
Yingying Fan
Yu Wu
Bo Du
Yutian Lin
237
17
0
01 Jun 2023
LANISTR: Multimodal Learning from Structured and Unstructured Data
Sayna Ebrahimi
Sercan O. Arik
Yihe Dong
Tomas Pfister
184
7
0
26 May 2023
Deep Neural Networks in Video Human Action Recognition: A Review
Zihan Wang
Yang Yang
Zhi Liu
Y. Zheng
208
9
0
25 May 2023
A Cookbook of Self-Supervised Learning
Randall Balestriero
Mark Ibrahim
Vlad Sobal
Ari S. Morcos
Shashank Shekhar
...
Pierre Fernandez
Amir Bar
Hamed Pirsiavash
Yann LeCun
Micah Goldblum
SyDa
FedML
SSL
351
355
0
24 Apr 2023
Looking Similar, Sounding Different: Leveraging Counterfactual Cross-Modal Pairs for Audiovisual Representation Learning
Computer Vision and Pattern Recognition (CVPR), 2023
Nikhil Singh
Chih-Wei Wu
Iroro Orife
Mahdi M. Kalayeh
350
3
0
12 Apr 2023
Procedure-Aware Pretraining for Instructional Video Understanding
Computer Vision and Pattern Recognition (CVPR), 2023
Honglu Zhou
Roberto Martín-Martín
Mubbasir Kapadia
Silvio Savarese
Juan Carlos Niebles
263
53
0
31 Mar 2023
Self-Supervised Multimodal Learning: A Survey
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Yongshuo Zong
Oisin Mac Aodha
Timothy M. Hospedales
SSL
275
79
0
31 Mar 2023
Sound to Visual Scene Generation by Audio-to-Visual Latent Alignment
Computer Vision and Pattern Recognition (CVPR), 2023
Kim Sung-Bin
Arda Senocak
H. Ha
Andrew Owens
Tae-Hyun Oh
DiffM
VGen
164
51
0
30 Mar 2023
VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking
Computer Vision and Pattern Recognition (CVPR), 2023
Limin Wang
Bingkun Huang
Zhiyu Zhao
Zhan Tong
Yinan He
Yi Wang
Yali Wang
Yu Qiao
VGen
291
503
0
29 Mar 2023
Sounding Video Generator: A Unified Framework for Text-guided Sounding Video Generation
IEEE Transactions on Multimedia (IEEE TMM), 2023
Jiawei Liu
Weining Wang
Sihan Chen
Xinxin Zhu
Qingbin Liu
DiffM
VGen
130
18
0
29 Mar 2023
Structured Video-Language Modeling with Temporal Grouping and Spatial Grounding
International Conference on Learning Representations (ICLR), 2023
Yuanhao Xiong
Long Zhao
Boqing Gong
Ming-Hsuan Yang
Florian Schroff
Ting Liu
Cho-Jui Hsieh
Liangzhe Yuan
VLM
173
0
0
28 Mar 2023
Egocentric Auditory Attention Localization in Conversations
Computer Vision and Pattern Recognition (CVPR), 2023
Fiona Ryan
Hao Jiang
Abhinav Shukla
James M. Rehg
V. Ithapu
EgoV
211
23
0
28 Mar 2023
Tubelet-Contrastive Self-Supervision for Video-Efficient Generalization
IEEE International Conference on Computer Vision (ICCV), 2023
Fida Mohammad Thoker
Hazel Doughty
Cees G. M. Snoek
ViT
268
12
0
20 Mar 2023
Audio-Visual Contrastive Learning with Temporal Self-Supervision
AAAI Conference on Artificial Intelligence (AAAI), 2023
Simon Jenni
Alexander Black
John Collomosse
SSL
170
23
0
15 Feb 2023
SemanticAC: Semantics-Assisted Framework for Audio Classification
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Yicheng Xiao
Yue Ma
Shuyan Li
Hantao Zhou
Ran Liao
Xiu Li
89
11
0
12 Feb 2023
Zorro: the masked multimodal transformer
Adrià Recasens
Jason Lin
João Carreira
Drew Jaegle
Luyu Wang
...
Pauline Luc
Antoine Miech
Lucas Smaira
Ross Hemsley
Andrew Zisserman
179
23
0
23 Jan 2023
Novel-View Acoustic Synthesis
Computer Vision and Pattern Recognition (CVPR), 2023
Changan Chen
Alexander Richard
Roman Shapovalov
V. Ithapu
Natalia Neverova
Kristen Grauman
Andrea Vedaldi
194
45
0
20 Jan 2023
Multimodality Helps Unimodality: Cross-Modal Few-Shot Learning with Multimodal Models
Computer Vision and Pattern Recognition (CVPR), 2023
Zhiqiu Lin
Samuel Yu
Zhiyi Kuang
Deepak Pathak
Deva Ramanan
VLM
359
147
0
16 Jan 2023
What You Say Is What You Show: Visual Narration Detection in Instructional Videos
Kumar Ashutosh
Rohit Girdhar
Lorenzo Torresani
Kristen Grauman
311
4
0
05 Jan 2023
Look, Listen, and Attack: Backdoor Attacks Against Video Action Recognition
Hasan Hammoud
Shuming Liu
Mohammad Alkhrashi
Fahad Albalawi
Bernard Ghanem
AAML
238
12
0
03 Jan 2023
Similarity Contrastive Estimation for Image and Video Soft Contrastive Self-Supervised Learning
Machine Vision and Applications (MVA), 2022
J. Denize
Jaonary Rabarisoa
Astrid Orcesi
Romain Hérault
SSL
238
6
0
21 Dec 2022
C2F-TCN: A Framework for Semi and Fully Supervised Temporal Action Segmentation
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Dipika Singhania
R. Rahaman
Angela Yao
175
43
0
20 Dec 2022
Vision Transformers are Parameter-Efficient Audio-Visual Learners
Computer Vision and Pattern Recognition (CVPR), 2022
Yan-Bo Lin
Yi-Lin Sung
Jie Lei
Joey Tianyi Zhou
Gedas Bertasius
256
106
0
15 Dec 2022
Jointly Learning Visual and Auditory Speech Representations from Raw Data
International Conference on Learning Representations (ICLR), 2022
A. Haliassos
Pingchuan Ma
Rodrigo Mira
Stavros Petridis
Maja Pantic
SSL
249
68
0
12 Dec 2022
Audiovisual Masked Autoencoders
IEEE International Conference on Computer Vision (ICCV), 2022
Mariana-Iuliana Georgescu
Eduardo Fonseca
Radu Tudor Ionescu
Mario Lucic
Cordelia Schmid
Anurag Arnab
SSL
270
56
0
09 Dec 2022
Self-Supervised Audio-Visual Speech Representations Learning By Multimodal Self-Distillation
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Jing-Xuan Zhang
Genshun Wan
Zhenhua Ling
Jia Pan
Jianqing Gao
Cong Liu
SSL
196
15
0
06 Dec 2022
XKD: Cross-modal Knowledge Distillation with Domain Alignment for Video Representation Learning
AAAI Conference on Artificial Intelligence (AAAI), 2022
Pritam Sarkar
Ali Etemad
263
34
0
25 Nov 2022
A Survey of Deep Graph Clustering: Taxonomy, Challenge, Application, and Open Resource
Yue Liu
Jun Xia
Sihang Zhou
Xihong Yang
K. Liang
Chenchen Fan
Zhuang Yan
Stan Z. Li
Xinwang Liu
Kunlun He
OOD
224
36
0
23 Nov 2022
Complete Cross-triplet Loss in Label Space for Audio-visual Cross-modal Retrieval
IEEE International Symposium on Multimedia (ISM), 2022
Donghuo Zeng
Yanan Wang
Jianming Wu
K. Ikeda
166
5
0
07 Nov 2022
Scaling Multimodal Pre-Training via Cross-Modality Gradient Harmonization
Neural Information Processing Systems (NeurIPS), 2022
Junru Wu
Yi Liang
Feng Han
Hassan Akbari
Zinan Lin
Cong Yu
131
14
0
03 Nov 2022
On the Role of Visual Context in Enriching Music Representations
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Kleanthis Avramidis
Shanti Stewart
Shrikanth Narayanan
131
4
0
28 Oct 2022
VTC: Improving Video-Text Retrieval with User Comments
European Conference on Computer Vision (ECCV), 2022
Laura Hanu
James Thewlis
Yuki M. Asano
Christian Rupprecht
VGen
175
8
0
19 Oct 2022
Retrospectives on the Embodied AI Workshop
Matt Deitke
Dhruv Batra
Yonatan Bisk
Tommaso Campari
Angel X. Chang
...
Jesse Thomason
Alexander Toshev
Joanne Truong
Luca Weihs
Jiajun Wu
LM&Ro
301
53
0
13 Oct 2022
Masked Motion Encoding for Self-Supervised Video Representation Learning
Computer Vision and Pattern Recognition (CVPR), 2022
Xinyu Sun
Peihao Chen
Liang-Chieh Chen
Chan Li
Thomas H. Li
Zhuliang Yu
Chuang Gan
242
42
0
12 Oct 2022
Match Cutting: Finding Cuts with Smooth Visual Transitions
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022
Boris Chen
Amir Ziai
Rebecca Tucker
Yuchen Xie
VGen
240
17
0
11 Oct 2022
Turbo Training with Token Dropout
British Machine Vision Conference (BMVC), 2022
Tengda Han
Weidi Xie
Andrew Zisserman
ViT
172
14
0
10 Oct 2022