ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2006.16228
  4. Cited By
Self-Supervised MultiModal Versatile Networks

Self-Supervised MultiModal Versatile Networks

29 June 2020
Jean-Baptiste Alayrac
Adrià Recasens
R. Schneider
Relja Arandjelović
Jason Ramapuram
J. Fauw
Lucas Smaira
Sander Dieleman
Andrew Zisserman
    SSL
ArXivPDFHTML

Papers citing "Self-Supervised MultiModal Versatile Networks"

50 / 266 papers shown
Title
MM-Pyramid: Multimodal Pyramid Attentional Network for Audio-Visual
  Event Localization and Video Parsing
MM-Pyramid: Multimodal Pyramid Attentional Network for Audio-Visual Event Localization and Video Parsing
Jiashuo Yu
Ying Cheng
Ruiwei Zhao
Rui Feng
Yuejie Zhang
19
52
0
24 Nov 2021
Florence: A New Foundation Model for Computer Vision
Florence: A New Foundation Model for Computer Vision
Lu Yuan
Dongdong Chen
Yi-Ling Chen
Noel Codella
Xiyang Dai
...
Zhen Xiao
Jianwei Yang
Michael Zeng
Luowei Zhou
Pengchuan Zhang
VLM
10
870
0
22 Nov 2021
Self-Supervised Audio-Visual Representation Learning with Relaxed
  Cross-Modal Synchronicity
Self-Supervised Audio-Visual Representation Learning with Relaxed Cross-Modal Synchronicity
Pritam Sarkar
Ali Etemad
SSL
10
11
0
09 Nov 2021
Cascaded Multilingual Audio-Visual Learning from Videos
Cascaded Multilingual Audio-Visual Learning from Videos
Andrew Rouditchenko
Angie Boggust
David F. Harwath
Samuel Thomas
Hilde Kuehne
...
Rameswar Panda
Rogerio Feris
Brian Kingsbury
M. Picheny
James R. Glass
27
7
0
08 Nov 2021
Masking Modalities for Cross-modal Video Retrieval
Masking Modalities for Cross-modal Video Retrieval
Valentin Gabeur
Arsha Nagrani
Chen Sun
Alahari Karteek
Cordelia Schmid
9
29
0
01 Nov 2021
Self-Supervised Representation Learning: Introduction, Advances and
  Challenges
Self-Supervised Representation Learning: Introduction, Advances and Challenges
Linus Ericsson
H. Gouk
Chen Change Loy
Timothy M. Hospedales
SSL
OOD
AI4TS
17
268
0
18 Oct 2021
Contrastive Learning of Visual-Semantic Embeddings
Contrastive Learning of Visual-Semantic Embeddings
Anurag Jain
Yashaswi Verma
SSL
17
1
0
17 Oct 2021
Conformer-Based Self-Supervised Learning for Non-Speech Audio Tasks
Conformer-Based Self-Supervised Learning for Non-Speech Audio Tasks
Sangeeta Srivastava
Yun Wang
Andros Tjandra
Anurag Kumar
Chunxi Liu
Kritika Singh
Yatharth Saraf
SSL
17
24
0
14 Oct 2021
The Impact of Spatiotemporal Augmentations on Self-Supervised
  Audiovisual Representation Learning
The Impact of Spatiotemporal Augmentations on Self-Supervised Audiovisual Representation Learning
Haider Al-Tahan
Y. Mohsenzadeh
SSL
AI4TS
8
0
0
13 Oct 2021
Multi-Modal Pre-Training for Automated Speech Recognition
Multi-Modal Pre-Training for Automated Speech Recognition
David M. Chan
Shalini Ghosh
D. Chakrabarty
Björn Hoffmeister
SSL
14
16
0
12 Oct 2021
Revitalizing CNN Attentions via Transformers in Self-Supervised Visual
  Representation Learning
Revitalizing CNN Attentions via Transformers in Self-Supervised Visual Representation Learning
Chongjian Ge
Youwei Liang
Yibing Song
Jianbo Jiao
Jue Wang
Ping Luo
ViT
8
36
0
11 Oct 2021
Stochastic Contrastive Learning
Stochastic Contrastive Learning
Jason Ramapuram
Dan Busbridge
Xavier Suau
Russ Webb
BDL
SSL
34
3
0
01 Oct 2021
Evaluating the fairness of fine-tuning strategies in self-supervised
  learning
Evaluating the fairness of fine-tuning strategies in self-supervised learning
Jason Ramapuram
Dan Busbridge
Russ Webb
17
6
0
01 Oct 2021
Do Self-Supervised and Supervised Methods Learn Similar Visual
  Representations?
Do Self-Supervised and Supervised Methods Learn Similar Visual Representations?
Tom George Grigg
Dan Busbridge
Jason Ramapuram
Russ Webb
SSL
DRL
9
27
0
01 Oct 2021
VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text
  Understanding
VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding
Hu Xu
Gargi Ghosh
Po-Yao (Bernie) Huang
Dmytro Okhonko
Armen Aghajanyan
Florian Metze
Luke Zettlemoyer
Florian Metze Luke Zettlemoyer Christoph Feichtenhofer
CLIP
VLM
242
554
0
28 Sep 2021
V-SlowFast Network for Efficient Visual Sound Separation
V-SlowFast Network for Efficient Visual Sound Separation
Lingyu Zhu
Esa Rahtu
28
10
0
18 Sep 2021
Multilingual Molecular Representation Learning via Contrastive
  Pre-training
Multilingual Molecular Representation Learning via Contrastive Pre-training
Zhihui Guo
P. Sharma
Andy Martinez
Liang Du
Robin Abraham
38
29
0
18 Sep 2021
ActionCLIP: A New Paradigm for Video Action Recognition
ActionCLIP: A New Paradigm for Video Action Recognition
Mengmeng Wang
Jiazheng Xing
Yong Liu
VLM
141
261
0
17 Sep 2021
Overview of Tencent Multi-modal Ads Video Understanding Challenge
Overview of Tencent Multi-modal Ads Video Understanding Challenge
Zhenzhi Wang
Liyu Wu
Zhimin Li
Jiangfeng Xiong
Qinglin Lu
11
4
0
16 Sep 2021
ParamCrop: Parametric Cubic Cropping for Video Contrastive Learning
ParamCrop: Parametric Cubic Cropping for Video Contrastive Learning
Zhiwu Qing
Ziyuan Huang
Shiwei Zhang
Mingqian Tang
Changxin Gao
M. Ang
Ronglei Ji
Nong Sang
20
3
0
24 Aug 2021
Learning to Cut by Watching Movies
Learning to Cut by Watching Movies
Alejandro Pardo
Fabian Caba Heilbron
Juan Carlos León Alcázar
Ali K. Thabet
Bernard Ghanem
VGen
43
20
0
09 Aug 2021
Video Contrastive Learning with Global Context
Video Contrastive Learning with Global Context
Haofei Kuang
Yi Zhu
Zhi-Li Zhang
Xinyu Li
Joseph Tighe
Sören Schwertfeger
C. Stachniss
Mu Li
SSL
AI4TS
11
56
0
05 Aug 2021
Perceiver IO: A General Architecture for Structured Inputs & Outputs
Perceiver IO: A General Architecture for Structured Inputs & Outputs
Andrew Jaegle
Sebastian Borgeaud
Jean-Baptiste Alayrac
Carl Doersch
Catalin Ionescu
...
Olivier J. Hénaff
M. Botvinick
Andrew Zisserman
Oriol Vinyals
João Carreira
MLLM
VLM
GNN
6
561
0
30 Jul 2021
Contrastive Multimodal Fusion with TupleInfoNCE
Contrastive Multimodal Fusion with TupleInfoNCE
Yunze Liu
Qingnan Fan
Shanghang Zhang
Hao Dong
Thomas Funkhouser
Li Yi
21
64
0
06 Jul 2021
Attention Bottlenecks for Multimodal Fusion
Attention Bottlenecks for Multimodal Fusion
Arsha Nagrani
Shan Yang
Anurag Arnab
A. Jansen
Cordelia Schmid
Chen Sun
23
443
0
30 Jun 2021
AudioCLIP: Extending CLIP to Image, Text and Audio
AudioCLIP: Extending CLIP to Image, Text and Audio
A. Guzhov
Federico Raue
Jörn Hees
Andreas Dengel
CLIP
VLM
14
266
0
24 Jun 2021
Spatio-Temporal SAR-Optical Data Fusion for Cloud Removal via a Deep
  Hierarchical Model
Spatio-Temporal SAR-Optical Data Fusion for Cloud Removal via a Deep Hierarchical Model
A. Sebastianelli
A. Nowakowski
Erika Puglisi
M. P. D. Rosso
J. Mifdal
F. Pirri
P. Mathieu
Silvia Liberata Ullo
3DPC
13
1
0
23 Jun 2021
TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?
TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?
Michael S. Ryoo
A. Piergiovanni
Anurag Arnab
Mostafa Dehghani
A. Angelova
ViT
11
107
0
21 Jun 2021
VIMPAC: Video Pre-Training via Masked Token Prediction and Contrastive
  Learning
VIMPAC: Video Pre-Training via Masked Token Prediction and Contrastive Learning
Hao Tan
Jie Lei
Thomas Wolf
Mohit Bansal
18
60
0
21 Jun 2021
Watching Too Much Television is Good: Self-Supervised Audio-Visual
  Representation Learning from Movies and TV Shows
Watching Too Much Television is Good: Self-Supervised Audio-Visual Representation Learning from Movies and TV Shows
Mahdi M. Kalayeh
Nagendra Kamath
Lingyi Liu
Ashok Chandrashekar
SSL
11
2
0
16 Jun 2021
Cross-Modal Discrete Representation Learning
Cross-Modal Discrete Representation Learning
Alexander H. Liu
SouYoung Jin
Cheng-I Jeff Lai
Andrew Rouditchenko
A. Oliva
James R. Glass
SSL
11
36
0
10 Jun 2021
MERLOT: Multimodal Neural Script Knowledge Models
MERLOT: Multimodal Neural Script Knowledge Models
Rowan Zellers
Ximing Lu
Jack Hessel
Youngjae Yu
J. S. Park
Jize Cao
Ali Farhadi
Yejin Choi
VLM
LRM
11
372
0
04 Jun 2021
VLM: Task-agnostic Video-Language Model Pre-training for Video
  Understanding
VLM: Task-agnostic Video-Language Model Pre-training for Video Understanding
Hu Xu
Gargi Ghosh
Po-Yao (Bernie) Huang
Prahal Arora
Masoumeh Aminzadeh
Christoph Feichtenhofer
Florian Metze
Luke Zettlemoyer
13
126
0
20 May 2021
Self-Supervised Learning from Automatically Separated Sound Scenes
Self-Supervised Learning from Automatically Separated Sound Scenes
Eduardo Fonseca
A. Jansen
D. Ellis
Scott Wisdom
Marco Tagliasacchi
J. Hershey
Manoj Plakal
Shawn Hershey
R. C. Moore
Xavier Serra
SSL
21
13
0
05 May 2021
Motion-Augmented Self-Training for Video Recognition at Smaller Scale
Motion-Augmented Self-Training for Video Recognition at Smaller Scale
Kirill Gavrilyuk
Mihir Jain
I. Karmanov
Cees G. M. Snoek
18
18
0
04 May 2021
A Large-Scale Study on Unsupervised Spatiotemporal Representation
  Learning
A Large-Scale Study on Unsupervised Spatiotemporal Representation Learning
Christoph Feichtenhofer
Haoqi Fan
Bo Xiong
Ross B. Girshick
Kaiming He
SSL
AI4TS
15
256
0
29 Apr 2021
Multimodal Contrastive Training for Visual Representation Learning
Multimodal Contrastive Training for Visual Representation Learning
Xin Yuan
Zhe-nan Lin
Jason Kuen
Jianming Zhang
Yilin Wang
Michael Maire
Ajinkya Kale
Baldo Faieta
SSL
20
125
0
26 Apr 2021
Multimodal Self-Supervised Learning of General Audio Representations
Multimodal Self-Supervised Learning of General Audio Representations
Luyu Wang
Pauline Luc
Adrià Recasens
Jean-Baptiste Alayrac
Aaron van den Oord
SSL
70
38
0
26 Apr 2021
Joint Representation Learning and Novel Category Discovery on Single-
  and Multi-modal Data
Joint Representation Learning and Novel Category Discovery on Single- and Multi-modal Data
Xu Jia
Kai Han
Yukun Zhu
Bradley Green
132
52
0
26 Apr 2021
Multimodal Clustering Networks for Self-supervised Learning from
  Unlabeled Videos
Multimodal Clustering Networks for Self-supervised Learning from Unlabeled Videos
Brian Chen
Andrew Rouditchenko
Kevin Duarte
Hilde Kuehne
Samuel Thomas
...
Rogerio Feris
David F. Harwath
James R. Glass
M. Picheny
Shih-Fu Chang
SSL
30
89
0
26 Apr 2021
VATT: Transformers for Multimodal Self-Supervised Learning from Raw
  Video, Audio and Text
VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text
Hassan Akbari
Liangzhe Yuan
Rui Qian
Wei-Hong Chuang
Shih-Fu Chang
Yin Cui
Boqing Gong
ViT
229
573
0
22 Apr 2021
Does language help generalization in vision models?
Does language help generalization in vision models?
Benjamin Devillers
Bhavin Choksi
Romain Bielawski
R. V. Rullen
VLM
11
23
0
16 Apr 2021
Self-supervised object detection from audio-visual correspondence
Self-supervised object detection from audio-visual correspondence
Triantafyllos Afouras
Yuki M. Asano
Francois Fagan
Andrea Vedaldi
Florian Metze
SSL
8
39
0
13 Apr 2021
Object Priors for Classifying and Localizing Unseen Actions
Object Priors for Classifying and Localizing Unseen Actions
Pascal Mettes
William Thong
Cees G. M. Snoek
9
20
0
10 Apr 2021
Contrastive Learning of Global-Local Video Representations
Contrastive Learning of Global-Local Video Representations
Shuang Ma
Zhaoyang Zeng
Daniel J. McDuff
Yale Song
SSL
11
5
0
07 Apr 2021
Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval
Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval
Max Bain
Arsha Nagrani
Gül Varol
Andrew Zisserman
VGen
10
1,114
0
01 Apr 2021
Unsupervised Sound Localization via Iterative Contrastive Learning
Unsupervised Sound Localization via Iterative Contrastive Learning
Yan-Bo Lin
Hung-Yu Tseng
Hsin-Ying Lee
Yen-Yu Lin
Ming-Hsuan Yang
SSL
11
34
0
01 Apr 2021
CUPID: Adaptive Curation of Pre-training Data for Video-and-Language
  Representation Learning
CUPID: Adaptive Curation of Pre-training Data for Video-and-Language Representation Learning
Luowei Zhou
Jingjing Liu
Yu Cheng
Zhe Gan
Lei Zhang
10
7
0
01 Apr 2021
Broaden Your Views for Self-Supervised Video Learning
Broaden Your Views for Self-Supervised Video Learning
Adrià Recasens
Pauline Luc
Jean-Baptiste Alayrac
Luyu Wang
Ross Hemsley
...
Florent Altché
M. Valko
Jean-Bastien Grill
Aaron van den Oord
Andrew Zisserman
SSL
AI4TS
15
127
0
30 Mar 2021
Thinking Fast and Slow: Efficient Text-to-Visual Retrieval with
  Transformers
Thinking Fast and Slow: Efficient Text-to-Visual Retrieval with Transformers
Antoine Miech
Jean-Baptiste Alayrac
Ivan Laptev
Josef Sivic
Andrew Zisserman
ViT
12
135
0
30 Mar 2021
Previous
123456
Next