Self-Supervised MultiModal Versatile Networks

29 June 2020
Jean-Baptiste Alayrac
Adrià Recasens
R. Schneider
Relja Arandjelović
Jason Ramapuram
J. Fauw
Lucas Smaira
Sander Dieleman
Andrew Zisserman
    SSL
arXiv:2006.16228

Papers citing "Self-Supervised MultiModal Versatile Networks"

50 / 266 papers shown
Intra-agent speech permits zero-shot task acquisition
Chen Yan
Federico Carnevale
Petko Georgiev
Adam Santoro
Aurelia Guy
Alistair Muldal
Chia-Chun Hung
Josh Abramson
Timothy Lillicrap
Greg Wayne
LM&Ro
26
9
0
07 Jun 2022
Beyond Just Vision: A Review on Self-Supervised Representation Learning on Multimodal and Temporal Data
Shohreh Deldari
Hao Xue
Aaqib Saeed
Jiayuan He
Daniel V. Smith
Flora D. Salim
AI4TS
17
37
0
06 Jun 2022
3D-Augmented Contrastive Knowledge Distillation for Image-based Object Pose Estimation
Zhidan Liu
Zhen Xing
Xiangdong Zhou
Yijiang Chen
G. Zhou
3DH
8
3
0
02 Jun 2022
Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners
Zhenhailong Wang
Manling Li
Ruochen Xu
Luowei Zhou
Jie Lei
...
Chenguang Zhu
Derek Hoiem
Shih-Fu Chang
Mohit Bansal
Heng Ji
MLLM
VLM
159
134
0
22 May 2022
Multimodal Conversational AI: A Survey of Datasets and Approaches
Anirudh S. Sundar
Larry Heck
30
29
0
13 May 2022
Scene Consistency Representation Learning for Video Scene Segmentation
Haoqian Wu
Keyu Chen
Yanan Luo
Ruizhi Qiao
Bo Ren
Haozhe Liu
Weicheng Xie
Linlin Shen
SSL
12
16
0
11 May 2022
TransRank: Self-supervised Video Representation Learning via Ranking-based Transformation Recognition
Haodong Duan
Nanxuan Zhao
Kai-xiang Chen
Dahua Lin
ViT
AI4TS
28
19
0
04 May 2022
i-Code: An Integrative and Composable Multimodal Learning Framework
Ziyi Yang
Yuwei Fang
Chenguang Zhu
Reid Pryzant
Dongdong Chen
...
Bin Xiao
Yuanxun Lu
Takuya Yoshioka
Michael Zeng
Xuedong Huang
29
45
0
03 May 2022
On Negative Sampling for Audio-Visual Contrastive Learning from Movies
Mahdi M. Kalayeh
Shervin Ardeshir
Lingyi Liu
Nagendra Kamath
Ashok Chandrashekar
SSL
14
3
0
29 Apr 2022
Flamingo: a Visual Language Model for Few-Shot Learning
Jean-Baptiste Alayrac
Jeff Donahue
Pauline Luc
Antoine Miech
Iain Barr
...
Mikolaj Binkowski
Ricardo Barreira
Oriol Vinyals
Andrew Zisserman
Karen Simonyan
MLLM
VLM
15
2,308
0
29 Apr 2022
Unsupervised Voice-Face Representation Learning by Cross-Modal Prototype Contrast
Boqing Zhu
Kele Xu
Changjian Wang
Zheng Qin
Tao Sun
Huaimin Wang
Yuxing Peng
SSL
20
17
0
28 Apr 2022
MILES: Visual BERT Pre-training with Injected Language Semantics for Video-text Retrieval
Yuying Ge
Yixiao Ge
Xihui Liu
Alex Jinpeng Wang
Jianping Wu
Ying Shan
Xiaohu Qie
Ping Luo
VLM
6
43
0
26 Apr 2022
A Survey of Video-based Action Quality Assessment
Shunli Wang
Dingkang Yang
Peng Zhai
Qing Yu
Tao Suo
Zhan Sun
Ka Li
Lihua Zhang
13
16
0
20 Apr 2022
Frequency Selective Augmentation for Video Representation Learning
Jinhyung Kim
Taeoh Kim
Minho Shim
Dongyoon Han
Dongyoon Wee
Junmo Kim
AI4TS
17
3
0
08 Apr 2022
ECLIPSE: Efficient Long-range Video Retrieval using Sight and Sound
Yan-Bo Lin
Jie Lei
Mohit Bansal
Gedas Bertasius
18
39
0
06 Apr 2022
MultiMAE: Multi-modal Multi-task Masked Autoencoders
Roman Bachmann
David Mizrahi
Andrei Atanov
Amir Zamir
14
262
0
04 Apr 2022
X-Pool: Cross-Modal Language-Video Attention for Text-Video Retrieval
S. Gorti
Noël Vouitsis
Junwei Ma
Keyvan Golestan
M. Volkovs
Animesh Garg
Guangwei Yu
12
148
0
28 Mar 2022
Single-Stream Multi-Level Alignment for Vision-Language Pretraining
Zaid Khan
B. Vijaykumar
Xiang Yu
S. Schulter
Manmohan Chandraker
Y. Fu
CLIP
VLM
14
16
0
27 Mar 2022
Versatile Multi-Modal Pre-Training for Human-Centric Perception
Fangzhou Hong
Liang Pan
Zhongang Cai
Ziwei Liu
VLM
17
24
0
25 Mar 2022
VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
Zhan Tong
Yibing Song
Jue Wang
Limin Wang
ViT
20
768
0
23 Mar 2022
Look for the Change: Learning Object States and State-Modifying Actions from Untrimmed Web Videos
Tomáš Souček
Jean-Baptiste Alayrac
Antoine Miech
Ivan Laptev
Josef Sivic
8
32
0
22 Mar 2022
Drive&Segment: Unsupervised Semantic Segmentation of Urban Scenes via Cross-modal Distillation
Antonín Vobecký
David Hurych
Oriane Siméoni
Spyros Gidaris
Andrei Bursuc
Patrick Pérez
Josef Sivic
3DPC
17
20
0
21 Mar 2022
A Study on Robustness to Perturbations for Representations of Environmental Sound
Sangeeta Srivastava
Ho-Hsiang Wu
Joao Rulff
Magdalena Fuentes
M. Cartwright
Claudio Silva
Anish Arora
J. P. Bello
15
3
0
20 Mar 2022
Learnable Irrelevant Modality Dropout for Multimodal Action Recognition on Modality-Specific Annotated Videos
Saghir Alfasly
Jian Lu
C. Xu
Yuru Zou
16
18
0
06 Mar 2022
COMPASS: Contrastive Multimodal Pretraining for Autonomous Systems
Shuang Ma
Sai H. Vemprala
Wenshan Wang
Jayesh K. Gupta
Yale Song
Daniel J. McDuff
Ashish Kapoor
SSL
8
9
0
20 Feb 2022
Misinformation Detection in Social Media Video Posts
Kehan Wang
David M. Chan
Seth Z. Zhao
John F. Canny
A. Zakhor
12
7
0
15 Feb 2022
data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language
Alexei Baevski
Wei-Ning Hsu
Qiantong Xu
Arun Babu
Jiatao Gu
Michael Auli
SSL
VLM
ViT
19
823
0
07 Feb 2022
Keyword localisation in untranscribed speech using visually grounded speech models
Kayode Olaleye
Dan Oneaţă
Herman Kamper
11
7
0
02 Feb 2022
From data to functa: Your data point is a function and you can treat it like one
Emilien Dupont
Hyunjik Kim
S. M. Ali Eslami
Danilo Jimenez Rezende
Dan Rosenbaum
TDI
3DPC
157
136
0
28 Jan 2022
Learning To Recognize Procedural Activities with Distant Supervision
Xudong Lin
Fabio Petroni
Gedas Bertasius
Marcus Rohrbach
Shih-Fu Chang
Lorenzo Torresani
14
82
0
26 Jan 2022
Omnivore: A Single Model for Many Visual Modalities
Rohit Girdhar
Mannat Singh
Nikhil Ravi
Laurens van der Maaten
Armand Joulin
Ishan Misra
209
222
0
20 Jan 2022
TriCoLo: Trimodal Contrastive Loss for Text to Shape Retrieval
Yue Ruan
Han-Hung Lee
Yiming Zhang
Ke Zhang
Angel X. Chang
10
12
0
19 Jan 2022
Video Transformers: A Survey
Javier Selva
A. S. Johansen
Sergio Escalera
Kamal Nasrollahi
T. Moeslund
Albert Clapés
ViT
20
101
0
16 Jan 2022
Tailor Versatile Multi-modal Learning for Multi-label Emotion Recognition
Yi Zhang
Mingyuan Chen
Jundong Shen
Chongjun Wang
10
58
0
15 Jan 2022
Bridging Video-text Retrieval with Multiple Choice Questions
Yuying Ge
Yixiao Ge
Xihui Liu
Dian Li
Ying Shan
Xiaohu Qie
Ping Luo
BDL
8
108
0
13 Jan 2022
Multi-Query Video Retrieval
Zeyu Wang
Yu Wu
Karthik Narasimhan
Olga Russakovsky
20
17
0
10 Jan 2022
MERLOT Reserve: Neural Script Knowledge through Vision and Language and Sound
Rowan Zellers
Jiasen Lu
Ximing Lu
Youngjae Yu
Yanpeng Zhao
Mohammadreza Salehi
Aditya Kusupati
Jack Hessel
Ali Farhadi
Yejin Choi
9
175
0
07 Jan 2022
Progressive Video Summarization via Multimodal Self-supervised Learning
Haopeng Li
Qiuhong Ke
Mingming Gong
Tom Drummond
AI4TS
23
17
0
07 Jan 2022
Fine-grained Multi-Modal Self-Supervised Learning
Duo Wang
S. Karout
SSL
17
7
0
22 Dec 2021
Class-aware Sounding Objects Localization via Audiovisual Correspondence
Di Hu
Yake Wei
Rui Qian
Weiyao Lin
Ruihua Song
Ji-Rong Wen
13
41
0
22 Dec 2021
Connecting the Dots between Audio and Text without Parallel Data through Visual Knowledge Transfer
Yanpeng Zhao
Jack Hessel
Youngjae Yu
Ximing Lu
Rowan Zellers
Yejin Choi
9
27
0
16 Dec 2021
Multimodal neural networks better explain multivoxel patterns in the hippocampus
Bhavin Choksi
Milad Mozafari
Rufin VanRullen
Leila Reddy
11
12
0
11 Dec 2021
Contextualized Spatio-Temporal Contrastive Learning with Self-Supervision
Liangzhe Yuan
Rui Qian
Yin Cui
Boqing Gong
Florian Schroff
Ming-Hsuan Yang
Hartwig Adam
Ting Liu
AI4TS
14
15
0
09 Dec 2021
Exploring Temporal Granularity in Self-Supervised Video Representation Learning
Rui Qian
Yeqing Li
Liangzhe Yuan
Boqing Gong
Ting Liu
Matthew A. Brown
Serge J. Belongie
Ming-Hsuan Yang
Hartwig Adam
Yin Cui
AI4TS
33
6
0
08 Dec 2021
Everything at Once -- Multi-modal Fusion Transformer for Video Retrieval
Nina Shvetsova
Brian Chen
Andrew Rouditchenko
Samuel Thomas
Brian Kingsbury
Rogerio Feris
David F. Harwath
James R. Glass
Hilde Kuehne
ViT
15
127
0
08 Dec 2021
Creating Multimodal Interactive Agents with Imitation and Self-Supervised Learning
DeepMind Interactive Agents Team
Josh Abramson
Arun Ahuja
Arthur Brussee
Federico Carnevale
...
Tamara von Glehn
Greg Wayne
Nathaniel Wong
Chen Yan
Rui Zhu
LM&Ro
24
46
0
07 Dec 2021
Routing with Self-Attention for Multimodal Capsule Networks
Kevin Duarte
Brian Chen
Nina Shvetsova
Andrew Rouditchenko
Samuel Thomas
Alexander H. Liu
David F. Harwath
James R. Glass
Hilde Kuehne
M. Shah
SSL
21
5
0
01 Dec 2021
Sound-Guided Semantic Image Manipulation
Seung Hyun Lee
Wonseok Roh
Wonmin Byeon
Sang Ho Yoon
Chanyoung Kim
Jinkyu Kim
Sangpil Kim
DiffM
8
43
0
30 Nov 2021
ContIG: Self-supervised Multimodal Contrastive Learning for Medical Imaging with Genetics
Aiham Taleb
Matthias Kirchler
Remo Monti
C. Lippert
SSL
MedIm
20
54
0
26 Nov 2021
PolyViT: Co-training Vision Transformers on Images, Videos and Audio
Valerii Likhosherstov
Anurag Arnab
K. Choromanski
Mario Lucic
Yi Tay
Adrian Weller
Mostafa Dehghani
ViT
23
73
0
25 Nov 2021