2006.16228
Self-Supervised MultiModal Versatile Networks
29 June 2020
Jean-Baptiste Alayrac
Adrià Recasens
R. Schneider
Relja Arandjelović
Jason Ramapuram
J. Fauw
Lucas Smaira
Sander Dieleman
Andrew Zisserman
SSL
Papers citing
"Self-Supervised MultiModal Versatile Networks"
50 / 266 papers shown
Title
Rapid Network Adaptation: Learning to Adapt Neural Networks Using Test-Time Feedback
Teresa Yeo
Oğuzhan Fatih Kar
Zahra Sodagar
Amir Zamir
TTA
OOD
13
3
0
27 Sep 2023
M³3D: Learning 3D priors using Multi-Modal Masked Autoencoders for 2D image and video understanding
Muhammad Abdullah Jamal
Omid Mohareri
3DPC
6
0
0
26 Sep 2023
SeMAnD: Self-Supervised Anomaly Detection in Multimodal Geospatial Datasets
Daria Reshetova
Swetava Ganguli
C. V. K. Iyer
Vipul Pandey
10
3
0
26 Sep 2023
Video-adverb retrieval with compositional adverb-action embeddings
Thomas Hummel
Otniel-Bogdan Mercea
A. Sophia Koepke
Zeynep Akata
12
1
0
26 Sep 2023
TMac: Temporal Multi-Modal Graph Learning for Acoustic Event Classification
Meng Liu
K. Liang
Dayu Hu
Hao Yu
Yue Liu
Lingyuan Meng
Wenxuan Tu
Sihang Zhou
Xinwang Liu
16
24
0
21 Sep 2023
A Large-scale Dataset for Audio-Language Representation Learning
Luoyi Sun
Xuenan Xu
Mengyue Wu
Weidi Xie
10
20
0
20 Sep 2023
AV-MaskEnhancer: Enhancing Video Representations through Audio-Visual Masked Autoencoder
Xingjian Diao
Ming Cheng
Shitong Cheng
VGen
11
8
0
15 Sep 2023
Preserving Modality Structure Improves Multi-Modal Learning
Swetha Sirnam
Mamshad Nayeem Rizve
Nina Shvetsova
Hilde Kuehne
M. Shah
15
4
0
24 Aug 2023
Opening the Vocabulary of Egocentric Actions
Dibyadip Chatterjee
Fadime Sener
Shugao Ma
Angela Yao
VLM
14
6
0
22 Aug 2023
MMAPS: End-to-End Multi-Grained Multi-Modal Attribute-Aware Product Summarization
Tao Chen
Zexiong Lin
Hui Li
Jiayi Ji
Yiyi Zhou
Guanbin Li
Rongrong Ji
11
0
0
22 Aug 2023
ViT-Lens: Initiating Omni-Modal Exploration through 3D Insights
Weixian Lei
Yixiao Ge
Jianfeng Zhang
Dylan Sun
Kun Yi
Ying Shan
Mike Zheng Shou
17
1
0
20 Aug 2023
DiffDance: Cascaded Human Motion Diffusion Model for Dance Generation
Qiaosong Qi
Le Zhuo
Aixi Zhang
Yue Liao
Fei Fang
Si Liu
Shuicheng Yan
11
22
0
05 Aug 2023
Learning Multi-modal Representations by Watching Hundreds of Surgical Video Lectures
Kun Yuan
V. Srivastav
Tong Yu
Joël L. Lavanchy
Pietro Mascagni
Nicolas Padoy
14
19
0
27 Jul 2023
Audio-Enhanced Text-to-Video Retrieval using Text-Conditioned Feature Alignment
Sarah Ibrahimi
Xiaohang Sun
Pichao Wang
Amanmeet Garg
Ashutosh Sanan
Mohamed Omar
31
12
0
24 Jul 2023
Learning to Count without Annotations
Lukas Knobel
Tengda Han
Yuki M. Asano
SSL
14
2
0
17 Jul 2023
Semi-supervised Multimodal Representation Learning through a Global Workspace
Benjamin Devillers
Léopold Maytié
R. V. Rullen
SSL
14
2
0
27 Jun 2023
Exploring the Role of Audio in Video Captioning
Yuhan Shen
Linjie Yang
Longyin Wen
Haichao Yu
Ehsan Elhamifar
Heng Wang
8
2
0
21 Jun 2023
Visually-Guided Sound Source Separation with Audio-Visual Predictive Coding
Zengjie Song
Zhaoxiang Zhang
11
1
0
19 Jun 2023
Language-Guided Music Recommendation for Video via Prompt Analogies
Daniel McKee
Justin Salamon
Josef Sivic
Bryan C. Russell
VGen
15
26
0
15 Jun 2023
Instruct-ReID: A Multi-purpose Person Re-identification Task with Instructions
Weizhen He
Yihe Deng
Shixiang Tang
Qihao Chen
Qingsong Xie
...
Feng Zhu
Rui Zhao
Wanli Ouyang
Donglian Qi
Yunfeng Yan
62
18
0
13 Jun 2023
Factorized Contrastive Learning: Going Beyond Multi-view Redundancy
Paul Pu Liang
Zihao Deng
Martin Q. Ma
James Y. Zou
Louis-Philippe Morency
Ruslan Salakhutdinov
SSL
16
49
0
08 Jun 2023
MoviePuzzle: Visual Narrative Reasoning through Multimodal Order Learning
Jianghui Wang
Yuxuan Wang
Dongyan Zhao
Zilong Zheng
35
0
0
04 Jun 2023
Recent Advances of Local Mechanisms in Computer Vision: A Survey and Outlook of Recent Work
Qiangchang Wang
Yilong Yin
8
0
0
02 Jun 2023
LANISTR: Multimodal Learning from Structured and Unstructured Data
Sayna Ebrahimi
Sercan Ö. Arik
Yihe Dong
Tomas Pfister
12
4
0
26 May 2023
HAAV: Hierarchical Aggregation of Augmented Views for Image Captioning
Chia-Wen Kuo
Z. Kira
25
21
0
25 May 2023
PandaGPT: One Model To Instruction-Follow Them All
Yixuan Su
Tian Lan
Huayang Li
Jialu Xu
Yan Wang
Deng Cai
MLLM
29
269
0
25 May 2023
Label-Efficient Learning in Agriculture: A Comprehensive Review
Jiajia Li
Dong Chen
Xinda Qi
Zhao Li
Yanbo Huang
Daniel Morris
Xiaobo Tan
23
33
0
24 May 2023
Perception Test: A Diagnostic Benchmark for Multimodal Video Models
Viorica Pătrăucean
Lucas Smaira
Ankush Gupta
Adrià Recasens Continente
L. Markeeva
...
Y. Aytar
Simon Osindero
Dima Damen
Andrew Zisserman
João Carreira
VLM
107
138
0
23 May 2023
Few-Shot Learning with Visual Distribution Calibration and Cross-Modal Distribution Alignment
Runqi Wang
Hao Zheng
Xiaoyue Duan
Jianzhuang Liu
Yuning Lu
Tian Wang
Songcen Xu
Baochang Zhang
VLM
16
12
0
19 May 2023
Musketeer: Joint Training for Multi-task Vision Language Model with Task Explanation Prompts
Zhaoyang Zhang
Yantao Shen
Kunyu Shi
Zhaowei Cai
Jun Fang
Siqi Deng
Hao-Yu Yang
Davide Modolo
Z. Tu
Stefano Soatto
VLM
17
2
0
11 May 2023
ImageBind: One Embedding Space To Bind Them All
Rohit Girdhar
Alaaeldin El-Nouby
Zhuang Liu
Mannat Singh
Kalyan Vasudev Alwala
Armand Joulin
Ishan Misra
VLM
16
817
0
09 May 2023
Listen to Look into the Future: Audio-Visual Egocentric Gaze Anticipation
Bolin Lai
Fiona Ryan
Wenqi Jia
Miao Liu
James M. Rehg
EgoV
6
8
0
06 May 2023
A vector quantized masked autoencoder for audiovisual speech emotion recognition
Samir Sadok
Simon Leglaive
Renaud Séguier
SSL
52
6
0
05 May 2023
Learning Missing Modal Electronic Health Records with Unified Multi-modal Data Embedding and Modality-Aware Attention
Kwanhyung Lee
Soojeong Lee
Sangchul Hahn
Heejung Hyun
E. Choi
Byungeun Ahn
Joohyung Lee
41
8
0
04 May 2023
Making the Most of What You Have: Adapting Pre-trained Visual Language Models in the Low-data Regime
Chuhan Zhang
Antoine Miech
Jiajun Shen
Jean-Baptiste Alayrac
Pauline Luc
VLM
VPVLM
26
2
0
03 May 2023
Improving Speech Translation by Cross-Modal Multi-Grained Contrastive Learning
Hao Zhang
Nianwen Si
Yaqi Chen
Wenlin Zhang
Xukui Yang
Dan Qu
Weiqiang Zhang
17
9
0
20 Apr 2023
Multimodal Representation Learning of Cardiovascular Magnetic Resonance Imaging
Jielin Qiu
Peide Huang
Makiya Nakashima
Jae-Hyeok Lee
Jiacheng Zhu
...
Byung-Hak Kim
Debbie Kwon
Douglas Weber
Ding Zhao
David Chen
SSL
6
4
0
16 Apr 2023
On Robustness in Multimodal Learning
Brandon McKinzie
Joseph Cheng
Vaishaal Shankar
Yinfei Yang
Jonathon Shlens
Alexander Toshev
17
2
0
10 Apr 2023
Instance-Level Trojan Attacks on Visual Question Answering via Adversarial Learning in Neuron Activation Space
Yuwei Sun
H. Ochiai
Jun Sakuma
AAML
10
4
0
02 Apr 2023
Procedure-Aware Pretraining for Instructional Video Understanding
Honglu Zhou
Roberto Martín-Martín
Mubbasir Kapadia
Silvio Savarese
Juan Carlos Niebles
10
38
0
31 Mar 2023
Self-Supervised Multimodal Learning: A Survey
Yongshuo Zong
Oisin Mac Aodha
Timothy M. Hospedales
SSL
11
23
0
31 Mar 2023
What, when, and where? -- Self-Supervised Spatio-Temporal Grounding in Untrimmed Multi-Action Videos from Narrated Instructions
Brian Chen
Nina Shvetsova
Andrew Rouditchenko
D. Kondermann
Samuel Thomas
Shih-Fu Chang
Rogerio Feris
James R. Glass
Hilde Kuehne
16
7
0
29 Mar 2023
VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking
Limin Wang
Bingkun Huang
Zhiyu Zhao
Zhan Tong
Yinan He
Yi Wang
Yali Wang
Yu Qiao
VGen
24
322
0
29 Mar 2023
Structured Video-Language Modeling with Temporal Grouping and Spatial Grounding
Yuanhao Xiong
Long Zhao
Boqing Gong
Ming-Hsuan Yang
Florian Schroff
Ting Liu
Cho-Jui Hsieh
Liangzhe Yuan
VLM
11
0
0
28 Mar 2023
Egocentric Auditory Attention Localization in Conversations
Fiona Ryan
Hao Jiang
Abhinav Shukla
James M. Rehg
V. Ithapu
EgoV
6
15
0
28 Mar 2023
The Multimodal And Modular Ai Chef: Complex Recipe Generation From Imagery
David A. Noever
S. M. Noever
16
6
0
20 Mar 2023
BLAT: Bootstrapping Language-Audio Pre-training based on AudioSet Tag-guided Synthetic Data
Xuenan Xu
Zhiling Zhang
Zelin Zhou
Pingyue Zhang
Zeyu Xie
Mengyue Wu
Ke Zhu
CLIP
58
14
0
14 Mar 2023
Accommodating Audio Modality in CLIP for Multimodal Processing
Ludan Ruan
Anwen Hu
Yuqing Song
Liang Zhang
S. Zheng
Qin Jin
VLM
14
10
0
12 Mar 2023
Heterogeneous Graph Learning for Acoustic Event Classification
A. Shirian
Mona Ahmadian
Krishna Somandepalli
T. Guha
25
2
0
05 Mar 2023
Cross-modal Face- and Voice-style Transfer
Naoya Takahashi
M. Singh
Yuki Mitsufuji
CVBM
39
2
0
27 Feb 2023