2006.16228
Cited By
Self-Supervised MultiModal Versatile Networks
29 June 2020
Jean-Baptiste Alayrac
Adrià Recasens
R. Schneider
Relja Arandjelović
Jason Ramapuram
J. Fauw
Lucas Smaira
Sander Dieleman
Andrew Zisserman
SSL
Papers citing
"Self-Supervised MultiModal Versatile Networks"
50 / 266 papers shown
Title
Adept: Annotation-Denoising Auxiliary Tasks with Discrete Cosine Transform Map and Keypoint for Human-Centric Pretraining
Weizhen He
Yunfeng Yan
Shixiang Tang
Yiheng Deng
Yangyang Zhong
Pengxin Luo
Donglian Qi
VLM
83
1
0
29 Apr 2025
Negate or Embrace: On How Misalignment Shapes Multimodal Representation Learning
Yichao Cai
Yuhang Liu
Erdun Gao
T. Jiang
Zhen Zhang
Anton van den Hengel
J. Shi
55
0
0
14 Apr 2025
CIBR: Cross-modal Information Bottleneck Regularization for Robust CLIP Generalization
Yingrui Ji
Xi Xiao
Gaofei Chen
Hao Xu
Chenrui Ma
Lijing Zhu
Aokun Liang
Jiansheng Chen
VLM
43
0
0
31 Mar 2025
CA^2ST: Cross-Attention in Audio, Space, and Time for Holistic Video Recognition
Jongseo Lee
Joohyun Chang
Dongho Lee
Jinwoo Choi
48
0
0
30 Mar 2025
OTTER: A Vision-Language-Action Model with Text-Aware Visual Feature Extraction
Huang Huang
Fangchen Liu
Letian Fu
Tingfan Wu
Mustafa Mukadam
Jitendra Malik
Ken Goldberg
Pieter Abbeel
LM&Ro
VLM
72
4
0
05 Mar 2025
Multimodal Emotion Recognition using Audio-Video Transformer Fusion with Cross Attention
Joe Dhanith
Shravan Venkatraman
Modigari Narendra
Vigya Sharma
Santhosh Malarvannan
67
0
0
20 Feb 2025
MM-NeRF: Multimodal-Guided 3D Multi-Style Transfer of Neural Radiance Field
Zijian Győző Yang
Zhongwei Qiu
Chang Xu
Dongmei Fu
43
2
0
28 Jan 2025
Learning the Language of Protein Structure
Benoit Gaujac
Jérémie Donà
Liviu Copoiu
Timothy Atkinson
Thomas Pierrot
Thomas D. Barrett
46
10
0
08 Jan 2025
Explorations in Self-Supervised Learning: Dataset Composition Testing for Object Classification
Raynor Kirkson E. Chavez
Kyle Gabriel M. Reynoso
66
0
0
01 Dec 2024
Perception Test 2024: Challenge Summary and a Novel Hour-Long VideoQA Benchmark
Joseph Heyward
João Carreira
Dima Damen
Andrew Zisserman
Viorica Patraucean
66
2
0
29 Nov 2024
ResCLIP: Residual Attention for Training-free Dense Vision-language Inference
Yuhang Yang
Jinhong Deng
Wen Li
Lixin Duan
VLM
66
0
0
24 Nov 2024
A Survey of Recent Advances and Challenges in Deep Audio-Visual Correlation Learning
Luis Vilaca
Yi Yu
Paula Vinan
68
0
0
24 Nov 2024
ITACLIP: Boosting Training-Free Semantic Segmentation with Image, Text, and Architectural Enhancements
M. Arda Aydın
Efe Mert Çırpar
Elvin Abdinli
Gözde B. Ünal
Y. Sahin
VLM
59
0
0
18 Nov 2024
Contrasting with Symile: Simple Model-Agnostic Representation Learning for Unlimited Modalities
A. Saporta
A. Puli
Mark Goldstein
Rajesh Ranganath
SSL
21
0
0
01 Nov 2024
Survival Prediction in Lung Cancer through Multi-Modal Representation Learning
Aiman Farooq
Deepak Mishra
S. Chaudhury
21
0
0
30 Sep 2024
What to align in multimodal contrastive learning?
Benoit Dufumier
J. Castillo-Navarro
D. Tuia
Jean-Philippe Thiran
19
3
0
11 Sep 2024
PoseEmbroider: Towards a 3D, Visual, Semantic-aware Human Pose Representation
Ginger Delmas
Philippe Weinzaepfel
Francesc Moreno-Noguer
Grégory Rogez
19
0
0
10 Sep 2024
ClearCLIP: Decomposing CLIP Representations for Dense Vision-Language Inference
Mengcheng Lan
Chaofeng Chen
Yiping Ke
Xinjiang Wang
Litong Feng
Wayne Zhang
VLM
24
23
0
17 Jul 2024
The Solution for Temporal Sound Localisation Task of ICCV 1st Perception Test Challenge 2023
Yurui Huang
Yang Yang
Shou Chen
Xiangyu Wu
Qingguo Chen
Jianfeng Lu
16
0
0
01 Jul 2024
Explore the Limits of Omni-modal Pretraining at Scale
Yiyuan Zhang
Handong Li
Jing Liu
Xiangyu Yue
VLM
LRM
32
1
0
13 Jun 2024
Labeling Comic Mischief Content in Online Videos with a Multimodal Hierarchical-Cross-Attention Model
Elaheh Baharlouei
Mahsa Shafaei
Yigeng Zhang
Hugo Jair Escalante
Thamar Solorio
27
0
0
12 Jun 2024
Audio Mamba: Bidirectional State Space Model for Audio Representation Learning
Mehmet Hamza Erol
Arda Senocak
Jiu Feng
Joon Son Chung
Mamba
54
18
0
05 Jun 2024
Contrasting Multiple Representations with the Multi-Marginal Matching Gap
Zoe Piran
Michal Klein
James Thornton
Marco Cuturi
32
2
0
29 May 2024
Instruct-ReID++: Towards Universal Purpose Instruction-Guided Person Re-identification
Weizhen He
Yiheng Deng
Yunfeng Yan
Feng Zhu
Yizhou Wang
Lei Bai
Qingsong Xie
Donglian Qi
Wanli Ouyang
Shixiang Tang
84
2
0
28 May 2024
CLIBD: Bridging Vision and Genomics for Biodiversity Monitoring at Scale
ZeMing Gong
Austin T. Wang
Joakim Bruslund Haurum
Scott C. Lowe
Graham W. Taylor
Angel X. Chang
23
5
0
27 May 2024
MVBIND: Self-Supervised Music Recommendation For Videos Via Embedding Space Binding
Jiajie Teng
Huiyu Duan
Yucheng Zhu
Sijing Wu
Guangtao Zhai
21
2
0
15 May 2024
Unified Video-Language Pre-training with Synchronized Audio
Shentong Mo
Haofan Wang
Huaxia Li
Xu Tang
25
1
0
12 May 2024
SonicDiffusion: Audio-Driven Image Generation and Editing with Pretrained Diffusion Models
Burak Can Biner
Farrin Marouf Sofian
Umur Berkay Karakaş
Duygu Ceylan
Erkut Erdem
Aykut Erdem
14
7
0
01 May 2024
Learning text-to-video retrieval from image captioning
Lucas Ventura
Cordelia Schmid
Gül Varol
3DV
23
0
0
26 Apr 2024
A Survey on Multimodal Wearable Sensor-based Human Action Recognition
Jianyuan Ni
Hao Tang
Syed Tousiful Haque
Yan Yan
A. Ngu
61
5
0
14 Apr 2024
SoundingActions: Learning How Actions Sound from Narrated Egocentric Videos
Changan Chen
Kumar Ashutosh
Rohit Girdhar
David F. Harwath
Kristen Grauman
EgoV
SSL
21
6
0
08 Apr 2024
Multi-Stage Multi-Modal Pre-Training for Automatic Speech Recognition
Yash Jain
David M. Chan
Pranav Dheram
Aparna Khare
Olabanji Shonibare
Venkatesh Ravichandran
Shalini Ghosh
18
2
0
28 Mar 2024
Siamese Vision Transformers are Scalable Audio-visual Learners
Yan-Bo Lin
Gedas Bertasius
24
5
0
28 Mar 2024
Enhancing Video Transformers for Action Understanding with VLM-aided Training
Hui Lu
Hu Jian
Ronald Poppe
A. A. Salah
19
1
0
24 Mar 2024
N-Modal Contrastive Losses with Applications to Social Media Data in Trimodal Space
William Theisen
Walter J. Scheirer
12
1
0
18 Mar 2024
EquiAV: Leveraging Equivariance for Audio-Visual Contrastive Learning
Jongsuk Kim
Hyeongkeun Lee
Kyeongha Rho
Junmo Kim
Joon Son Chung
16
4
0
14 Mar 2024
A Comprehensive Survey of Convolutions in Deep Learning: Applications, Challenges, and Future Trends
Abolfazl Younesi
Mohsen Ansari
Mohammadamin Fazli
A. Ejlali
Muhammad Shafique
Joerg Henkel
3DV
30
43
0
23 Feb 2024
Review of multimodal machine learning approaches in healthcare
Felix H. Krones
Umar Marikkar
Guy Parsons
Adam Szmul
Adam Mahdi
24
26
0
04 Feb 2024
POP-3D: Open-Vocabulary 3D Occupancy Prediction from Images
Antonín Vobecký
Oriane Siméoni
David Hurych
Spyros Gidaris
Andrei Bursuc
Patrick Pérez
Josef Sivic
18
33
0
17 Jan 2024
Reading Between the Frames: Multi-Modal Depression Detection in Videos from Non-Verbal Cues
David Gimeno-Gómez
Ana-Maria Bucur
Adrian Cosma
Carlos David Martínez Hinarejos
Paolo Rosso
22
2
0
05 Jan 2024
Perception Test 2023: A Summary of the First Challenge And Outcome
Joseph Heyward
João Carreira
Dima Damen
Andrew Zisserman
Viorica Patraucean
9
0
0
20 Dec 2023
Behind the Magic, MERLIM: Multi-modal Evaluation Benchmark for Large Image-Language Models
Andrés Villa
Juan Carlos León Alcázar
Alvaro Soto
Bernard Ghanem
MLLM
VLM
10
9
0
03 Dec 2023
ViT-Lens: Towards Omni-modal Representations
Weixian Lei
Yixiao Ge
Kun Yi
Jianfeng Zhang
Difei Gao
Dylan Sun
Yuying Ge
Ying Shan
Mike Zheng Shou
13
18
0
27 Nov 2023
Modality Mixer Exploiting Complementary Information for Multi-modal Action Recognition
Sumin Lee
Sangmin Woo
Muhammad Adi Nugroho
Changick Kim
23
0
0
21 Nov 2023
A Survey on Multimodal Large Language Models for Autonomous Driving
Can Cui
Yunsheng Ma
Xu Cao
Wenqian Ye
Yang Zhou
...
Xinrui Yan
Shuqi Mei
Jianguo Cao
Ziran Wang
Chao Zheng
19
248
0
21 Nov 2023
Improving Unimodal Inference with Multimodal Transformers
K. Chumachenko
Alexandros Iosifidis
M. Gabbouj
22
0
0
16 Nov 2023
Prompt Me Up: Unleashing the Power of Alignments for Multimodal Entity and Relation Extraction
Xuming Hu
Junzhe Chen
Aiwei Liu
Shiao Meng
Lijie Wen
Philip S. Yu
23
7
0
25 Oct 2023
GRID: A Platform for General Robot Intelligence Development
Sai H. Vemprala
Shuhang Chen
Abhinav Shukla
Dinesh Narayanan
Ashish Kapoor
17
10
0
02 Oct 2023
RegBN: Batch Normalization of Multimodal Data with Regularization
Morteza Ghahremani
Christian Wachinger
25
6
0
01 Oct 2023
Training a Large Video Model on a Single Machine in a Day
Yue Zhao
Philipp Krahenbuhl
VLM
20
15
0
28 Sep 2023