Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2203.00048
Cited By
Multi-modal Alignment using Representation Codebook
28 February 2022
Jiali Duan
Liqun Chen
Son Tran
Jinyu Yang
Yi Xu
Belinda Zeng
Trishul M. Chilimbi
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Multi-modal Alignment using Representation Codebook"
14 / 14 papers shown
Title
Anchors Aweigh! Sail for Optimal Unified Multi-Modal Representations
Minoh Jeong
Min Namgung
Zae Myung Kim
Dongyeop Kang
Yao-Yi Chiang
Alfred Hero
23
0
0
02 Oct 2024
Law of Vision Representation in MLLMs
Shijia Yang
Bohan Zhai
Quanzeng You
Jianbo Yuan
Hongxia Yang
Chenfeng Xu
40
9
0
29 Aug 2024
WeCromCL: Weakly Supervised Cross-Modality Contrastive Learning for Transcription-only Supervised Text Spotting
Jingjing Wu
Zhengyao Fang
Pengyuan Lyu
Chengquan Zhang
Fanglin Chen
Guangming Lu
Wenjie Pei
47
2
0
28 Jul 2024
EVE: Efficient Vision-Language Pre-training with Masked Prediction and Modality-Aware MoE
Junyi Chen
Longteng Guo
Jianxiang Sun
Shuai Shao
Zehuan Yuan
Liang Lin
Dongyu Zhang
MLLM
VLM
MoE
48
9
0
23 Aug 2023
TOT: Topology-Aware Optimal Transport For Multimodal Hate Detection
Linhao Zhang
Li Jin
Xian Sun
Guangluan Xu
Zequn Zhang
Xiaoyu Li
Nayu Liu
Qing Liu
Shiyao Yan
28
7
0
27 Feb 2023
SimCon Loss with Multiple Views for Text Supervised Semantic Segmentation
Yash J. Patel
Yusheng Xie
Yi Zhu
Srikar Appalaraju
R. Manmatha
19
4
0
07 Feb 2023
MAMO: Masked Multimodal Modeling for Fine-Grained Vision-Language Representation Learning
Zijia Zhao
Longteng Guo
Xingjian He
Shuai Shao
Zehuan Yuan
Jing Liu
19
8
0
09 Oct 2022
Masked Vision and Language Modeling for Multi-modal Representation Learning
Gukyeong Kwon
Zhaowei Cai
Avinash Ravichandran
Erhan Bas
Rahul Bhotika
Stefano Soatto
22
67
0
03 Aug 2022
Augmenting Vision Language Pretraining by Learning Codebook with Visual Semantics
Xiaoyuan Guo
Jiali Duan
C.-C. Jay Kuo
J. Gichoya
Imon Banerjee
VLM
14
1
0
31 Jul 2022
CyCLIP: Cyclic Contrastive Language-Image Pretraining
Shashank Goel
Hritik Bansal
S. Bhatia
Ryan A. Rossi
Vishwa Vinay
Aditya Grover
CLIP
VLM
166
132
0
28 May 2022
Vision-Language Pre-Training with Triple Contrastive Learning
Jinyu Yang
Jiali Duan
Son N. Tran
Yi Xu
Sampath Chanda
Liqun Chen
Belinda Zeng
Trishul M. Chilimbi
Junzhou Huang
VLM
29
287
0
21 Feb 2022
Emerging Properties in Self-Supervised Vision Transformers
Mathilde Caron
Hugo Touvron
Ishan Misra
Hervé Jégou
Julien Mairal
Piotr Bojanowski
Armand Joulin
295
5,761
0
29 Apr 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
293
3,683
0
11 Feb 2021
Unifying Vision-and-Language Tasks via Text Generation
Jaemin Cho
Jie Lei
Hao Tan
Mohit Bansal
MLLM
249
525
0
04 Feb 2021
1