2211.12402
Cited By
X²-VLM: All-In-One Pre-trained Model For Vision-Language Tasks
22 November 2022
Yan Zeng, Xinsong Zhang, Hang Li, Jiawei Wang, Jipeng Zhang, Wangchunshu Zhou
VLM, MLLM
Papers citing "X²-VLM: All-In-One Pre-trained Model For Vision-Language Tasks"
14 / 14 papers shown
Solution for Emotion Prediction Competition of Workshop on Emotionally and Culturally Intelligent AI
Shengdong Xu, Zhouyang Chi, Yang Yang
14
0
0
26 Mar 2024
Open-ended VQA benchmarking of Vision-Language models by exploiting Classification datasets and their semantic hierarchy
Simon Ging, M. A. Bravo, Thomas Brox
VLM
38
11
0
11 Feb 2024
Answering from Sure to Uncertain: Uncertainty-Aware Curriculum Learning for Video Question Answering
Haopeng Li, Qiuhong Ke, Mingming Gong, Tom Drummond
27
1
0
03 Jan 2024
Vision-Language Foundation Models as Effective Robot Imitators
Xinghang Li, Minghuan Liu, Hanbo Zhang, Cunjun Yu, Jie Xu, ..., Ya Jing, Weinan Zhang, Huaping Liu, Hang Li, Tao Kong
LM&Ro
16
132
0
02 Nov 2023
ROME: Evaluating Pre-trained Vision-Language Models on Reasoning beyond Visual Common Sense
Kankan Zhou, Eason Lai, Wei Bin Au Yeong, K. Mouratidis, Jing Jiang
ReLM, LRM, VLM
17
19
0
30 Oct 2023
RLIPv2: Fast Scaling of Relational Language-Image Pre-training
Hangjie Yuan, Shiwei Zhang, Xiang Wang, Samuel Albanie, Yining Pan, Tao Feng, Jianwen Jiang, Dong Ni, Yingya Zhang, Deli Zhao
VLM
14
37
0
18 Aug 2023
Vision Language Transformers: A Survey
Clayton Fields, C. Kennington
VLM
15
5
0
06 Jul 2023
Toward Building General Foundation Models for Language, Vision, and Vision-Language Understanding Tasks
Xinsong Zhang, Yan Zeng, Jipeng Zhang, Hang Li
VLM, AI4CE, LRM
6
17
0
12 Jan 2023
Cross-View Language Modeling: Towards Unified Cross-Lingual Cross-Modal Pre-training
Yan Zeng, Wangchunshu Zhou, Ao Luo, Ziming Cheng, Xinsong Zhang
VLM
11
30
0
01 Jun 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li, Dongxu Li, Caiming Xiong, S. Hoi
MLLM, BDL, VLM, CLIP
388
4,010
0
28 Jan 2022
KD-VLP: Improving End-to-End Vision-and-Language Pretraining with Object Knowledge Distillation
Yongfei Liu, Chenfei Wu, Shao-Yen Tseng, Vasudev Lal, Xuming He, Nan Duan
CLIP, VLM
47
28
0
22 Sep 2021
MURAL: Multimodal, Multitask Retrieval Across Languages
Aashi Jain, Mandy Guo, Krishna Srinivasan, Ting-Li Chen, Sneha Kudugunta, Chao Jia, Yinfei Yang, Jason Baldridge
VLM
112
52
0
10 Sep 2021
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts
Soravit Changpinyo, P. Sharma, Nan Ding, Radu Soricut
VLM
273
1,077
0
17 Feb 2021
Is Space-Time Attention All You Need for Video Understanding?
Gedas Bertasius, Heng Wang, Lorenzo Torresani
ViT
278
1,939
0
09 Feb 2021