Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2012.15409
Cited By
UNIMO: Towards Unified-Modal Understanding and Generation via Cross-Modal Contrastive Learning
31 December 2020
Wei Li
Can Gao
Guocheng Niu
Xinyan Xiao
Hao Liu
Jiachen Liu
Hua-Hong Wu
Haifeng Wang
Re-assign community
ArXiv
PDF
HTML
Papers citing
"UNIMO: Towards Unified-Modal Understanding and Generation via Cross-Modal Contrastive Learning"
41 / 41 papers shown
Title
Towards Top-Down Reasoning: An Explainable Multi-Agent Approach for Visual Question Answering
Zeqing Wang
Wentao Wan
Qiqing Lao
Runmeng Chen
Minjie Lang
Keze Wang
Liang Lin
Liang Lin
LRM
96
3
0
17 Feb 2025
ComAlign: Compositional Alignment in Vision-Language Models
Ali Abdollah
Amirmohammad Izadi
Armin Saghafian
Reza Vahidimajd
Mohammad Mozafari
Amirreza Mirzaei
Mohammadmahdi Samiei
M. Baghshah
CoGe
VLM
25
0
0
12 Sep 2024
WeCromCL: Weakly Supervised Cross-Modality Contrastive Learning for Transcription-only Supervised Text Spotting
Jingjing Wu
Zhengyao Fang
Pengyuan Lyu
Chengquan Zhang
Fanglin Chen
Guangming Lu
Wenjie Pei
47
2
0
28 Jul 2024
Benchmarking Vision-Language Contrastive Methods for Medical Representation Learning
Shuvendu Roy
Yasaman Parhizkar
Franklin Ogidi
Vahid Reza Khazaie
Michael Colacci
Ali Etemad
Elham Dolatabadi
Arash Afkanpour
VLM
35
1
0
11 Jun 2024
One Perturbation is Enough: On Generating Universal Adversarial Perturbations against Vision-Language Pre-training Models
Hao Fang
Jiawei Kong
Wenbo Yu
Bin Chen
Jiawei Li
Hao Wu
Ke Xu
Ke Xu
AAML
VLM
30
13
0
08 Jun 2024
Modeling Caption Diversity in Contrastive Vision-Language Pretraining
Samuel Lavoie
Polina Kirichenko
Mark Ibrahim
Mahmoud Assran
Andrew Gordon Wilson
Aaron Courville
Nicolas Ballas
CLIP
VLM
59
19
0
30 Apr 2024
VURF: A General-purpose Reasoning and Self-refinement Framework for Video Understanding
Ahmad A Mahmood
Ashmal Vayani
Muzammal Naseer
Salman Khan
Fahad Shahbaz Khan
LRM
49
7
0
21 Mar 2024
Non-autoregressive Sequence-to-Sequence Vision-Language Models
Kunyu Shi
Qi Dong
Luis Goncalves
Zhuowen Tu
Stefano Soatto
VLM
35
3
0
04 Mar 2024
Emergent Open-Vocabulary Semantic Segmentation from Off-the-shelf Vision-Language Models
Jiayun Luo
Siddhesh Khandelwal
Leonid Sigal
Boyang Albert Li
MLLM
VLM
27
7
0
28 Nov 2023
Improving Image Captioning Descriptiveness by Ranking and LLM-based Fusion
Simone Bianco
Luigi Celona
Marco Donzella
Paolo Napoletano
24
18
0
20 Jun 2023
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
Peng Wang
Shijie Wang
Junyang Lin
Shuai Bai
Xiaohuan Zhou
Jingren Zhou
Xinggang Wang
Chang Zhou
VLM
MLLM
ObjD
16
113
0
18 May 2023
Visual Information Extraction in the Wild: Practical Dataset and End-to-end Solution
Jianfeng Kuang
Wei Hua
Dingkang Liang
Mingkun Yang
Deqiang Jiang
Bo Ren
Xiang Bai
25
39
0
12 May 2023
Multimodality Helps Unimodality: Cross-Modal Few-Shot Learning with Multimodal Models
Zhiqiu Lin
Samuel Yu
Zhiyi Kuang
Deepak Pathak
Deva Ramana
VLM
15
97
0
16 Jan 2023
Position-guided Text Prompt for Vision-Language Pre-training
Alex Jinpeng Wang
Pan Zhou
Mike Zheng Shou
Shuicheng Yan
VLM
11
37
0
19 Dec 2022
Perceiver-VL: Efficient Vision-and-Language Modeling with Iterative Latent Attention
Zineng Tang
Jaemin Cho
Jie Lei
Mohit Bansal
VLM
11
9
0
21 Nov 2022
Self-supervised remote sensing feature learning: Learning Paradigms, Challenges, and Future Works
Chao Tao
Ji Qi
Mingning Guo
Qing Zhu
Haifeng Li
SSL
19
56
0
15 Nov 2022
TVLT: Textless Vision-Language Transformer
Zineng Tang
Jaemin Cho
Yixin Nie
Mohit Bansal
VLM
47
28
0
28 Sep 2022
Interpreting Song Lyrics with an Audio-Informed Pre-trained Language Model
Yixiao Zhang
Junyan Jiang
Gus Xia
S. Dixon
25
9
0
24 Aug 2022
CODER: Coupled Diversity-Sensitive Momentum Contrastive Learning for Image-Text Retrieval
Haoran Wang
Dongliang He
Wenhao Wu
Boyang Xia
Min Yang
Fu Li
Yunlong Yu
Zhong Ji
Errui Ding
Jingdong Wang
19
22
0
21 Aug 2022
Learning Visual Representation from Modality-Shared Contrastive Language-Image Pre-training
Haoxuan You
Luowei Zhou
Bin Xiao
Noel Codella
Yu Cheng
Ruochen Xu
Shih-Fu Chang
Lu Yuan
CLIP
VLM
13
46
0
26 Jul 2022
VLMixer: Unpaired Vision-Language Pre-training via Cross-Modal CutMix
Teng Wang
Wenhao Jiang
Zhichao Lu
Feng Zheng
Ran Cheng
Chengguo Yin
Ping Luo
VLM
20
42
0
17 Jun 2022
Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone
Zi-Yi Dou
Aishwarya Kamath
Zhe Gan
Pengchuan Zhang
Jianfeng Wang
...
Ce Liu
Yann LeCun
Nanyun Peng
Jianfeng Gao
Lijuan Wang
VLM
ObjD
11
123
0
15 Jun 2022
Multimodal Learning with Transformers: A Survey
P. Xu
Xiatian Zhu
David A. Clifton
ViT
41
518
0
13 Jun 2022
VLUE: A Multi-Task Benchmark for Evaluating Vision-Language Models
Wangchunshu Zhou
Yan Zeng
Shizhe Diao
Xinsong Zhang
CoGe
VLM
17
13
0
30 May 2022
Target-aware Abstractive Related Work Generation with Contrastive Learning
Xiuying Chen
Hind Alamro
Li Mingzhe
Shen Gao
Rui Yan
Xin Gao
Xiangliang Zhang
3DV
139
27
0
26 May 2022
Vision-and-Language Pretrained Models: A Survey
Siqu Long
Feiqi Cao
S. Han
Haiqing Yang
VLM
16
63
0
15 Apr 2022
Single-Stream Multi-Level Alignment for Vision-Language Pretraining
Zaid Khan
B. Vijaykumar
Xiang Yu
S. Schulter
Manmohan Chandraker
Y. Fu
CLIP
VLM
20
16
0
27 Mar 2022
STEMM: Self-learning with Speech-text Manifold Mixup for Speech Translation
Qingkai Fang
Rong Ye
Lei Li
Yang Feng
Mingxuan Wang
11
95
0
20 Mar 2022
DU-VLG: Unifying Vision-and-Language Generation via Dual Sequence-to-Sequence Pre-training
Luyang Huang
Guocheng Niu
Jiachen Liu
Xinyan Xiao
Hua-Hong Wu
VLM
CoGe
12
7
0
17 Mar 2022
Multi-modal Alignment using Representation Codebook
Jiali Duan
Liqun Chen
Son Tran
Jinyu Yang
Yi Xu
Belinda Zeng
Trishul M. Chilimbi
17
66
0
28 Feb 2022
A Survey of Vision-Language Pre-Trained Models
Yifan Du
Zikang Liu
Junyi Li
Wayne Xin Zhao
VLM
13
177
0
18 Feb 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
S. Hoi
MLLM
BDL
VLM
CLIP
388
4,010
0
28 Jan 2022
ERNIE-ViLG: Unified Generative Pre-training for Bidirectional Vision-Language Generation
Han Zhang
Weichong Yin
Yewei Fang
Lanxin Li
Boqiang Duan
Zhihua Wu
Yu Sun
Hao Tian
Hua-Hong Wu
Haifeng Wang
27
58
0
31 Dec 2021
Injecting Semantic Concepts into End-to-End Image Captioning
Zhiyuan Fang
Jianfeng Wang
Xiaowei Hu
Lin Liang
Zhe Gan
Lijuan Wang
Yezhou Yang
Zicheng Liu
ViT
VLM
16
85
0
09 Dec 2021
Interactive Model with Structural Loss for Language-based Abductive Reasoning
Linhao Li
Ming Xu
Yongfeng Dong
Xin Li
Ao Wang
12
2
0
01 Dec 2021
A Multimodal Sentiment Dataset for Video Recommendation
Hongxuan Tang
Hao Liu
Xinyan Xiao
Hua-Hong Wu
VGen
13
2
0
17 Sep 2021
EfficientCLIP: Efficient Cross-Modal Pre-training by Ensemble Confident Learning and Language Modeling
Jue Wang
Haofan Wang
Jincan Deng
Weijia Wu
Debing Zhang
VLM
CLIP
57
18
0
10 Sep 2021
SimVLM: Simple Visual Language Model Pretraining with Weak Supervision
Zirui Wang
Jiahui Yu
Adams Wei Yu
Zihang Dai
Yulia Tsvetkov
Yuan Cao
VLM
MLLM
49
766
0
24 Aug 2021
Grid-VLP: Revisiting Grid Features for Vision-Language Pre-training
Ming Yan
Haiyang Xu
Chenliang Li
Bin Bi
Junfeng Tian
Min Gui
Wei Wang
VLM
14
10
0
21 Aug 2021
ROSITA: Enhancing Vision-and-Language Semantic Alignments via Cross- and Intra-modal Knowledge Integration
Yuhao Cui
Zhou Yu
Chunqi Wang
Zhongzhou Zhao
Ji Zhang
Meng Wang
Jun-chen Yu
VLM
19
52
0
16 Aug 2021
Unified Vision-Language Pre-Training for Image Captioning and VQA
Luowei Zhou
Hamid Palangi
Lei Zhang
Houdong Hu
Jason J. Corso
Jianfeng Gao
MLLM
VLM
250
922
0
24 Sep 2019
1