ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1908.06066
  4. Cited By
Unicoder-VL: A Universal Encoder for Vision and Language by Cross-modal
  Pre-training

Unicoder-VL: A Universal Encoder for Vision and Language by Cross-modal Pre-training

16 August 2019
Gen Li
Nan Duan
Yuejian Fang
Ming Gong
Daxin Jiang
Ming Zhou
    SSL
    VLM
    MLLM
ArXivPDFHTML

Papers citing "Unicoder-VL: A Universal Encoder for Vision and Language by Cross-modal Pre-training"

50 / 510 papers shown
Title
All in One: Exploring Unified Vision-Language Tracking with Multi-Modal Alignment
All in One: Exploring Unified Vision-Language Tracking with Multi-Modal Alignment
Chunhui Zhang
Xin Sun
Li Liu
Yiqian Yang
Qiong Liu
Xiaoping Zhou
Yanfeng Wang
38
15
0
07 Jul 2023
Vision Language Transformers: A Survey
Vision Language Transformers: A Survey
Clayton Fields
C. Kennington
VLM
15
5
0
06 Jul 2023
Structure Guided Multi-modal Pre-trained Transformer for Knowledge Graph
  Reasoning
Structure Guided Multi-modal Pre-trained Transformer for Knowledge Graph Reasoning
K. Liang
Sihang Zhou
Yue Liu
Lingyuan Meng
Meng Liu
Xinwang Liu
28
15
0
06 Jul 2023
S-Omninet: Structured Data Enhanced Universal Multimodal Learning
  Architecture
S-Omninet: Structured Data Enhanced Universal Multimodal Learning Architecture
Ye Xue
Diego Klabjan
J. Utke
18
0
0
01 Jul 2023
Towards Open Vocabulary Learning: A Survey
Towards Open Vocabulary Learning: A Survey
Jianzong Wu
Xiangtai Li
Shilin Xu
Haobo Yuan
Henghui Ding
...
Jiangning Zhang
Yu Tong
Xudong Jiang
Bernard Ghanem
Dacheng Tao
ObjD
VLM
27
135
0
28 Jun 2023
Switch-BERT: Learning to Model Multimodal Interactions by Switching
  Attention and Input
Switch-BERT: Learning to Model Multimodal Interactions by Switching Attention and Input
Qingpei Guo
Kaisheng Yao
Wei Chu
MLLM
20
4
0
25 Jun 2023
Exploring the Role of Audio in Video Captioning
Exploring the Role of Audio in Video Captioning
Yuhan Shen
Linjie Yang
Longyin Wen
Haichao Yu
Ehsan Elhamifar
Heng Wang
18
2
0
21 Jun 2023
Generation of Radiology Findings in Chest X-Ray by Leveraging
  Collaborative Knowledge
Generation of Radiology Findings in Chest X-Ray by Leveraging Collaborative Knowledge
Manuela Danu
George Marica
Sanjeev Kumar Karn
Bogdan Georgescu
Awais Mansoor
...
Lucian Mihai Itu
C. Suciu
Sasa Grbic
Oladimeji Farri
D. Comaniciu
MedIm
19
8
0
18 Jun 2023
Efficient Token-Guided Image-Text Retrieval with Consistent Multimodal
  Contrastive Training
Efficient Token-Guided Image-Text Retrieval with Consistent Multimodal Contrastive Training
Chong Liu
Yuqi Zhang
Hongsong Wang
Weihua Chen
F. Wang
Yan Huang
Yixing Shen
Liang Wang
19
25
0
15 Jun 2023
A Comprehensive Survey on Applications of Transformers for Deep Learning
  Tasks
A Comprehensive Survey on Applications of Transformers for Deep Learning Tasks
Saidul Islam
Hanae Elmekki
Ahmed Elsebai
Jamal Bentahar
Najat Drawel
Gaith Rjoub
Witold Pedrycz
ViT
MedIm
19
170
0
11 Jun 2023
Object Detection with Transformers: A Review
Object Detection with Transformers: A Review
Tahira Shehzadi
K. Hashmi
D. Stricker
Muhammad Zeshan Afzal
ViT
MU
13
27
0
07 Jun 2023
Table and Image Generation for Investigating Knowledge of Entities in
  Pre-trained Vision and Language Models
Table and Image Generation for Investigating Knowledge of Entities in Pre-trained Vision and Language Models
Hidetaka Kamigaito
Katsuhiko Hayashi
Taro Watanabe
VLM
15
1
0
03 Jun 2023
ManagerTower: Aggregating the Insights of Uni-Modal Experts for
  Vision-Language Representation Learning
ManagerTower: Aggregating the Insights of Uni-Modal Experts for Vision-Language Representation Learning
Xiao Xu
Bei Li
Chenfei Wu
Shao-Yen Tseng
Anahita Bhiwandiwalla
Shachar Rosenman
Vasudev Lal
Wanxiang Che
Nan Duan
AIFin
VLM
26
2
0
31 May 2023
Deeply Coupled Cross-Modal Prompt Learning
Deeply Coupled Cross-Modal Prompt Learning
Xuejing Liu
Wei Tang
Jinghui Lu
Rui Zhao
Zhaojun Guo
Fei Tan
VLM
20
17
0
29 May 2023
Training Data Extraction From Pre-trained Language Models: A Survey
Training Data Extraction From Pre-trained Language Models: A Survey
Shotaro Ishihara
24
46
0
25 May 2023
MMNet: Multi-Mask Network for Referring Image Segmentation
MMNet: Multi-Mask Network for Referring Image Segmentation
Yimin Yan
Xingjian He
Wenxuan Wan
J. Liu
EgoV
22
1
0
24 May 2023
UniChart: A Universal Vision-language Pretrained Model for Chart
  Comprehension and Reasoning
UniChart: A Universal Vision-language Pretrained Model for Chart Comprehension and Reasoning
Ahmed Masry
P. Kavehzadeh
Do Xuan Long
Enamul Hoque
Shafiq R. Joty
LRM
19
100
0
24 May 2023
BigVideo: A Large-scale Video Subtitle Translation Dataset for
  Multimodal Machine Translation
BigVideo: A Large-scale Video Subtitle Translation Dataset for Multimodal Machine Translation
Liyan Kang
Luyang Huang
Ningxin Peng
Peihao Zhu
Zewei Sun
Shanbo Cheng
Mingxuan Wang
Degen Huang
Jinsong Su
18
9
0
23 May 2023
EDIS: Entity-Driven Image Search over Multimodal Web Content
EDIS: Entity-Driven Image Search over Multimodal Web Content
Siqi Liu
Weixi Feng
Tsu-jui Fu
Wenhu Chen
W. Wang
VLM
48
9
0
23 May 2023
Probing the Role of Positional Information in Vision-Language Models
Probing the Role of Positional Information in Vision-Language Models
Philipp J. Rösch
Jindrich Libovický
16
8
0
17 May 2023
OpenViVQA: Task, Dataset, and Multimodal Fusion Models for Visual
  Question Answering in Vietnamese
OpenViVQA: Task, Dataset, and Multimodal Fusion Models for Visual Question Answering in Vietnamese
Nghia Hieu Nguyen
Duong T.D. Vo
Kiet Van Nguyen
N. Nguyen
24
18
0
07 May 2023
Structure-CLIP: Towards Scene Graph Knowledge to Enhance Multi-modal
  Structured Representations
Structure-CLIP: Towards Scene Graph Knowledge to Enhance Multi-modal Structured Representations
Yufen Huang
Jiji Tang
Zhuo Chen
Rongsheng Zhang
Xinfeng Zhang
...
Zeng Zhao
Zhou Zhao
Tangjie Lv
Zhipeng Hu
Wen Zhang
VLM
15
21
0
06 May 2023
ArK: Augmented Reality with Knowledge Interactive Emergent Ability
ArK: Augmented Reality with Knowledge Interactive Emergent Ability
Qiuyuan Huang
J. Park
Abhinav Gupta
Paul N. Bennett
Ran Gong
...
Baolin Peng
O. Mohammed
C. Pal
Yejin Choi
Jianfeng Gao
73
6
0
01 May 2023
Towards Medical Artificial General Intelligence via Knowledge-Enhanced
  Multimodal Pretraining
Towards Medical Artificial General Intelligence via Knowledge-Enhanced Multimodal Pretraining
Bingqian Lin
Zicong Chen
Mingjie Li
Haokun Lin
Hang Xu
...
Ling-Hao Chen
Xiaojun Chang
Yi Yang
L. Xing
Xiaodan Liang
LM&MA
MedIm
AI4CE
38
14
0
26 Apr 2023
Rethinking Benchmarks for Cross-modal Image-text Retrieval
Rethinking Benchmarks for Cross-modal Image-text Retrieval
Wei-Neng Chen
Linli Yao
Qin Jin
VLM
16
17
0
21 Apr 2023
W-MAE: Pre-trained weather model with masked autoencoder for
  multi-variable weather forecasting
W-MAE: Pre-trained weather model with masked autoencoder for multi-variable weather forecasting
Xin Man
Chenghong Zhang
Jin Feng
Changyu Li
Jie Shao
AI4Cl
39
24
0
18 Apr 2023
Towards Robust Prompts on Vision-Language Models
Towards Robust Prompts on Vision-Language Models
Jindong Gu
Ahmad Beirami
Xuezhi Wang
Alex Beutel
Philip H. S. Torr
Yao Qin
VLM
VPVLM
25
8
0
17 Apr 2023
CAVL: Learning Contrastive and Adaptive Representations of Vision and
  Language
CAVL: Learning Contrastive and Adaptive Representations of Vision and Language
Shentong Mo
Jingfei Xia
Ihor Markevych
CLIP
VLM
16
1
0
10 Apr 2023
Uncurated Image-Text Datasets: Shedding Light on Demographic Bias
Uncurated Image-Text Datasets: Shedding Light on Demographic Bias
Noa Garcia
Yusuke Hirota
Yankun Wu
Yuta Nakashima
EGVM
31
51
0
06 Apr 2023
Self-Supervised Multimodal Learning: A Survey
Self-Supervised Multimodal Learning: A Survey
Yongshuo Zong
Oisin Mac Aodha
Timothy M. Hospedales
SSL
19
43
0
31 Mar 2023
Borrowing Human Senses: Comment-Aware Self-Training for Social Media
  Multimodal Classification
Borrowing Human Senses: Comment-Aware Self-Training for Social Media Multimodal Classification
Chunpu Xu
Jing Li
VLM
13
5
0
27 Mar 2023
Transformers in Speech Processing: A Survey
Transformers in Speech Processing: A Survey
S. Latif
Aun Zaidi
Heriberto Cuayáhuitl
Fahad Shamshad
Moazzam Shoukat
Junaid Qadir
40
46
0
21 Mar 2023
Global Knowledge Calibration for Fast Open-Vocabulary Segmentation
Global Knowledge Calibration for Fast Open-Vocabulary Segmentation
Kunyang Han
Yong-Jin Liu
Jun Hao Liew
Henghui Ding
Yunchao Wei
...
Yitong Wang
Yansong Tang
Yujiu Yang
Jiashi Feng
Yao-Min Zhao
VLM
34
35
0
16 Mar 2023
Refined Vision-Language Modeling for Fine-grained Multi-modal
  Pre-training
Refined Vision-Language Modeling for Fine-grained Multi-modal Pre-training
Lisai Zhang
Qingcai Chen
Zhijian Chen
Yunpeng Han
Zhonghua Li
Zhao Cao
VLM
25
1
0
09 Mar 2023
TQ-Net: Mixed Contrastive Representation Learning For Heterogeneous Test
  Questions
TQ-Net: Mixed Contrastive Representation Learning For Heterogeneous Test Questions
He Zhu
Xihua Li
Xuemin Zhao
Yunbo Cao
Shan Yu
12
0
0
09 Mar 2023
Toward Unsupervised Realistic Visual Question Answering
Toward Unsupervised Realistic Visual Question Answering
Yuwei Zhang
Chih-Hui Ho
Nuno Vasconcelos
CoGe
14
2
0
09 Mar 2023
Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense
  Video Captioning
Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning
Antoine Yang
Arsha Nagrani
Paul Hongsuck Seo
Antoine Miech
Jordi Pont-Tuset
Ivan Laptev
Josef Sivic
Cordelia Schmid
AI4TS
VLM
23
220
0
27 Feb 2023
Improving Medical Speech-to-Text Accuracy with Vision-Language
  Pre-training Model
Improving Medical Speech-to-Text Accuracy with Vision-Language Pre-training Model
Jaeyoung Huh
Sangjoon Park
Jeonghyeon Lee
Jong Chul Ye
LM&MA
14
9
0
27 Feb 2023
Test-Time Distribution Normalization for Contrastively Learned
  Vision-language Models
Test-Time Distribution Normalization for Contrastively Learned Vision-language Models
Yi Zhou
Juntao Ren
Fengyu Li
Ramin Zabih
Ser-Nam Lim
VLM
26
13
0
22 Feb 2023
Large-scale Multi-Modal Pre-trained Models: A Comprehensive Survey
Large-scale Multi-Modal Pre-trained Models: A Comprehensive Survey
Xiao Wang
Guangyao Chen
Guangwu Qian
Pengcheng Gao
Xiaoyong Wei
Yaowei Wang
Yonghong Tian
Wen Gao
AI4CE
VLM
26
200
0
20 Feb 2023
Rejecting Cognitivism: Computational Phenomenology for Deep Learning
Rejecting Cognitivism: Computational Phenomenology for Deep Learning
P. Beckmann
G. Köstner
Ines Hipólito
19
4
0
16 Feb 2023
MINOTAUR: Multi-task Video Grounding From Multimodal Queries
MINOTAUR: Multi-task Video Grounding From Multimodal Queries
Raghav Goyal
E. Mavroudi
Xitong Yang
Sainbayar Sukhbaatar
Leonid Sigal
Matt Feiszli
Lorenzo Torresani
Du Tran
12
7
0
16 Feb 2023
Multi-modal Machine Learning in Engineering Design: A Review and Future
  Directions
Multi-modal Machine Learning in Engineering Design: A Review and Future Directions
Binyang Song
Ruilin Zhou
Faez Ahmed
AI4CE
35
40
0
14 Feb 2023
Paparazzi: A Deep Dive into the Capabilities of Language and Vision
  Models for Grounding Viewpoint Descriptions
Paparazzi: A Deep Dive into the Capabilities of Language and Vision Models for Grounding Viewpoint Descriptions
Henrik Voigt
J. Hombeck
M. Meuschke
K. Lawonn
Sina Zarrieß
VLM
20
1
0
13 Feb 2023
VITR: Augmenting Vision Transformers with Relation-Focused Learning for
  Cross-Modal Information Retrieval
VITR: Augmenting Vision Transformers with Relation-Focused Learning for Cross-Modal Information Retrieval
Yansong Gong
Georgina Cosma
Axel Finke
ViT
28
2
0
13 Feb 2023
Actional Atomic-Concept Learning for Demystifying Vision-Language
  Navigation
Actional Atomic-Concept Learning for Demystifying Vision-Language Navigation
Bingqian Lin
Yi Zhu
Xiaodan Liang
Liang Lin
Jian-zhuo Liu
CoGe
LM&Ro
31
3
0
13 Feb 2023
Unified Vision-Language Representation Modeling for E-Commerce
  Same-Style Products Retrieval
Unified Vision-Language Representation Modeling for E-Commerce Same-Style Products Retrieval
Ben Chen
Linbo Jin
Xinxin Wang
D. Gao
Wen Jiang
Wei Ning
14
3
0
10 Feb 2023
Learning to Agree on Vision Attention for Visual Commonsense Reasoning
Learning to Agree on Vision Attention for Visual Commonsense Reasoning
Zhenyang Li
Yangyang Guo
Ke-Jyun Wang
Fan Liu
Liqiang Nie
Mohan S. Kankanhalli
32
10
0
04 Feb 2023
ViewCo: Discovering Text-Supervised Segmentation Masks via Multi-View
  Semantic Consistency
ViewCo: Discovering Text-Supervised Segmentation Masks via Multi-View Semantic Consistency
Pengzhen Ren
Changlin Li
Hang Xu
Yi Zhu
Guangrun Wang
Jian-zhuo Liu
Xiaojun Chang
Xiaodan Liang
34
43
0
31 Jan 2023
Effective End-to-End Vision Language Pretraining with Semantic Visual
  Loss
Effective End-to-End Vision Language Pretraining with Semantic Visual Loss
Xiaofeng Yang
Fayao Liu
Guosheng Lin
VLM
19
7
0
18 Jan 2023
Previous
123456...91011
Next