UNITER: UNiversal Image-TExt Representation Learning | European Conference on Computer Vision (ECCV), 2019 |
Unified Vision-Language Pre-Training for Image Captioning and VQA | AAAI Conference on Artificial Intelligence (AAAI), 2019 |
VL-BERT: Pre-training of Generic Visual-Linguistic Representations | International Conference on Learning Representations (ICLR), 2019 |
LXMERT: Learning Cross-Modality Encoder Representations from Transformers | Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019 |
Unicoder-VL: A Universal Encoder for Vision and Language by Cross-modal Pre-training | AAAI Conference on Artificial Intelligence (AAAI), 2019 |
Fusion of Detected Objects in Text for Visual Question Answering | Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019 |
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks | Neural Information Processing Systems (NeurIPS), 2019 |