
Title | Venue |
|---|---|
VL-BERT: Pre-training of Generic Visual-Linguistic Representations | International Conference on Learning Representations (ICLR), 2019 |
LXMERT: Learning Cross-Modality Encoder Representations from Transformers | Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019 |
Unicoder-VL: A Universal Encoder for Vision and Language by Cross-modal Pre-training | AAAI Conference on Artificial Intelligence (AAAI), 2019 |
Fusion of Detected Objects in Text for Visual Question Answering | Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019 |
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks | Neural Information Processing Systems (NeurIPS), 2019 |