LXMERT: Learning Cross-Modality Encoder Representations from Transformers

20 August 2019

Papers citing "LXMERT: Learning Cross-Modality Encoder Representations from Transformers"

50 / 1,506 papers shown

Improving Calibration in Deep Metric Learning With Cross-Example Softmax Andreas Veit Kimberly Wilber 7 2 0 17 Nov 2020
Data-efficient Alignment of Multimodal Sequences by Aligning Gradient Updates and Internal Feature Distributions Jianan Wang Boyang Albert Li Xiangyu Fan Jing-Hua Lin Yanwei Fu 15 2 0 15 Nov 2020
ActBERT: Learning Global-Local Video-Text Representations Linchao Zhu Yi Yang ViT 6 417 0 14 Nov 2020
Cross-Modality Protein Embedding for Compound-Protein Affinity and Contact Prediction Yuning You Yang Shen 12 8 0 14 Nov 2020
Transductive Zero-Shot Learning using Cross-Modal CycleGAN Patrick Bordes Éloi Zablocki Benjamin Piwowarski Patrick Gallinari VLM 14 0 0 13 Nov 2020
Multimodal Pretraining for Dense Video Captioning Gabriel Huang Bo Pang Zhenhai Zhu Clara E. Rivera Radu Soricut 8 82 0 10 Nov 2020
Human-centric Spatio-Temporal Video Grounding With Visual Transformers Zongheng Tang Yue Liao Si Liu Guanbin Li Xiaojie Jin Hongxu Jiang Qian Yu Dong Xu 11 94 0 10 Nov 2020
Refer, Reuse, Reduce: Generating Subsequent References in Visual and Conversational Contexts Ece Takmaz Mario Giulianelli Sandro Pezzelle Arabella J. Sinclair Raquel Fernández 8 26 0 09 Nov 2020
CapWAP: Captioning with a Purpose Adam Fisch Kenton Lee Ming-Wei Chang J. Clark Regina Barzilay 8 11 0 09 Nov 2020
Multi-modal, multi-task, multi-attention (M3) deep learning detection of reticular pseudodrusen: towards automated and accessible classification of age-related macular degeneration Qingyu Chen T. Keenan Alexis Allot Yifan Peng Elvira Agrón ... Chantal Cousineau-Krieger W. Wong Yingying Zhu E. Chew Zhiyong Lu MedIm 8 19 0 09 Nov 2020
Long Range Arena: A Benchmark for Efficient Transformers Yi Tay Mostafa Dehghani Samira Abnar Yikang Shen Dara Bahri Philip Pham J. Rao Liu Yang Sebastian Ruder Donald Metzler 36 689 0 08 Nov 2020
Learning to Model and Ignore Dataset Bias with Mixed Capacity Ensembles Christopher Clark Mark Yatskar Luke Zettlemoyer 18 60 0 07 Nov 2020
Utilizing Every Image Object for Semi-supervised Phrase Grounding Haidong Zhu Arka Sadhu Zhao-Heng Zheng Ram Nevatia ObjD 12 7 0 05 Nov 2020
Cross-Media Keyphrase Prediction: A Unified Framework with Multi-Modality Multi-Head Attention and Image Wordings Yue Wang Jing Li M. Lyu Irwin King 6 16 0 03 Nov 2020
COOT: Cooperative Hierarchical Transformer for Video-Text Representation Learning Simon Ging Mohammadreza Zolfaghari Hamed Pirsiavash Thomas Brox ViT CLIP 13 168 0 01 Nov 2020
Loss re-scaling VQA: Revisiting the Language Prior Problem from a Class-imbalance View Yangyang Guo Liqiang Nie Zhiyong Cheng Q. Tian Min Zhang 19 69 0 30 Oct 2020
Leveraging Visual Question Answering to Improve Text-to-Image Synthesis Stanislav Frolov Shailza Jolly Jörn Hees Andreas Dengel EGVM 12 5 0 28 Oct 2020
Co-attentional Transformers for Story-Based Video Understanding Björn Bebensee Byoung-Tak Zhang 6 4 0 27 Oct 2020
MMFT-BERT: Multimodal Fusion Transformer with BERT Encodings for Visual Question Answering Aisha Urooj Khan Amir Mazaheri N. Lobo M. Shah 19 56 0 27 Oct 2020
Unsupervised Vision-and-Language Pre-training Without Parallel Images and Captions Liunian Harold Li Haoxuan You Zhecan Wang Alireza Zareian Shih-Fu Chang Kai-Wei Chang SSL VLM 64 12 0 24 Oct 2020
Can images help recognize entities? A study of the role of images for Multimodal NER Shuguang Chen Gustavo Aguilar Leonardo Neves Thamar Solorio EgoV 35 33 0 23 Oct 2020
Removing Bias in Multi-modal Classifiers: Regularization by Maximizing Functional Entropies Itai Gat Idan Schwartz A. Schwing Tamir Hazan 51 89 0 21 Oct 2020
Multimodal Research in Vision and Language: A Review of Current and Emerging Trends Shagun Uppal Sarthak Bhagat Devamanyu Hazarika Navonil Majumdar Soujanya Poria Roger Zimmermann Amir Zadeh 18 6 0 19 Oct 2020
Answer-checking in Context: A Multi-modal Fully Attention Network for Visual Question Answering Hantao Huang Tao Han Wei Han D. Yap Cheng-Ming Chiang 13 2 0 17 Oct 2020
Unsupervised Natural Language Inference via Decoupled Multimodal Contrastive Learning Wanyun Cui Guangyu Zheng Wei Wang SSL 14 21 0 16 Oct 2020
Natural Language Rationales with Full-Stack Visual Reasoning: From Pixels to Semantic Frames to Commonsense Graphs Ana Marasović Chandra Bhagavatula J. S. Park Ronan Le Bras Noah A. Smith Yejin Choi ReLM LRM 18 61 0 15 Oct 2020
Vokenization: Improving Language Understanding with Contextualized, Visual-Grounded Supervision Hao Tan Mohit Bansal CLIP 6 120 0 14 Oct 2020
Does my multimodal model learn cross-modal interactions? It's harder to tell than you might think! Jack Hessel Lillian Lee 8 72 0 13 Oct 2020
CAPT: Contrastive Pre-Training for Learning Denoised Sequence Representations Fuli Luo Pengcheng Yang Shicheng Li Xuancheng Ren Xu Sun VLM SSL 8 16 0 13 Oct 2020
Contrast and Classify: Training Robust VQA Models Yash Kant A. Moudgil Dhruv Batra Devi Parikh Harsh Agrawal 19 5 0 13 Oct 2020
MAF: Multimodal Alignment Framework for Weakly-Supervised Phrase Grounding Qinxin Wang Hao Tan Sheng Shen Michael W. Mahoney Z. Yao ObjD 28 11 0 12 Oct 2020
Beyond Language: Learning Commonsense from Images for Reasoning Wanqing Cui Yanyan Lan Liang Pang Jiafeng Guo Xueqi Cheng LRM 11 5 0 10 Oct 2020
Interpretable Neural Computation for Real-World Compositional Visual Question Answering Ruixue Tang Chao Ma CoGe 6 2 0 10 Oct 2020
ZEST: Zero-shot Learning from Text Descriptions using Textual Similarity and Visual Summarization Tzuf Paz-Argaman Y. Atzmon Gal Chechik Reut Tsarfaty VLM 16 32 0 07 Oct 2020
Support-set bottlenecks for video-text representation learning Mandela Patrick Po-Yao (Bernie) Huang Yuki M. Asano Florian Metze Alexander G. Hauptmann João Henriques Andrea Vedaldi 20 242 0 06 Oct 2020
Pathological Visual Question Answering Xuehai He Zhuo Cai Wenlan Wei Yichen Zhang Luntian Mou Eric P. Xing P. Xie 62 24 0 06 Oct 2020
Attention Guided Semantic Relationship Parsing for Visual Question Answering M. Farazi Salman Khan Nick Barnes 11 2 0 05 Oct 2020
Multi-Modal Open-Domain Dialogue Kurt Shuster Eric Michael Smith Da Ju Jason Weston AI4CE 28 42 0 02 Oct 2020
Which *BERT? A Survey Organizing Contextualized Encoders Patrick Xia Shijie Wu Benjamin Van Durme 18 50 0 02 Oct 2020
Contrastive Learning of Medical Visual Representations from Paired Images and Text Yuhao Zhang Hang Jiang Yasuhide Miura Christopher D. Manning C. Langlotz MedIm 13 724 0 02 Oct 2020
ISAAQ -- Mastering Textbook Questions with Pre-trained Transformers and Bottom-Up and Top-Down Attention José Manuél Gómez-Pérez Raúl Ortega 23 23 0 01 Oct 2020
Learning Object Detection from Captions via Textual Scene Attributes Achiya Jerbi Roei Herzig Jonathan Berant Gal Chechik Amir Globerson 22 21 0 30 Sep 2020
Attention that does not Explain Away Nan Ding Xinjie Fan Zhenzhong Lan Dale Schuurmans Radu Soricut 11 3 0 29 Sep 2020
VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning Xiaowei Hu Xi Yin Kevin Qinghong Lin Lijuan Wang L. Zhang Jianfeng Gao Zicheng Liu VLM 6 56 0 28 Sep 2020
X-LXMERT: Paint, Caption and Answer Questions with Multi-Modal Transformers Jaemin Cho Jiasen Lu Dustin Schwenk Hannaneh Hajishirzi Aniruddha Kembhavi VLM MLLM 19 102 0 23 Sep 2020
Multiple interaction learning with question-type prior knowledge for constraining answer search space in visual question answering Tuong Khanh Long Do Binh X. Nguyen Huy Tran Erman Tjiputra Quang-Dieu Tran Thanh-Toan Do 23 2 0 23 Sep 2020
MUTANT: A Training Paradigm for Out-of-Distribution Generalization in Visual Question Answering Tejas Gokhale Pratyay Banerjee Chitta Baral Yezhou Yang OOD 14 139 0 18 Sep 2020
A Multimodal Memes Classification: A Survey and Open Research Issues Tariq Habib Afridi A. Alam Muhammad Numan Khan Jawad Khan Young-Koo Lee 19 34 0 17 Sep 2020
Multimodal Joint Attribute Prediction and Value Extraction for E-commerce Product Tiangang Zhu Yue Wang Haoran Li Youzheng Wu Xiaodong He Bowen Zhou 6 69 0 15 Sep 2020
Denoising Large-Scale Image Captioning from Alt-text Data using Content Selection Models Khyathi Raghavi Chandu Piyush Sharma Soravit Changpinyo Ashish V. Thapliyal Radu Soricut DiffM VLM 19 3 0 10 Sep 2020