1908.07490
Cited By
LXMERT: Learning Cross-Modality Encoder Representations from Transformers
20 August 2019
Hao Tan
Mohit Bansal
VLM
MLLM
Papers citing
"LXMERT: Learning Cross-Modality Encoder Representations from Transformers"
50 / 1,506 papers shown
Title
Improving Calibration in Deep Metric Learning With Cross-Example Softmax
Andreas Veit
Kimberly Wilber
7
2
0
17 Nov 2020
Data-efficient Alignment of Multimodal Sequences by Aligning Gradient Updates and Internal Feature Distributions
Jianan Wang
Boyang Albert Li
Xiangyu Fan
Jing-Hua Lin
Yanwei Fu
15
2
0
15 Nov 2020
ActBERT: Learning Global-Local Video-Text Representations
Linchao Zhu
Yi Yang
ViT
6
417
0
14 Nov 2020
Cross-Modality Protein Embedding for Compound-Protein Affinity and Contact Prediction
Yuning You
Yang Shen
12
8
0
14 Nov 2020
Transductive Zero-Shot Learning using Cross-Modal CycleGAN
Patrick Bordes
Éloi Zablocki
Benjamin Piwowarski
Patrick Gallinari
VLM
14
0
0
13 Nov 2020
Multimodal Pretraining for Dense Video Captioning
Gabriel Huang
Bo Pang
Zhenhai Zhu
Clara E. Rivera
Radu Soricut
8
82
0
10 Nov 2020
Human-centric Spatio-Temporal Video Grounding With Visual Transformers
Zongheng Tang
Yue Liao
Si Liu
Guanbin Li
Xiaojie Jin
Hongxu Jiang
Qian Yu
Dong Xu
11
94
0
10 Nov 2020
Refer, Reuse, Reduce: Generating Subsequent References in Visual and Conversational Contexts
Ece Takmaz
Mario Giulianelli
Sandro Pezzelle
Arabella J. Sinclair
Raquel Fernández
8
26
0
09 Nov 2020
CapWAP: Captioning with a Purpose
Adam Fisch
Kenton Lee
Ming-Wei Chang
J. Clark
Regina Barzilay
8
11
0
09 Nov 2020
Multi-modal, multi-task, multi-attention (M3) deep learning detection of reticular pseudodrusen: towards automated and accessible classification of age-related macular degeneration
Qingyu Chen
T. Keenan
Alexis Allot
Yifan Peng
Elvira Agrón
...
Chantal Cousineau-Krieger
W. Wong
Yingying Zhu
E. Chew
Zhiyong Lu
MedIm
8
19
0
09 Nov 2020
Long Range Arena: A Benchmark for Efficient Transformers
Yi Tay
Mostafa Dehghani
Samira Abnar
Yikang Shen
Dara Bahri
Philip Pham
J. Rao
Liu Yang
Sebastian Ruder
Donald Metzler
36
689
0
08 Nov 2020
Learning to Model and Ignore Dataset Bias with Mixed Capacity Ensembles
Christopher Clark
Mark Yatskar
Luke Zettlemoyer
18
60
0
07 Nov 2020
Utilizing Every Image Object for Semi-supervised Phrase Grounding
Haidong Zhu
Arka Sadhu
Zhao-Heng Zheng
Ram Nevatia
ObjD
12
7
0
05 Nov 2020
Cross-Media Keyphrase Prediction: A Unified Framework with Multi-Modality Multi-Head Attention and Image Wordings
Yue Wang
Jing Li
M. Lyu
Irwin King
6
16
0
03 Nov 2020
COOT: Cooperative Hierarchical Transformer for Video-Text Representation Learning
Simon Ging
Mohammadreza Zolfaghari
Hamed Pirsiavash
Thomas Brox
ViT
CLIP
13
168
0
01 Nov 2020
Loss re-scaling VQA: Revisiting the Language Prior Problem from a Class-imbalance View
Yangyang Guo
Liqiang Nie
Zhiyong Cheng
Q. Tian
Min Zhang
19
69
0
30 Oct 2020
Leveraging Visual Question Answering to Improve Text-to-Image Synthesis
Stanislav Frolov
Shailza Jolly
Jörn Hees
Andreas Dengel
EGVM
12
5
0
28 Oct 2020
Co-attentional Transformers for Story-Based Video Understanding
Björn Bebensee
Byoung-Tak Zhang
6
4
0
27 Oct 2020
MMFT-BERT: Multimodal Fusion Transformer with BERT Encodings for Visual Question Answering
Aisha Urooj Khan
Amir Mazaheri
N. Lobo
M. Shah
19
56
0
27 Oct 2020
Unsupervised Vision-and-Language Pre-training Without Parallel Images and Captions
Liunian Harold Li
Haoxuan You
Zhecan Wang
Alireza Zareian
Shih-Fu Chang
Kai-Wei Chang
SSL
VLM
64
12
0
24 Oct 2020
Can images help recognize entities? A study of the role of images for Multimodal NER
Shuguang Chen
Gustavo Aguilar
Leonardo Neves
Thamar Solorio
EgoV
35
33
0
23 Oct 2020
Removing Bias in Multi-modal Classifiers: Regularization by Maximizing Functional Entropies
Itai Gat
Idan Schwartz
A. Schwing
Tamir Hazan
51
89
0
21 Oct 2020
Multimodal Research in Vision and Language: A Review of Current and Emerging Trends
Shagun Uppal
Sarthak Bhagat
Devamanyu Hazarika
Navonil Majumdar
Soujanya Poria
Roger Zimmermann
Amir Zadeh
18
6
0
19 Oct 2020
Answer-checking in Context: A Multi-modal Fully Attention Network for Visual Question Answering
Hantao Huang
Tao Han
Wei Han
D. Yap
Cheng-Ming Chiang
13
2
0
17 Oct 2020
Unsupervised Natural Language Inference via Decoupled Multimodal Contrastive Learning
Wanyun Cui
Guangyu Zheng
Wei Wang
SSL
14
21
0
16 Oct 2020
Natural Language Rationales with Full-Stack Visual Reasoning: From Pixels to Semantic Frames to Commonsense Graphs
Ana Marasović
Chandra Bhagavatula
J. S. Park
Ronan Le Bras
Noah A. Smith
Yejin Choi
ReLM
LRM
18
61
0
15 Oct 2020
Vokenization: Improving Language Understanding with Contextualized, Visual-Grounded Supervision
Hao Tan
Mohit Bansal
CLIP
6
120
0
14 Oct 2020
Does my multimodal model learn cross-modal interactions? It's harder to tell than you might think!
Jack Hessel
Lillian Lee
8
72
0
13 Oct 2020
CAPT: Contrastive Pre-Training for Learning Denoised Sequence Representations
Fuli Luo
Pengcheng Yang
Shicheng Li
Xuancheng Ren
Xu Sun
VLM
SSL
8
16
0
13 Oct 2020
Contrast and Classify: Training Robust VQA Models
Yash Kant
A. Moudgil
Dhruv Batra
Devi Parikh
Harsh Agrawal
19
5
0
13 Oct 2020
MAF: Multimodal Alignment Framework for Weakly-Supervised Phrase Grounding
Qinxin Wang
Hao Tan
Sheng Shen
Michael W. Mahoney
Z. Yao
ObjD
28
11
0
12 Oct 2020
Beyond Language: Learning Commonsense from Images for Reasoning
Wanqing Cui
Yanyan Lan
Liang Pang
Jiafeng Guo
Xueqi Cheng
LRM
11
5
0
10 Oct 2020
Interpretable Neural Computation for Real-World Compositional Visual Question Answering
Ruixue Tang
Chao Ma
CoGe
6
2
0
10 Oct 2020
ZEST: Zero-shot Learning from Text Descriptions using Textual Similarity and Visual Summarization
Tzuf Paz-Argaman
Y. Atzmon
Gal Chechik
Reut Tsarfaty
VLM
16
32
0
07 Oct 2020
Support-set bottlenecks for video-text representation learning
Mandela Patrick
Po-Yao (Bernie) Huang
Yuki M. Asano
Florian Metze
Alexander G. Hauptmann
João Henriques
Andrea Vedaldi
20
242
0
06 Oct 2020
Pathological Visual Question Answering
Xuehai He
Zhuo Cai
Wenlan Wei
Yichen Zhang
Luntian Mou
Eric P. Xing
P. Xie
62
24
0
06 Oct 2020
Attention Guided Semantic Relationship Parsing for Visual Question Answering
M. Farazi
Salman Khan
Nick Barnes
11
2
0
05 Oct 2020
Multi-Modal Open-Domain Dialogue
Kurt Shuster
Eric Michael Smith
Da Ju
Jason Weston
AI4CE
28
42
0
02 Oct 2020
Which *BERT? A Survey Organizing Contextualized Encoders
Patrick Xia
Shijie Wu
Benjamin Van Durme
18
50
0
02 Oct 2020
Contrastive Learning of Medical Visual Representations from Paired Images and Text
Yuhao Zhang
Hang Jiang
Yasuhide Miura
Christopher D. Manning
C. Langlotz
MedIm
13
724
0
02 Oct 2020
ISAAQ -- Mastering Textbook Questions with Pre-trained Transformers and Bottom-Up and Top-Down Attention
José Manuél Gómez-Pérez
Raúl Ortega
23
23
0
01 Oct 2020
Learning Object Detection from Captions via Textual Scene Attributes
Achiya Jerbi
Roei Herzig
Jonathan Berant
Gal Chechik
Amir Globerson
22
21
0
30 Sep 2020
Attention that does not Explain Away
Nan Ding
Xinjie Fan
Zhenzhong Lan
Dale Schuurmans
Radu Soricut
11
3
0
29 Sep 2020
VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning
Xiaowei Hu
Xi Yin
Kevin Qinghong Lin
Lijuan Wang
L. Zhang
Jianfeng Gao
Zicheng Liu
VLM
6
56
0
28 Sep 2020
X-LXMERT: Paint, Caption and Answer Questions with Multi-Modal Transformers
Jaemin Cho
Jiasen Lu
Dustin Schwenk
Hannaneh Hajishirzi
Aniruddha Kembhavi
VLM
MLLM
19
102
0
23 Sep 2020
Multiple interaction learning with question-type prior knowledge for constraining answer search space in visual question answering
Tuong Khanh Long Do
Binh X. Nguyen
Huy Tran
Erman Tjiputra
Quang-Dieu Tran
Thanh-Toan Do
23
2
0
23 Sep 2020
MUTANT: A Training Paradigm for Out-of-Distribution Generalization in Visual Question Answering
Tejas Gokhale
Pratyay Banerjee
Chitta Baral
Yezhou Yang
OOD
14
139
0
18 Sep 2020
A Multimodal Memes Classification: A Survey and Open Research Issues
Tariq Habib Afridi
A. Alam
Muhammad Numan Khan
Jawad Khan
Young-Koo Lee
19
34
0
17 Sep 2020
Multimodal Joint Attribute Prediction and Value Extraction for E-commerce Product
Tiangang Zhu
Yue Wang
Haoran Li
Youzheng Wu
Xiaodong He
Bowen Zhou
6
69
0
15 Sep 2020
Denoising Large-Scale Image Captioning from Alt-text Data using Content Selection Models
Khyathi Raghavi Chandu
Piyush Sharma
Soravit Changpinyo
Ashish V. Thapliyal
Radu Soricut
DiffM
VLM
19
3
0
10 Sep 2020