ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1908.07490
  4. Cited By
LXMERT: Learning Cross-Modality Encoder Representations from
  Transformers

LXMERT: Learning Cross-Modality Encoder Representations from Transformers

20 August 2019
Hao Hao Tan
Mohit Bansal
    VLM
    MLLM
ArXivPDFHTML

Papers citing "LXMERT: Learning Cross-Modality Encoder Representations from Transformers"

50 / 1,506 papers shown
Title
A Comparison of Pre-trained Vision-and-Language Models for Multimodal
  Representation Learning across Medical Images and Reports
A Comparison of Pre-trained Vision-and-Language Models for Multimodal Representation Learning across Medical Images and Reports
Yikuan Li
Hanyin Wang
Yuan Luo
6
62
0
03 Sep 2020
Active Contrastive Learning of Audio-Visual Video Representations
Active Contrastive Learning of Audio-Visual Video Representations
Shuang Ma
Zhaoyang Zeng
Daniel J. McDuff
Yale Song
VLM
SSL
9
8
0
31 Aug 2020
Visual Question Answering on Image Sets
Visual Question Answering on Image Sets
Ankan Bansal
Yuting Zhang
Rama Chellappa
CoGe
8
40
0
27 Aug 2020
Linguistically-aware Attention for Reducing the Semantic-Gap in
  Vision-Language Tasks
Linguistically-aware Attention for Reducing the Semantic-Gap in Vision-Language Tasks
K. Gouthaman
Athira M. Nambiar
K. Srinivas
Anurag Mittal
VLM
12
12
0
18 Aug 2020
DeVLBert: Learning Deconfounded Visio-Linguistic Representations
DeVLBert: Learning Deconfounded Visio-Linguistic Representations
Shengyu Zhang
Tan Jiang
Tan Wang
Kun Kuang
Zhou Zhao
Jianke Zhu
Jin Yu
Hongxia Yang
Fei Wu
OOD
9
85
0
16 Aug 2020
Weakly supervised cross-domain alignment with optimal transport
Weakly supervised cross-domain alignment with optimal transport
Siyang Yuan
Ke Bai
Liqun Chen
Yizhe Zhang
Chenyang Tao
Chunyuan Li
Guoyin Wang
Ricardo Henao
Lawrence Carin
OT
16
7
0
14 Aug 2020
A Machine of Few Words -- Interactive Speaker Recognition with
  Reinforcement Learning
A Machine of Few Words -- Interactive Speaker Recognition with Reinforcement Learning
Mathieu Seurin
Florian Strub
Philippe Preux
Olivier Pietquin
6
5
0
07 Aug 2020
Learning Visual Representations with Caption Annotations
Learning Visual Representations with Caption Annotations
Mert Bulent Sariyildiz
J. Perez
Diane Larlus
VLM
SSL
6
158
0
04 Aug 2020
Describing Textures using Natural Language
Describing Textures using Natural Language
Chenyun Wu
Mikayla Timm
Subhransu Maji
3DV
12
10
0
03 Aug 2020
SeqDialN: Sequential Visual Dialog Networks in Joint Visual-Linguistic Representation Space
Liu Yang
VLM
16
5
0
02 Aug 2020
Pre-training for Video Captioning Challenge 2020 Summary
Pre-training for Video Captioning Challenge 2020 Summary
Yingwei Pan
Jun Xu
Yehao Li
Ting Yao
Tao Mei
8
1
0
27 Jul 2020
REXUP: I REason, I EXtract, I UPdate with Structured Compositional
  Reasoning for Visual Question Answering
REXUP: I REason, I EXtract, I UPdate with Structured Compositional Reasoning for Visual Question Answering
Siwen Luo
S. Han
Kaiyuan Sun
Josiah Poon
CoGe
LRM
ReLM
11
4
0
27 Jul 2020
Contrastive Visual-Linguistic Pretraining
Contrastive Visual-Linguistic Pretraining
Lei Shi
Kai Shuang
Shijie Geng
Peng Su
Zhengkai Jiang
Peng Gao
Zuohui Fu
Gerard de Melo
Sen Su
VLM
SSL
CLIP
18
29
0
26 Jul 2020
Spatially Aware Multimodal Transformers for TextVQA
Spatially Aware Multimodal Transformers for TextVQA
Yash Kant
Dhruv Batra
Peter Anderson
A. Schwing
Devi Parikh
Jiasen Lu
Harsh Agrawal
6
85
0
23 Jul 2020
A Novel Graph-based Multi-modal Fusion Encoder for Neural Machine
  Translation
A Novel Graph-based Multi-modal Fusion Encoder for Neural Machine Translation
Yongjing Yin
Fandong Meng
Jinsong Su
Chulun Zhou
Zhengyuan Yang
Jie Zhou
Jiebo Luo
14
138
0
17 Jul 2020
Reducing Language Biases in Visual Question Answering with
  Visually-Grounded Question Encoder
Reducing Language Biases in Visual Question Answering with Visually-Grounded Question Encoder
K. Gouthaman
Anurag Mittal
36
78
0
13 Jul 2020
IQ-VQA: Intelligent Visual Question Answering
IQ-VQA: Intelligent Visual Question Answering
Vatsal Goel
Mohit Chandak
A. Anand
Prithwijit Guha
17
5
0
08 Jul 2020
Auto-captions on GIF: A Large-scale Video-sentence Dataset for
  Vision-language Pre-training
Auto-captions on GIF: A Large-scale Video-sentence Dataset for Vision-language Pre-training
Yingwei Pan
Yehao Li
Jianjie Luo
Jun Xu
Ting Yao
Tao Mei
16
57
0
05 Jul 2020
Latent Compositional Representations Improve Systematic Generalization
  in Grounded Question Answering
Latent Compositional Representations Improve Systematic Generalization in Grounded Question Answering
Ben Bogin
Sanjay Subramanian
Matt Gardner
Jonathan Berant
ReLM
OOD
BDL
LRM
6
28
0
01 Jul 2020
Multimodal Text Style Transfer for Outdoor Vision-and-Language
  Navigation
Multimodal Text Style Transfer for Outdoor Vision-and-Language Navigation
Wanrong Zhu
X. Wang
Tsu-jui Fu
An Yan
P. Narayana
Kazoo Sone
Sugato Basu
W. Wang
21
33
0
01 Jul 2020
Modality-Agnostic Attention Fusion for visual search with text feedback
Modality-Agnostic Attention Fusion for visual search with text feedback
Eric Dodds
Jack Culpepper
Simão Herdade
Yang Zhang
K. Boakye
EgoV
6
71
0
30 Jun 2020
ERNIE-ViL: Knowledge Enhanced Vision-Language Representations Through
  Scene Graph
ERNIE-ViL: Knowledge Enhanced Vision-Language Representations Through Scene Graph
Fei Yu
Jiji Tang
Weichong Yin
Yu Sun
Hao Tian
Hua-Hong Wu
Haifeng Wang
4
375
0
30 Jun 2020
Improving VQA and its Explanations \\ by Comparing Competing
  Explanations
Improving VQA and its Explanations \\ by Comparing Competing Explanations
Jialin Wu
Liyan Chen
Raymond J. Mooney
FAtt
AAML
22
17
0
28 Jun 2020
Unsupervised Video Decomposition using Spatio-temporal Iterative
  Inference
Unsupervised Video Decomposition using Spatio-temporal Iterative Inference
Polina Zablotskaia
E. Dominici
Leonid Sigal
Andreas M. Lehrmann
OCL
12
19
0
25 Jun 2020
Neuro-Symbolic Visual Reasoning: Disentangling "Visual" from "Reasoning"
Neuro-Symbolic Visual Reasoning: Disentangling "Visual" from "Reasoning"
Saeed Amizadeh
Hamid Palangi
Oleksandr Polozov
Yichen Huang
K. Koishida
NAI
LRM
20
58
0
20 Jun 2020
Overcoming Statistical Shortcuts for Open-ended Visual Counting
Overcoming Statistical Shortcuts for Open-ended Visual Counting
Corentin Dancette
Rémi Cadène
Xinlei Chen
Matthieu Cord
6
3
0
17 Jun 2020
Contrastive Learning for Weakly Supervised Phrase Grounding
Contrastive Learning for Weakly Supervised Phrase Grounding
Tanmay Gupta
Arash Vahdat
Gal Chechik
Xiaodong Yang
Jan Kautz
Derek Hoiem
ObjD
SSL
27
139
0
17 Jun 2020
VirTex: Learning Visual Representations from Textual Annotations
VirTex: Learning Visual Representations from Textual Annotations
Karan Desai
Justin Johnson
SSL
VLM
11
432
0
11 Jun 2020
Large-Scale Adversarial Training for Vision-and-Language Representation
  Learning
Large-Scale Adversarial Training for Vision-and-Language Representation Learning
Zhe Gan
Yen-Chun Chen
Linjie Li
Chen Zhu
Yu Cheng
Jingjing Liu
ObjD
VLM
24
485
0
11 Jun 2020
Roses Are Red, Violets Are Blue... but Should Vqa Expect Them To?
Roses Are Red, Violets Are Blue... but Should Vqa Expect Them To?
Corentin Kervadec
G. Antipov
M. Baccouche
Christian Wolf
OOD
10
86
0
09 Jun 2020
Translating Natural Language Instructions for Behavioral Robot
  Navigation with a Multi-Head Attention Mechanism
Translating Natural Language Instructions for Behavioral Robot Navigation with a Multi-Head Attention Mechanism
Patricio Cerda-Mardini
Vladimir Araujo
Alvaro Soto
11
5
0
01 Jun 2020
Adaptive Transformers for Learning Multimodal Representations
Adaptive Transformers for Learning Multimodal Representations
Prajjwal Bhargava
6
4
0
15 May 2020
Behind the Scene: Revealing the Secrets of Pre-trained
  Vision-and-Language Models
Behind the Scene: Revealing the Secrets of Pre-trained Vision-and-Language Models
Jize Cao
Zhe Gan
Yu Cheng
Licheng Yu
Yen-Chun Chen
Jingjing Liu
VLM
14
127
0
15 May 2020
Cross-Modality Relevance for Reasoning on Language and Vision
Cross-Modality Relevance for Reasoning on Language and Vision
Chen Zheng
Quan Guo
Parisa Kordjamshidi
LRM
21
36
0
12 May 2020
MART: Memory-Augmented Recurrent Transformer for Coherent Video
  Paragraph Captioning
MART: Memory-Augmented Recurrent Transformer for Coherent Video Paragraph Captioning
Jie Lei
Liwei Wang
Yelong Shen
Dong Yu
Tamara L. Berg
Mohit Bansal
14
186
0
11 May 2020
History for Visual Dialog: Do we really need it?
History for Visual Dialog: Do we really need it?
Shubham Agarwal
Trung Bui
Joon-Young Lee
Ioannis Konstas
Verena Rieser
VLM
11
69
0
08 May 2020
Cross-media Structured Common Space for Multimedia Event Extraction
Cross-media Structured Common Space for Multimedia Event Extraction
Manling Li
Alireza Zareian
Qi Zeng
Spencer Whitehead
Di Lu
Heng Ji
Shih-Fu Chang
6
102
0
05 May 2020
Visual Question Answering with Prior Class Semantics
Visual Question Answering with Prior Class Semantics
Violetta Shevchenko
Damien Teney
A. Dick
A. Hengel
BDL
8
7
0
04 May 2020
Visually Grounded Continual Learning of Compositional Phrases
Visually Grounded Continual Learning of Compositional Phrases
Xisen Jin
Junyi Du
Arka Sadhu
Ram Nevatia
Xiang Ren
CLL
12
4
0
02 May 2020
Obtaining Faithful Interpretations from Compositional Neural Networks
Obtaining Faithful Interpretations from Compositional Neural Networks
Sanjay Subramanian
Ben Bogin
Nitish Gupta
Tomer Wolfson
Sameer Singh
Jonathan Berant
Matt Gardner
6
42
0
02 May 2020
Probing Contextual Language Models for Common Ground with Visual
  Representations
Probing Contextual Language Models for Common Ground with Visual Representations
Gabriel Ilharco
Rowan Zellers
Ali Farhadi
Hannaneh Hajishirzi
6
14
0
01 May 2020
Visuo-Linguistic Question Answering (VLQA) Challenge
Visuo-Linguistic Question Answering (VLQA) Challenge
Shailaja Keyur Sampat
Yezhou Yang
Chitta Baral
CoGe
11
1
0
01 May 2020
HERO: Hierarchical Encoder for Video+Language Omni-representation
  Pre-training
HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training
Linjie Li
Yen-Chun Chen
Yu Cheng
Zhe Gan
Licheng Yu
Jingjing Liu
MLLM
VLM
OffRL
AI4TS
29
491
0
01 May 2020
Crisscrossed Captions: Extended Intramodal and Intermodal Semantic
  Similarity Judgments for MS-COCO
Crisscrossed Captions: Extended Intramodal and Intermodal Semantic Similarity Judgments for MS-COCO
Zarana Parekh
Jason Baldridge
Daniel Matthew Cer
Austin Waters
Yinfei Yang
6
61
0
30 Apr 2020
Improving Vision-and-Language Navigation with Image-Text Pairs from the
  Web
Improving Vision-and-Language Navigation with Image-Text Pairs from the Web
Arjun Majumdar
Ayush Shrivastava
Stefan Lee
Peter Anderson
Devi Parikh
Dhruv Batra
LM&Ro
29
230
0
30 Apr 2020
Span-based Localizing Network for Natural Language Video Localization
Span-based Localizing Network for Natural Language Video Localization
Hao Zhang
Aixin Sun
Wei Jing
Joey Tianyi Zhou
4
311
0
29 Apr 2020
VD-BERT: A Unified Vision and Dialog Transformer with BERT
VD-BERT: A Unified Vision and Dialog Transformer with BERT
Yue Wang
Shafiq R. Joty
Michael R. Lyu
Irwin King
Caiming Xiong
S. Hoi
19
102
0
28 Apr 2020
A Novel Attention-based Aggregation Function to Combine Vision and
  Language
A Novel Attention-based Aggregation Function to Combine Vision and Language
Matteo Stefanini
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
VLM
9
9
0
27 Apr 2020
Deep Multimodal Neural Architecture Search
Deep Multimodal Neural Architecture Search
Zhou Yu
Yuhao Cui
Jun-chen Yu
Meng Wang
Dacheng Tao
Qi Tian
8
98
0
25 Apr 2020
VisualCOMET: Reasoning about the Dynamic Context of a Still Image
VisualCOMET: Reasoning about the Dynamic Context of a Still Image
J. S. Park
Chandra Bhagavatula
Roozbeh Mottaghi
Ali Farhadi
Yejin Choi
ReLM
LRM
11
6
0
22 Apr 2020
Previous
123...28293031
Next