LXMERT: Learning Cross-Modality Encoder Representations from Transformers

20 August 2019
Hao Tan
Mohit Bansal
    VLM
    MLLM

Papers citing "LXMERT: Learning Cross-Modality Encoder Representations from Transformers"

50 / 1,506 papers shown
An Empirical Study on the Generalization Power of Neural Representations Learned via Visual Guessing Games
Alessandro Suglia
Yonatan Bisk
Ioannis Konstas
Antonio Vergari
E. Bastianelli
Andrea Vanzo
Oliver Lemon
18
8
0
31 Jan 2021
VX2TEXT: End-to-End Learning of Video-Based Text Generation From Multimodal Inputs
Xudong Lin
Gedas Bertasius
Jue Wang
Shih-Fu Chang
Devi Parikh
Lorenzo Torresani
VGen
25
66
0
28 Jan 2021
Scheduled Sampling in Vision-Language Pretraining with Decoupled Encoder-Decoder Network
Yehao Li
Yingwei Pan
Ting Yao
Jingwen Chen
Tao Mei
VLM
15
52
0
27 Jan 2021
Cross-lingual Visual Pre-training for Multimodal Machine Translation
Ozan Caglayan
Menekse Kuyu
Mustafa Sercan Amac
Pranava Madhyastha
Erkut Erdem
Aykut Erdem
Lucia Specia
VLM
19
42
0
25 Jan 2021
Adversarial Text-to-Image Synthesis: A Review
Stanislav Frolov
Tobias Hinz
Federico Raue
Jörn Hees
Andreas Dengel
EGVM
14
176
0
25 Jan 2021
RomeBERT: Robust Training of Multi-Exit BERT
Shijie Geng
Peng Gao
Zuohui Fu
Yongfeng Zhang
17
26
0
24 Jan 2021
SSTVOS: Sparse Spatiotemporal Transformers for Video Object Segmentation
Brendan Duke
Abdalla Ahmed
Christian Wolf
P. Aarabi
Graham W. Taylor
VOS
6
165
0
21 Jan 2021
Understanding in Artificial Intelligence
S. Maetschke
D. M. Iraola
Pieter Barnard
Elaheh Shafieibavani
Peter Zhong
Ying Xu
Antonio Jimeno Yepes
ELM
VLM
11
0
0
17 Jan 2021
Latent Variable Models for Visual Question Answering
Zixu Wang
Yishu Miao
Lucia Specia
17
5
0
16 Jan 2021
Reasoning over Vision and Language: Exploring the Benefits of Supplemental Knowledge
Violetta Shevchenko
Damien Teney
A. Dick
A. Hengel
6
28
0
15 Jan 2021
Understanding the Role of Scene Graphs in Visual Question Answering
Vinay Damodaran
Sharanya Chakravarthy
Akshay Kumar
Anjana Umapathy
Teruko Mitamura
Yuta Nakashima
Noa Garcia
Chenhui Chu
GNN
43
32
0
14 Jan 2021
Latent Alignment of Procedural Concepts in Multimodal Recipes
Hossein Rajaby Faghihi
Roshanak Mirzaee
Sudarshan Paliwal
Parisa Kordjamshidi
14
3
0
12 Jan 2021
Transformers in Vision: A Survey
Salman Khan
Muzammal Naseer
Munawar Hayat
Syed Waqas Zamir
F. Khan
M. Shah
ViT
225
2,428
0
04 Jan 2021
VinVL: Revisiting Visual Representations in Vision-Language Models
Pengchuan Zhang
Xiujun Li
Xiaowei Hu
Jianwei Yang
Lei Zhang
Lijuan Wang
Yejin Choi
Jianfeng Gao
ObjD
VLM
252
157
0
02 Jan 2021
KM-BART: Knowledge Enhanced Multimodal BART for Visual Commonsense Generation
Yiran Xing
Z. Shi
Zhao Meng
Gerhard Lakemeyer
Yunpu Ma
Roger Wattenhofer
VLM
59
40
0
02 Jan 2021
Accurate Word Representations with Universal Visual Guidance
Zhuosheng Zhang
Haojie Yu
Hai Zhao
Rui-cang Wang
Masao Utiyama
17
0
0
30 Dec 2020
OpenViDial: A Large-Scale, Open-Domain Dialogue Dataset with Visual Contexts
Yuxian Meng
Shuhe Wang
Qinghong Han
Xiaofei Sun
Fei Wu
Rui Yan
Jiwei Li
16
28
0
30 Dec 2020
LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding
Yang Xu
Yiheng Xu
Tengchao Lv
Lei Cui
Furu Wei
...
D. Florêncio
Cha Zhang
Wanxiang Che
Min Zhang
Lidong Zhou
ViT
MLLM
145
498
0
29 Dec 2020
Detecting Hateful Memes Using a Multimodal Deep Ensemble
Vlad Sandulescu
VLM
26
43
0
24 Dec 2020
A Multimodal Framework for the Detection of Hateful Memes
Phillip Lippe
Nithin Holla
Shantanu Chandra
S. Rajamanickam
Georgios Antoniou
Ekaterina Shutova
H. Yannakoudakis
6
70
0
23 Dec 2020
Seeing past words: Testing the cross-modal capabilities of pretrained V&L models on counting tasks
Letitia Parcalabescu
Albert Gatt
Anette Frank
Iacer Calixto
LRM
11
48
0
22 Dec 2020
Object-Centric Diagnosis of Visual Reasoning
Jianwei Yang
Jiayuan Mao
Jiajun Wu
Devi Parikh
David D. Cox
J. Tenenbaum
Chuang Gan
OCL
19
16
0
21 Dec 2020
KRISP: Integrating Implicit and Symbolic Knowledge for Open-Domain Knowledge-Based VQA
Kenneth Marino
Xinlei Chen
Devi Parikh
Abhinav Gupta
Marcus Rohrbach
8
179
0
20 Dec 2020
Transformer Interpretability Beyond Attention Visualization
Hila Chefer
Shir Gur
Lior Wolf
20
641
0
17 Dec 2020
A Closer Look at the Robustness of Vision-and-Language Pre-trained Models
Linjie Li
Zhe Gan
Jingjing Liu
VLM
25
42
0
15 Dec 2020
Attention over learned object embeddings enables complex visual reasoning
David Ding
Felix Hill
Adam Santoro
Malcolm Reynolds
M. Botvinick
OCL
14
69
0
15 Dec 2020
Vilio: State-of-the-art Visio-Linguistic Models applied to Hateful Memes
Niklas Muennighoff
6
64
0
14 Dec 2020
Knowledge-Routed Visual Question Reasoning: Challenges for Deep Representation Embedding
Qingxing Cao
Bailin Li
Xiaodan Liang
Keze Wang
Liang Lin
44
36
0
14 Dec 2020
KVL-BERT: Knowledge Enhanced Visual-and-Linguistic BERT for Visual Commonsense Reasoning
Dandan Song
S. Ma
Zhanchen Sun
Sicheng Yang
L. Liao
SSL
LRM
17
38
0
13 Dec 2020
MiniVLM: A Smaller and Faster Vision-Language Model
Jianfeng Wang
Xiaowei Hu
Pengchuan Zhang
Xiujun Li
Lijuan Wang
L. Zhang
Jianfeng Gao
Zicheng Liu
VLM
MLLM
27
58
0
13 Dec 2020
Topological Planning with Transformers for Vision-and-Language Navigation
Kevin Chen
Junshen K. Chen
Jo Chuang
Marynel Vázquez
Silvio Savarese
LM&Ro
17
99
0
09 Dec 2020
Hateful Memes Detection via Complementary Visual and Linguistic Networks
W. Zhang
Guihua Liu
Zhuohua Li
Fuqing Zhu
16
17
0
09 Dec 2020
TAP: Text-Aware Pre-training for Text-VQA and Text-Caption
Zhengyuan Yang
Yijuan Lu
Jianfeng Wang
Xi Yin
D. Florêncio
Lijuan Wang
Cha Zhang
Lei Zhang
Jiebo Luo
VLM
23
140
0
08 Dec 2020
StacMR: Scene-Text Aware Cross-Modal Retrieval
Andrés Mafla
Rafael Sampaio de Rezende
Lluís Gómez
Diane Larlus
Dimosthenis Karatzas
3DV
37
14
0
08 Dec 2020
Parameter Efficient Multimodal Transformers for Video Representation Learning
Sangho Lee
Youngjae Yu
Gunhee Kim
Thomas Breuel
Jan Kautz
Yale Song
ViT
13
76
0
08 Dec 2020
Edited Media Understanding Frames: Reasoning About the Intent and Implications of Visual Misinformation
Jeff Da
Maxwell Forbes
Rowan Zellers
Anthony Zheng
Jena D. Hwang
Antoine Bosselut
Yejin Choi
DiffM
12
12
0
08 Dec 2020
WeaQA: Weak Supervision via Captions for Visual Question Answering
Pratyay Banerjee
Tejas Gokhale
Yezhou Yang
Chitta Baral
12
34
0
04 Dec 2020
Understanding Guided Image Captioning Performance across Domains
Edwin G. Ng
Bo Pang
P. Sharma
Radu Soricut
19
24
0
04 Dec 2020
Classification of Multimodal Hate Speech -- The Winning Solution of Hateful Memes Challenge
Xiayu Zhong
12
15
0
02 Dec 2020
Pre-Trained Image Processing Transformer
Hanting Chen
Yunhe Wang
Tianyu Guo
Chang Xu
Yiping Deng
Zhenhua Liu
Siwei Ma
Chunjing Xu
Chao Xu
Wen Gao
VLM
ViT
19
1,632
0
01 Dec 2020
Multimodal Pretraining Unmasked: A Meta-Analysis and a Unified Framework of Vision-and-Language BERTs
Emanuele Bugliarello
Ryan Cotterell
Naoaki Okazaki
Desmond Elliott
22
119
0
30 Nov 2020
Point and Ask: Incorporating Pointing into Visual Question Answering
Arjun Mani
Nobline Yoo
William Fu-Hinthorn
Olga Russakovsky
3DPC
23
37
0
27 Nov 2020
Learning from Lexical Perturbations for Consistent Visual Question Answering
Spencer Whitehead
Hui Wu
Yi Ren Fung
Heng Ji
Rogerio Feris
Kate Saenko
29
11
0
26 Nov 2020
A Recurrent Vision-and-Language BERT for Navigation
Yicong Hong
Qi Wu
Yuankai Qi
Cristian Rodriguez-Opazo
Stephen Gould
LM&Ro
27
292
0
26 Nov 2020
Multimodal Learning for Hateful Memes Detection
Yi Zhou
Zhenhao Chen
8
56
0
25 Nov 2020
Interpretable Visual Reasoning via Induced Symbolic Space
Zhonghao Wang
Kai Wang
Mo Yu
Jinjun Xiong
Wen-mei W. Hwu
M. Hasegawa-Johnson
Humphrey Shi
LRM
OCL
6
19
0
23 Nov 2020
LRTA: A Transparent Neural-Symbolic Reasoning Framework with Modular Supervision for Visual Question Answering
Weixin Liang
Fei Niu
Aishwarya N. Reganti
Govind Thattai
Gökhan Tür
26
17
0
21 Nov 2020
Logically Consistent Loss for Visual Question Answering
Anh-Cat Le-Ngo
T. Tran
Santu Rana
Sunil R. Gupta
Svetha Venkatesh
OOD
14
0
0
19 Nov 2020
Using Text to Teach Image Retrieval
Haoyu Dong
Ze Wang
Qiang Qiu
Guillermo Sapiro
3DV
10
4
0
19 Nov 2020
Neuro-Symbolic Representations for Video Captioning: A Case for Leveraging Inductive Biases for Vision and Language
Hassan Akbari
Hamid Palangi
Jianwei Yang
Sudha Rao
Asli Celikyilmaz
Roland Fernandez
P. Smolensky
Jianfeng Gao
Shih-Fu Chang
24
3
0
18 Nov 2020