LXMERT: Learning Cross-Modality Encoder Representations from Transformers
Hao Hao Tan, Mohit Bansal
arXiv:1908.07490 · 20 August 2019 · Topics: VLM, MLLM

Papers citing "LXMERT: Learning Cross-Modality Encoder Representations from Transformers"

Showing 50 of 1,506 citing papers.
Multimodal Representation Learning by Alternating Unimodal Adaptation
Xiaohui Zhang, Jaehong Yoon, Mohit Bansal, Huaxiu Yao
17 Nov 2023

DRESS: Instructing Large Vision-Language Models to Align and Interact with Humans via Natural Language Feedback
Yangyi Chen, Karan Sikka, Michael Cogswell, Heng Ji, Ajay Divakaran
16 Nov 2023

Attribute Diversity Determines the Systematicity Gap in VQA
Ian Berlot-Attwell, Kumar Krishna Agrawal, A. M. Carrell, Yash Sharma, Naomi Saphra
15 Nov 2023

Interaction is all You Need? A Study of Robots Ability to Understand and Execute
Kushal Koshti, Nidhir Bhavsar
13 Nov 2023

Improving Vision-and-Language Reasoning via Spatial Relations Modeling
Cheng Yang, Rui Xu, Ye Guo, Peixiang Huang, Yiru Chen, Wenkui Ding, Zhongyuan Wang, Hong Zhou
09 Nov 2023 · Topics: LRM
Zero-shot Translation of Attention Patterns in VQA Models to Natural Language
Leonard Salewski, A. Sophia Koepke, Hendrik P. A. Lensch, Zeynep Akata
08 Nov 2023

LRM: Large Reconstruction Model for Single Image to 3D
Yicong Hong, Kai Zhang, Jiuxiang Gu, Sai Bi, Yang Zhou, Difan Liu, Feng Liu, Kalyan Sunkavalli, Trung Bui, Hao Tan
08 Nov 2023 · Topics: 3DV, 3DH

Multitask Multimodal Prompted Training for Interactive Embodied Task Completion
Georgios Pantazopoulos, Malvina Nikandrou, Amit Parekh, Bhathiya Hemanthage, Arash Eshghi, Ioannis Konstas, Verena Rieser, Oliver Lemon, Alessandro Suglia
07 Nov 2023 · Topics: LM&Ro

Scene-Driven Multimodal Knowledge Graph Construction for Embodied AI
Yaoxian Song, Penglei Sun, Haoyu Liu, Li Zhixu, Wei Song, Yanghua Xiao, Xiaofang Zhou
07 Nov 2023 · Topics: LM&Ro
CLIP-Motion: Learning Reward Functions for Robotic Actions Using Consecutive Observations
Xuzhe Dang, Stefan Edelkamp
06 Nov 2023

MetaReVision: Meta-Learning with Retrieval for Visually Grounded Compositional Concept Acquisition
Guangyue Xu, Parisa Kordjamshidi, Joyce Chai
02 Nov 2023

Integrating Language-Derived Appearance Elements with Visual Cues in Pedestrian Detection
Sungjune Park, Hyunjun Kim, Y. Ro
02 Nov 2023

From Image to Language: A Critical Analysis of Visual Question Answering (VQA) Approaches, Challenges, and Opportunities
Md Farhan Ishmam, Md Sakib Hossain Shovon, M. F. Mridha, Nilanjan Dey
01 Nov 2023

Neuroformer: Multimodal and Multitask Generative Pretraining for Brain Data
Antonis Antoniades, Yiyi Yu, Joseph Canzano, William Wang, Spencer L. Smith
31 Oct 2023 · Topics: AI4CE
Harvest Video Foundation Models via Efficient Post-Pretraining
Yizhuo Li, Kunchang Li, Yinan He, Yi Wang, Yali Wang, Limin Wang, Yu Qiao, Ping Luo
30 Oct 2023 · Topics: CLIP, VLM, VGen

Generating Context-Aware Natural Answers for Questions in 3D Scenes
Mohammed Munzer Dwedari, Matthias Niessner, Dave Zhenyu Chen
30 Oct 2023

This Looks Like Those: Illuminating Prototypical Concepts Using Multiple Visualizations
Chiyu Ma, Brandon Zhao, Chaofan Chen, Cynthia Rudin
28 Oct 2023

3D-Aware Visual Question Answering about Parts, Poses and Occlusions
Xingrui Wang, Wufei Ma, Zhuowan Li, Adam Kortylewski, Alan L. Yuille
27 Oct 2023 · Topics: CoGe
ArchBERT: Bi-Modal Understanding of Neural Architectures and Natural Languages
Mohammad Akbari, Saeed Ranjbar Alvar, Behnam Kamranian, Amin Banitalebi-Dehkordi, Yong Zhang
26 Oct 2023 · Topics: AI4CE

Evaluating Bias and Fairness in Gender-Neutral Pretrained Vision-and-Language Models
Laura Cabello, Emanuele Bugliarello, Stephanie Brandl, Desmond Elliott
26 Oct 2023

Apollo: Zero-shot MultiModal Reasoning with Multiple Experts
Daniela Ben-David, Tzuf Paz-Argaman, Reut Tsarfaty
25 Oct 2023 · Topics: MoE

CAD -- Contextual Multi-modal Alignment for Dynamic AVQA
Asmar Nadeem, Adrian Hilton, R. Dawes, Graham A. Thomas, A. Mustafa
25 Oct 2023

VD-GR: Boosting Visual Dialog with Cascaded Spatial-Temporal Multi-Modal Graphs
Adnen Abdessaied, Lei Shi, Andreas Bulling
25 Oct 2023 · Topics: 3DH
Emergent Communication in Interactive Sketch Question Answering
Zixing Lei, Yiming Zhang, Yuxin Xiong, Siheng Chen
24 Oct 2023

Multimodal Representations for Teacher-Guided Compositional Visual Reasoning
Wafa Aissa, Marin Ferecatu, M. Crucianu
24 Oct 2023 · Topics: LRM

Visually Grounded Continual Language Learning with Selective Specialization
Kyra Ahrens, Lennart Bengtson, Jae Hee Lee, Stefan Wermter
24 Oct 2023

LXMERT Model Compression for Visual Question Answering
Maryam Hashemi, Ghazaleh Mahmoudi, Sara Kodeiri, Hadi Sheikhi, Sauleh Eetemadi
23 Oct 2023 · Topics: VLM

Large Language Models are Visual Reasoning Coordinators
Liangyu Chen, Bo Li, Sheng Shen, Jingkang Yang, Chunyuan Li, Kurt Keutzer, Trevor Darrell, Ziwei Liu
23 Oct 2023 · Topics: VLM, LRM
The BLA Benchmark: Investigating Basic Language Abilities of Pre-Trained Multimodal Models
Xinyi Chen, Raquel Fernández, Sandro Pezzelle
23 Oct 2023 · Topics: VLM

ITEm: Unsupervised Image-Text Embedding Learning for eCommerce
Baohao Liao, Michael Kozielski, Sanjika Hewavitharana, Jiangbo Yuan, Shahram Khadivi, Tomer Lancewicki
22 Oct 2023 · Topics: SSL

Semantic and Expressive Variation in Image Captions Across Languages
Andre Ye, Sebastin Santy, Jena D. Hwang, Amy X. Zhang, Ranjay Krishna
22 Oct 2023 · Topics: VLM

Large Language Models and Multimodal Retrieval for Visual Word Sense Disambiguation
Anastasia Kritharoula, Maria Lymperaiou, Giorgos Stamou
21 Oct 2023

Multiscale Superpixel Structured Difference Graph Convolutional Network for VL Representation
Siyu Zhang, Ye-Ting Chen, Fang Wang, Yaoru Sun, Jun Yang, Lizhi Bai
20 Oct 2023 · Topics: SSL
RSAdapter: Adapting Multimodal Models for Remote Sensing Visual Question Answering
Yuduo Wang, Pedram Ghamisi
19 Oct 2023

UNK-VQA: A Dataset and a Probe into the Abstention Ability of Multi-modal Large Models
Yanyang Guo, Fangkai Jiao, Zhiqi Shen, Liqiang Nie, Mohan S. Kankanhalli
17 Oct 2023 · Topics: MLLM

PELA: Learning Parameter-Efficient Models with Low-Rank Approximation
Yangyang Guo, Guangzhi Wang, Mohan S. Kankanhalli
16 Oct 2023

AutoDIR: Automatic All-in-One Image Restoration with Latent Diffusion
Yitong Jiang, Zhaoyang Zhang, Tianfan Xue, Jinwei Gu
16 Oct 2023 · Topics: DiffM

VLIS: Unimodal Language Models Guide Multimodal Language Generation
Jiwan Chung, Youngjae Yu
15 Oct 2023 · Topics: VLM
Overview of ImageArg-2023: The First Shared Task in Multimodal Argument Mining
Zhexiong Liu, Mohamed Elarby, Yang Zhong, Diane Litman
15 Oct 2023

Penetrative AI: Making LLMs Comprehend the Physical World
Huatao Xu, Liying Han, Qirui Yang, Mo Li, Mani Srivastava
14 Oct 2023

JM3D & JM3D-LLM: Elevating 3D Understanding with Joint Multi-modal Cues
Jiayi Ji, Haowei Wang, Changli Wu, Yiwei Ma, Xiaoshuai Sun, Rongrong Ji
14 Oct 2023

Question Answering for Electronic Health Records: A Scoping Review of datasets and models
Jayetri Bardhan, Kirk Roberts, Daisy Zhe Wang
12 Oct 2023

DeltaSpace: A Semantic-aligned Feature Space for Flexible Text-guided Image Editing
Yueming Lyu, Kang Zhao, Bo Peng, Yue Jiang, Yingya Zhang, Jing Dong
12 Oct 2023
Open-Set Knowledge-Based Visual Question Answering with Inference Paths
Jingru Gan, Xinzhe Han, Shuhui Wang, Qingming Huang
12 Oct 2023

Jaeger: A Concatenation-Based Multi-Transformer VQA Model
Jieting Long, Zewei Shi, Penghao Jiang, Yidong Gan
11 Oct 2023

MemSum-DQA: Adapting An Efficient Long Document Extractive Summarizer for Document Question Answering
Nianlong Gu, Yingqiang Gao, Richard H. R. Hahnloser
10 Oct 2023 · Topics: RALM

Video-Teller: Enhancing Cross-Modal Generation with Fusion and Decoupling
Haogeng Liu, Qihang Fan, Tingkai Liu, Linjie Yang, Yunzhe Tao, Huaibo Huang, Ran He, Hongxia Yang
08 Oct 2023 · Topics: VGen

Understanding the Robustness of Multi-modal Contrastive Learning to Distribution Shift
Yihao Xue, Siddharth Joshi, Dang Nguyen, Baharan Mirzasoleiman
08 Oct 2023 · Topics: VLM
HowToCaption: Prompting LLMs to Transform Video Annotations at Scale
Nina Shvetsova, Anna Kukleva, Xudong Hong, Christian Rupprecht, Bernt Schiele, Hilde Kuehne
07 Oct 2023

VLATTACK: Multimodal Adversarial Attacks on Vision-Language Tasks via Pre-trained Models
Ziyi Yin, Muchao Ye, Tianrong Zhang, Tianyu Du, Jinguo Zhu, Han Liu, Jinghui Chen, Ting Wang, Fenglong Ma
07 Oct 2023 · Topics: AAML, VLM, CoGe