ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1908.07490
  4. Cited By
LXMERT: Learning Cross-Modality Encoder Representations from
  Transformers

LXMERT: Learning Cross-Modality Encoder Representations from Transformers

20 August 2019
Hao Hao Tan
Mohit Bansal
    VLM
    MLLM
ArXivPDFHTML

Papers citing "LXMERT: Learning Cross-Modality Encoder Representations from Transformers"

50 / 1,506 papers shown
Title
MobileVLM V2: Faster and Stronger Baseline for Vision Language Model
MobileVLM V2: Faster and Stronger Baseline for Vision Language Model
Xiangxiang Chu
Limeng Qiao
Xinyu Zhang
Shuang Xu
Fei Wei
...
Xiaofei Sun
Yiming Hu
Xinyang Lin
Bo-Wen Zhang
Chunhua Shen
VLM
MLLM
17
94
0
06 Feb 2024
Convincing Rationales for Visual Question Answering Reasoning
Convincing Rationales for Visual Question Answering Reasoning
Kun Li
G. Vosselman
Michael Ying Yang
34
1
0
06 Feb 2024
GeReA: Question-Aware Prompt Captions for Knowledge-based Visual
  Question Answering
GeReA: Question-Aware Prompt Captions for Knowledge-based Visual Question Answering
Ziyu Ma
Shutao Li
Bin Sun
Jianfei Cai
Zuxiang Long
Fuyan Ma
21
1
0
04 Feb 2024
MLIP: Enhancing Medical Visual Representation with Divergence Encoder
  and Knowledge-guided Contrastive Learning
MLIP: Enhancing Medical Visual Representation with Divergence Encoder and Knowledge-guided Contrastive Learning
Zhe Li
Laurence T. Yang
Bocheng Ren
Xin Nie
Zhangyang Gao
Cheng Tan
Stan Z. Li
VLM
10
11
0
03 Feb 2024
Proximity QA: Unleashing the Power of Multi-Modal Large Language Models
  for Spatial Proximity Analysis
Proximity QA: Unleashing the Power of Multi-Modal Large Language Models for Spatial Proximity Analysis
Jianing Li
Xi Nan
Ming Lu
Li Du
Shanghang Zhang
40
1
0
31 Jan 2024
Beyond Image-Text Matching: Verb Understanding in Multimodal
  Transformers Using Guided Masking
Beyond Image-Text Matching: Verb Understanding in Multimodal Transformers Using Guided Masking
Ivana Beňová
Jana Kosecka
Michal Gregor
Martin Tamajka
Marcel Veselý
Marián Simko
21
1
0
29 Jan 2024
Cross-Modal Coordination Across a Diverse Set of Input Modalities
Cross-Modal Coordination Across a Diverse Set of Input Modalities
Jorge Sánchez
Rodrigo Laguna
VLM
21
0
0
29 Jan 2024
Dynamic Transformer Architecture for Continual Learning of Multimodal
  Tasks
Dynamic Transformer Architecture for Continual Learning of Multimodal Tasks
Yuliang Cai
Mohammad Rostami
33
4
0
27 Jan 2024
Multimodal Pathway: Improve Transformers with Irrelevant Data from Other
  Modalities
Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities
Yiyuan Zhang
Xiaohan Ding
Kaixiong Gong
Yixiao Ge
Ying Shan
Xiangyu Yue
ViT
16
7
0
25 Jan 2024
KAM-CoT: Knowledge Augmented Multimodal Chain-of-Thoughts Reasoning
KAM-CoT: Knowledge Augmented Multimodal Chain-of-Thoughts Reasoning
Debjyoti Mondal
Suraj Modi
Subhadarshi Panda
Rituraj Singh
Godawari Sudhakar Rao
LRM
23
36
0
23 Jan 2024
CoAVT: A Cognition-Inspired Unified Audio-Visual-Text Pre-Training Model
  for Multimodal Processing
CoAVT: A Cognition-Inspired Unified Audio-Visual-Text Pre-Training Model for Multimodal Processing
Xianghu Yue
Xiaohai Tian
Lu Lu
Malu Zhang
Zhizheng Wu
Haizhou Li
19
0
0
22 Jan 2024
Q&A Prompts: Discovering Rich Visual Clues through Mining
  Question-Answer Prompts for VQA requiring Diverse World Knowledge
Q&A Prompts: Discovering Rich Visual Clues through Mining Question-Answer Prompts for VQA requiring Diverse World Knowledge
Haibi Wang
Weifeng Ge
LRM
19
3
0
19 Jan 2024
Generative Multi-Modal Knowledge Retrieval with Large Language Models
Generative Multi-Modal Knowledge Retrieval with Large Language Models
Xinwei Long
Jiali Zeng
Fandong Meng
Zhiyuan Ma
Kaiyan Zhang
Bowen Zhou
Jie Zhou
35
14
0
16 Jan 2024
Uncovering the Full Potential of Visual Grounding Methods in VQA
Uncovering the Full Potential of Visual Grounding Methods in VQA
Daniel Reich
Tanja Schultz
25
4
0
15 Jan 2024
APLe: Token-Wise Adaptive for Multi-Modal Prompt Learning
APLe: Token-Wise Adaptive for Multi-Modal Prompt Learning
Guiming Cao
Kaize Shi
Hong Fu
Huaiwen Zhang
Guandong Xu
VLM
20
1
0
12 Jan 2024
Efficient Vision-and-Language Pre-training with Text-Relevant Image
  Patch Selection
Efficient Vision-and-Language Pre-training with Text-Relevant Image Patch Selection
Wei Ye
Chaoya Jiang
Haiyang Xu
Chenhao Ye
Chenliang Li
Mingshi Yan
Shikun Zhang
Songhang Huang
Fei Huang
VLM
29
0
0
11 Jan 2024
ConcEPT: Concept-Enhanced Pre-Training for Language Models
ConcEPT: Concept-Enhanced Pre-Training for Language Models
Xintao Wang
Zhouhong Gu
Jiaqing Liang
Dakuan Lu
Yanghua Xiao
Wei Wang
24
1
0
11 Jan 2024
Let's Go Shopping (LGS) -- Web-Scale Image-Text Dataset for Visual
  Concept Understanding
Let's Go Shopping (LGS) -- Web-Scale Image-Text Dataset for Visual Concept Understanding
Yatong Bai
Utsav Garg
Apaar Shanker
Haoming Zhang
Samyak Parajuli
...
Eugenia D Fomitcheva
E. Branson
Aerin Kim
Somayeh Sojoudi
Kyunghyun Cho
16
2
0
09 Jan 2024
UMIE: Unified Multimodal Information Extraction with Instruction Tuning
UMIE: Unified Multimodal Information Extraction with Instruction Tuning
Lin Sun
Kai Zhang
Qingyuan Li
Renze Lou
24
4
0
05 Jan 2024
SyCoCa: Symmetrizing Contrastive Captioners with Attentive Masking for
  Multimodal Alignment
SyCoCa: Symmetrizing Contrastive Captioners with Attentive Masking for Multimodal Alignment
Ziping Ma
Furong Xu
Jian Liu
Ming Yang
Qingpei Guo
VLM
34
3
0
04 Jan 2024
3VL: Using Trees to Improve Vision-Language Models' Interpretability
3VL: Using Trees to Improve Vision-Language Models' Interpretability
Nir Yellinek
Leonid Karlinsky
Raja Giryes
CoGe
VLM
49
4
0
28 Dec 2023
Detection-based Intermediate Supervision for Visual Question Answering
Detection-based Intermediate Supervision for Visual Question Answering
Yuhang Liu
Daowan Peng
Wei Wei
Yuanyuan Fu
Wenfeng Xie
Dangyang Chen
22
2
0
26 Dec 2023
WebVLN: Vision-and-Language Navigation on Websites
WebVLN: Vision-and-Language Navigation on Websites
Qi Chen
D. Pitawela
Chongyang Zhao
Gengze Zhou
Hsiang-Ting Chen
Qi Wu
23
8
0
25 Dec 2023
GroundVLP: Harnessing Zero-shot Visual Grounding from Vision-Language
  Pre-training and Open-Vocabulary Object Detection
GroundVLP: Harnessing Zero-shot Visual Grounding from Vision-Language Pre-training and Open-Vocabulary Object Detection
Haozhan Shen
Tiancheng Zhao
Mingwei Zhu
Jianwei Yin
VLM
ObjD
65
11
0
22 Dec 2023
Object Attribute Matters in Visual Question Answering
Object Attribute Matters in Visual Question Answering
Peize Li
Q. Si
Peng Fu
Zheng Lin
Yan Wang
25
0
0
20 Dec 2023
Misalign, Contrast then Distill: Rethinking Misalignments in
  Language-Image Pretraining
Misalign, Contrast then Distill: Rethinking Misalignments in Language-Image Pretraining
Bumsoo Kim
Yeonsik Jo
Jinhyung Kim
S. Kim
VLM
6
6
0
19 Dec 2023
Expediting Contrastive Language-Image Pretraining via Self-distilled
  Encoders
Expediting Contrastive Language-Image Pretraining via Self-distilled Encoders
Bumsoo Kim
Jinhyung Kim
Yeonsik Jo
S. Kim
VLM
11
3
0
19 Dec 2023
Data-Efficient Multimodal Fusion on a Single GPU
Data-Efficient Multimodal Fusion on a Single GPU
Noël Vouitsis
Zhaoyan Liu
S. Gorti
Valentin Villecroze
Jesse C. Cresswell
Guangwei Yu
G. Loaiza-Ganem
M. Volkovs
35
3
0
15 Dec 2023
Prompt-based Distribution Alignment for Unsupervised Domain Adaptation
Prompt-based Distribution Alignment for Unsupervised Domain Adaptation
Shuanghao Bai
Min Zhang
Wanqi Zhou
Siteng Huang
Zhirong Luan
Donglin Wang
Badong Chen
OOD
VLM
11
32
0
15 Dec 2023
Guided Image Restoration via Simultaneous Feature and Image Guided
  Fusion
Guided Image Restoration via Simultaneous Feature and Image Guided Fusion
Xinyi Liu
Qian Zhao
Jie-Kai Liang
Huiyu Zeng
Deyu Meng
Lei Zhang
33
0
0
14 Dec 2023
TiMix: Text-aware Image Mixing for Effective Vision-Language
  Pre-training
TiMix: Text-aware Image Mixing for Effective Vision-Language Pre-training
Chaoya Jiang
Wei Ye
Haiyang Xu
Qinghao Ye
Mingshi Yan
Ji Zhang
Shikun Zhang
CLIP
VLM
11
4
0
14 Dec 2023
Modality Plug-and-Play: Elastic Modality Adaptation in Multimodal LLMs
  for Embodied AI
Modality Plug-and-Play: Elastic Modality Adaptation in Multimodal LLMs for Embodied AI
Kai Huang
Boyuan Yang
Wei Gao
24
1
0
13 Dec 2023
MATK: The Meme Analytical Tool Kit
MATK: The Meme Analytical Tool Kit
Ming Shan Hee
Aditi Kumaresan
N. Hoang
Nirmalendu Prakash
Rui Cao
Roy Ka-Wei Lee
VLM
17
2
0
11 Dec 2023
Adventures of Trustworthy Vision-Language Models: A Survey
Adventures of Trustworthy Vision-Language Models: A Survey
Mayank Vatsa
Anubhooti Jain
Richa Singh
17
4
0
07 Dec 2023
GPT-4 Enhanced Multimodal Grounding for Autonomous Driving: Leveraging
  Cross-Modal Attention with Large Language Models
GPT-4 Enhanced Multimodal Grounding for Autonomous Driving: Leveraging Cross-Modal Attention with Large Language Models
Haicheng Liao
Huanming Shen
Zhenning Li
Chengyue Wang
Guofa Li
Yiming Bie
Chengzhong Xu
34
26
0
06 Dec 2023
BenchLMM: Benchmarking Cross-style Visual Capability of Large Multimodal
  Models
BenchLMM: Benchmarking Cross-style Visual Capability of Large Multimodal Models
Rizhao Cai
Zirui Song
Dayan Guan
Zhenhao Chen
Xing Luo
Chenyu Yi
Alex C. Kot
MLLM
VLM
12
31
0
05 Dec 2023
SequencePAR: Understanding Pedestrian Attributes via A Sequence
  Generation Paradigm
SequencePAR: Understanding Pedestrian Attributes via A Sequence Generation Paradigm
Jiandong Jin
Xiao Wang
Chenglong Li
Lili Huang
Jin Tang
AI4TS
24
6
0
04 Dec 2023
Expand BERT Representation with Visual Information via Grounded Language
  Learning with Multimodal Partial Alignment
Expand BERT Representation with Visual Information via Grounded Language Learning with Multimodal Partial Alignment
Cong-Duy Nguyen
The-Anh Vu-Le
Thong Nguyen
Tho Quan
A. Luu
10
5
0
04 Dec 2023
Bootstrapping Interactive Image-Text Alignment for Remote Sensing Image
  Captioning
Bootstrapping Interactive Image-Text Alignment for Remote Sensing Image Captioning
Cong Yang
Zuchao Li
Lefei Zhang
24
22
0
02 Dec 2023
Brainformer: Mimic Human Visual Brain Functions to Machine Vision Models
  via fMRI
Brainformer: Mimic Human Visual Brain Functions to Machine Vision Models via fMRI
Xuan-Bac Nguyen
Xin Li
Pawan Sinha
Samee U. Khan
Khoa Luu
ViT
MedIm
22
0
0
30 Nov 2023
Debiasing Multimodal Models via Causal Information Minimization
Debiasing Multimodal Models via Causal Information Minimization
Vaidehi Patil
A. Maharana
Mohit Bansal
CML
22
2
0
28 Nov 2023
Compositional Chain-of-Thought Prompting for Large Multimodal Models
Compositional Chain-of-Thought Prompting for Large Multimodal Models
Chancharik Mitra
Brandon Huang
Trevor Darrell
Roei Herzig
MLLM
LRM
26
80
0
27 Nov 2023
MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning
  Benchmark for Expert AGI
MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI
Xiang Yue
Yuansheng Ni
Kai Zhang
Tianyu Zheng
Ruoqi Liu
...
Yibo Liu
Wenhao Huang
Huan Sun
Yu-Chuan Su
Wenhu Chen
OSLM
ELM
VLM
66
723
0
27 Nov 2023
SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation
SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation
Bin Xie
Jiale Cao
Jin Xie
Fahad Shahbaz Khan
Yanwei Pang
VLM
18
42
0
27 Nov 2023
Unified Medical Image Pre-training in Language-Guided Common Semantic
  Space
Unified Medical Image Pre-training in Language-Guided Common Semantic Space
Xiaoxuan He
Yifan Yang
Xinyang Jiang
Xufang Luo
Haoji Hu
Siyun Zhao
Dongsheng Li
Yuqing Yang
Lili Qiu
24
1
0
24 Nov 2023
Boosting the Power of Small Multimodal Reasoning Models to Match Larger
  Models with Self-Consistency Training
Boosting the Power of Small Multimodal Reasoning Models to Match Larger Models with Self-Consistency Training
Cheng Tan
Jingxuan Wei
Zhangyang Gao
Linzhuang Sun
Siyuan Li
Ruifeng Guo
Xihong Yang
Stan Z. Li
LRM
14
7
0
23 Nov 2023
ViStruct: Visual Structural Knowledge Extraction via Curriculum Guided
  Code-Vision Representation
ViStruct: Visual Structural Knowledge Extraction via Curriculum Guided Code-Vision Representation
Yangyi Chen
Xingyao Wang
Manling Li
Derek Hoiem
Heng Ji
25
10
0
22 Nov 2023
Causality is all you need
Causality is all you need
Ning Xu
Yifei Gao
Hongshuo Tian
Yongdong Zhang
An-An Liu
23
0
0
21 Nov 2023
What's left can't be right -- The remaining positional incompetence of
  contrastive vision-language models
What's left can't be right -- The remaining positional incompetence of contrastive vision-language models
Nils Hoehing
Ellen Rushe
Anthony Ventresque
VLM
8
2
0
20 Nov 2023
Understanding and Mitigating Classification Errors Through Interpretable
  Token Patterns
Understanding and Mitigating Classification Errors Through Interpretable Token Patterns
Michael A. Hedderich
Jonas Fischer
Dietrich Klakow
Jilles Vreeken
6
0
0
18 Nov 2023
Previous
123...567...293031
Next