1908.07490
Cited By
LXMERT: Learning Cross-Modality Encoder Representations from Transformers
20 August 2019
Hao Tan
Mohit Bansal
VLM
MLLM
Papers citing
"LXMERT: Learning Cross-Modality Encoder Representations from Transformers"
50 / 1,506 papers shown
Title
Expedited Training of Visual Conditioned Language Generation via Redundancy Reduction
Yiren Jian
Tingkai Liu
Yunzhe Tao
Chunhui Zhang
Soroush Vosoughi
HX Yang
VLM
15
7
0
05 Oct 2023
Multimodal Question Answering for Unified Information Extraction
Yuxuan Sun
Kai Zhang
Yu-Chuan Su
13
8
0
04 Oct 2023
SelfGraphVQA: A Self-Supervised Graph Neural Network for Scene-based Question Answering
Bruno Souza
Marius Aasan
Hélio Pedrini
Adín Ramirez Rivera
SSL
14
2
0
03 Oct 2023
NEUCORE: Neural Concept Reasoning for Composed Image Retrieval
Shu Zhao
Huijuan Xu
25
6
0
02 Oct 2023
Fine-grained Late-interaction Multi-modal Retrieval for Retrieval Augmented Visual Question Answering
Weizhe Lin
Jinghong Chen
Jingbiao Mei
Alexandru Coca
Bill Byrne
19
27
0
29 Sep 2023
PROSE: Predicting Operators and Symbolic Expressions using Multimodal Transformers
Yuxuan Liu
Zecheng Zhang
Hayden Schaeffer
20
18
0
28 Sep 2023
ELIP: Efficient Language-Image Pre-training with Fewer Vision Tokens
Yangyang Guo
Haoyu Zhang
Yongkang Wong
Liqiang Nie
Mohan S. Kankanhalli
VLM
16
3
0
28 Sep 2023
Rapid Network Adaptation: Learning to Adapt Neural Networks Using Test-Time Feedback
Teresa Yeo
Oğuzhan Fatih Kar
Zahra Sodagar
Amir Zamir
TTA
OOD
21
3
0
27 Sep 2023
M$^{3}$3D: Learning 3D priors using Multi-Modal Masked Autoencoders for 2D image and video understanding
Muhammad Abdullah Jamal
Omid Mohareri
3DPC
14
1
0
26 Sep 2023
SeMAnD: Self-Supervised Anomaly Detection in Multimodal Geospatial Datasets
Daria Reshetova
Swetava Ganguli
C. V. K. Iyer
Vipul Pandey
18
3
0
26 Sep 2023
Object-Centric Open-Vocabulary Image-Retrieval with Aggregated Features
Hila Levi
Guy Heller
Dan Levi
Ethan Fetaya
OCL
VLM
14
3
0
26 Sep 2023
Chop & Learn: Recognizing and Generating Object-State Compositions
Nirat Saini
Hanyu Wang
Archana Swaminathan
Vinoj Jayasundara
Bo He
Kamal Gupta
Abhinav Shrivastava
CoGe
23
12
0
25 Sep 2023
DeepSpeed-VisualChat: Multi-Round Multi-Image Interleave Chat via Multi-Modal Causal Attention
Z. Yao
Xiaoxia Wu
Conglong Li
Minjia Zhang
Heyang Qi
Olatunji Ruwase
A. A. Awan
Samyam Rajbhandari
Yuxiong He
26
11
0
25 Sep 2023
MUTEX: Learning Unified Policies from Multimodal Task Specifications
Rutav Shah
Roberto Martín-Martín
Yuke Zhu
OffRL
44
54
0
25 Sep 2023
VidChapters-7M: Video Chapters at Scale
Antoine Yang
Arsha Nagrani
Ivan Laptev
Josef Sivic
Cordelia Schmid
VGen
13
26
0
25 Sep 2023
Survey of Social Bias in Vision-Language Models
Nayeon Lee
Yejin Bang
Holy Lovenia
Samuel Cahyawijaya
Wenliang Dai
Pascale Fung
VLM
36
16
0
24 Sep 2023
GraphAdapter: Tuning Vision-Language Models With Dual Knowledge Graph
Xin Li
Dongze Lian
Zhihe Lu
Jiawang Bai
Zhibo Chen
Xinchao Wang
VLM
27
60
0
24 Sep 2023
A Survey on Image-text Multimodal Models
Ruifeng Guo
Jingxuan Wei
Linzhuang Sun
Khai Le-Duc
Guiyong Chang
Dawei Liu
Sibo Zhang
Zhengbing Yao
Mingjun Xu
Liping Bu
VLM
21
5
0
23 Sep 2023
Sentence Attention Blocks for Answer Grounding
Seyedalireza Khoshsirat
Chandra Kambhamettu
31
7
0
20 Sep 2023
Discuss Before Moving: Visual Language Navigation via Multi-expert Discussions
Yuxing Long
Xiaoqi Li
Wenzhe Cai
Hao Dong
LLMAG
LM&Ro
13
43
0
20 Sep 2023
The Scenario Refiner: Grounding subjects in images at the morphological level
Claudia Tagliaferri
Sofia Axioti
Albert Gatt
Denis Paperno
19
1
0
20 Sep 2023
Improving Multimodal Classification of Social Media Posts by Leveraging Image-Text Auxiliary Tasks
Danae Sánchez Villegas
Daniel Preoţiuc-Pietro
Nikolaos Aletras
31
2
0
14 Sep 2023
PROGrasp: Pragmatic Human-Robot Communication for Object Grasping
Gi-Cheon Kang
Junghyun Kim
Jaein Kim
Byoung-Tak Zhang
19
4
0
14 Sep 2023
VLSlice: Interactive Vision-and-Language Slice Discovery
Eric Slyman
Minsuk Kahng
Stefan Lee
VLM
11
9
0
13 Sep 2023
Sleep Stage Classification Using a Pre-trained Deep Learning Model
Hassan Ardeshir
Mohammad Araghi
15
1
0
12 Sep 2023
Measuring and Improving Chain-of-Thought Reasoning in Vision-Language Models
Yangyi Chen
Karan Sikka
Michael Cogswell
Heng Ji
Ajay Divakaran
LRM
34
24
0
08 Sep 2023
Exploring Sparse MoE in GANs for Text-conditioned Image Synthesis
Jiapeng Zhu
Ceyuan Yang
Kecheng Zheng
Yinghao Xu
Zifan Shi
Yujun Shen
MoE
19
8
0
07 Sep 2023
Interpretable Visual Question Answering via Reasoning Supervision
Maria Parelli
Dimitrios Mallis
Markos Diomataris
Vassilis Pitsikalis
LRM
22
2
0
07 Sep 2023
A Multimodal Analysis of Influencer Content on Twitter
Danae Sánchez Villegas
Catalina Goanta
Nikolaos Aletras
15
2
0
06 Sep 2023
A Joint Study of Phrase Grounding and Task Performance in Vision and Language Models
Noriyuki Kojima
Hadar Averbuch-Elor
Yoav Artzi
21
2
0
06 Sep 2023
S3C: Semi-Supervised VQA Natural Language Explanation via Self-Critical Learning
Wei Suo
Mengyang Sun
Weisong Liu
Yi-Meng Gao
Peifeng Wang
Yanning Zhang
Qi Wu
LRM
23
7
0
05 Sep 2023
Parameter and Computation Efficient Transfer Learning for Vision-Language Pre-trained Models
Qiong Wu
Wei Yu
Yiyi Zhou
Shubin Huang
Xiaoshuai Sun
R. Ji
VLM
16
7
0
04 Sep 2023
Unified Pre-training with Pseudo Texts for Text-To-Image Person Re-identification
Zhiyin Shao
Xinyu Zhang
Changxing Ding
Jian Wang
Jingdong Wang
19
17
0
04 Sep 2023
LoGoPrompt: Synthetic Text Images Can Be Good Visual Prompts for Vision-Language Models
Cheng Shi
Sibei Yang
VLM
19
21
0
03 Sep 2023
Distraction-free Embeddings for Robust VQA
Atharvan Dogra
Deeksha Varshney
A. Kalyan
A. Deshpande
Neeraj Kumar
11
0
0
31 Aug 2023
ViLTA: Enhancing Vision-Language Pre-training through Textual Augmentation
Weihan Wang
Z. Yang
Bin Xu
Juanzi Li
Yankui Sun
VLM
20
8
0
31 Aug 2023
CLIPTrans: Transferring Visual Knowledge with Pre-trained Models for Multimodal Machine Translation
Devaansh Gupta
Siddhant Kharbanda
Jiawei Zhou
Wanhua Li
Hanspeter Pfister
D. Wei
VLM
25
9
0
29 Aug 2023
A Unified Transformer-based Network for multimodal Emotion Recognition
Kamran Ali
Charles E. Hughes
10
1
0
27 Aug 2023
MatchXML: An Efficient Text-label Matching Framework for Extreme Multi-label Text Classification
Hui Ye
Rajshekhar Sunderraman
Shihao Ji
27
3
0
25 Aug 2023
Grounded Entity-Landmark Adaptive Pre-training for Vision-and-Language Navigation
Yibo Cui
Liang Xie
Yakun Zhang
Meishan Zhang
Ye Yan
Erwei Yin
LM&Ro
29
16
0
24 Aug 2023
CLIPN for Zero-Shot OOD Detection: Teaching CLIP to Say No
Hualiang Wang
Yi Li
Huifeng Yao
X. Li
VLM
OODD
32
94
0
23 Aug 2023
ROSGPT_Vision: Commanding Robots Using Only Language Models' Prompts
Bilel Benjdira
Anis Koubaa
Anas M. Ali
LM&Ro
14
3
0
22 Aug 2023
FedDAT: An Approach for Foundation Model Finetuning in Multi-Modal Heterogeneous Federated Learning
Haokun Chen
Yao Zhang
Denis Krompass
Jindong Gu
Volker Tresp
FedML
65
39
0
21 Aug 2023
An Examination of the Compositionality of Large Generative Vision-Language Models
Teli Ma
Rong Li
Junwei Liang
CoGe
19
2
0
21 Aug 2023
Generic Attention-model Explainability by Weighted Relevance Accumulation
Yiming Huang
Ao Jia
Xiaodan Zhang
Jiawei Zhang
18
1
0
20 Aug 2023
Whether you can locate or not? Interactive Referring Expression Generation
Fulong Ye
Yuxing Long
Fangxiang Feng
Xiaojie Wang
19
4
0
19 Aug 2023
Towards Grounded Visual Spatial Reasoning in Multi-Modal Vision Language Models
Navid Rajabi
Jana Kosecka
VLM
16
11
0
18 Aug 2023
Artificial-Spiking Hierarchical Networks for Vision-Language Representation Learning
Ye-Ting Chen
Siyu Zhang
Yaoru Sun
Weijian Liang
Haoran Wang
33
0
0
18 Aug 2023
Helping Hands: An Object-Aware Ego-Centric Video Recognition Model
Chuhan Zhang
Ankush Gupta
Andrew Zisserman
VLM
21
19
0
15 Aug 2023
Cross-Domain Product Representation Learning for Rich-Content E-Commerce
Xuehan Bai
Yan Li
Yong Cheng
Wenjie Yang
Quanming Chen
Han Li
16
3
0
10 Aug 2023