Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2007.12146
Cited By
Spatially Aware Multimodal Transformers for TextVQA
23 July 2020
Yash Kant
Dhruv Batra
Peter Anderson
A. Schwing
Devi Parikh
Jiasen Lu
Harsh Agrawal
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Spatially Aware Multimodal Transformers for TextVQA"
12 / 12 papers shown
Title
SCOB: Universal Text Understanding via Character-wise Supervised Contrastive Learning with Online Text Rendering for Bridging Domain Gap
Daehee Kim
Yoon Kim
Donghyun Kim
Yumin Lim
Geewook Kim
Taeho Kil
21
3
0
21 Sep 2023
Locate Then Generate: Bridging Vision and Language with Bounding Box for Scene-Text VQA
Yongxin Zhu
Z. Liu
Yukang Liang
Xin Li
Hao Liu
Changcun Bao
Linli Xu
16
6
0
04 Apr 2023
SceneGATE: Scene-Graph based co-Attention networks for TExt visual question answering
Feiqi Cao
Siwen Luo
F. Núñez
Zean Wen
Josiah Poon
Caren Han
GNN
16
4
0
16 Dec 2022
VLG: General Video Recognition with Web Textual Knowledge
Jintao Lin
Zhaoyang Liu
Wenhai Wang
Wayne Wu
Limin Wang
37
0
0
03 Dec 2022
Fusing Modalities by Multiplexed Graph Neural Networks for Outcome Prediction in Tuberculosis
N. S. D'Souza
Hongzhi Wang
Andrea Giovannini
A. Foncubierta-Rodríguez
Kristen L. Beck
Orest Boyko
T. Syeda-Mahmood
AI4CE
24
8
0
25 Oct 2022
Toward 3D Spatial Reasoning for Human-like Text-based Visual Question Answering
Hao Li
Jinfa Huang
Peng Jin
Guoli Song
Qi Wu
Jie Chen
27
20
0
21 Sep 2022
LaTr: Layout-Aware Transformer for Scene-Text VQA
Ali Furkan Biten
Ron Litman
Yusheng Xie
Srikar Appalaraju
R. Manmatha
ViT
22
100
0
23 Dec 2021
Localize, Group, and Select: Boosting Text-VQA by Scene Text Modeling
Xiaopeng Lu
Zhenhua Fan
Yansen Wang
Jean Oh
Carolyn Rose
16
27
0
20 Aug 2021
ROSITA: Enhancing Vision-and-Language Semantic Alignments via Cross- and Intra-modal Knowledge Integration
Yuhao Cui
Zhou Yu
Chunqi Wang
Zhongzhou Zhao
Ji Zhang
Meng Wang
Jun-chen Yu
VLM
19
52
0
16 Aug 2021
Structured Co-reference Graph Attention for Video-grounded Dialogue
Junyeong Kim
Sunjae Yoon
Dahyun Kim
Chang-Dong Yoo
18
26
0
24 Mar 2021
Unified Vision-Language Pre-Training for Image Captioning and VQA
Luowei Zhou
Hamid Palangi
Lei Zhang
Houdong Hu
Jason J. Corso
Jianfeng Gao
MLLM
VLM
250
926
0
24 Sep 2019
Aggregated Residual Transformations for Deep Neural Networks
Saining Xie
Ross B. Girshick
Piotr Dollár
Z. Tu
Kaiming He
261
10,196
0
16 Nov 2016
1