Simple is not Easy: A Simple Strong Baseline for TextVQA and TextCaps

9 December 2020
Qi Zhu
Chenyu Gao
Peng Wang
Qi Wu
arXiv: 2012.05153

Papers citing "Simple is not Easy: A Simple Strong Baseline for TextVQA and TextCaps"

19 / 19 papers shown
Title
Exploring Sparse Spatial Relation in Graph Inference for Text-Based VQA
Sheng Zhou
Dan Guo
Jia Li
Xun Yang
M. Wang
18
5
0
13 Oct 2023
Image-Text Pre-Training for Logo Recognition
Mark Hubenthal
Suren Kumar
VLM
32
3
0
18 Sep 2023
Locate Then Generate: Bridging Vision and Language with Bounding Box for Scene-Text VQA
Yongxin Zhu
Z. Liu
Yukang Liang
Xin Li
Hao Liu
Changcun Bao
Linli Xu
21
6
0
04 Apr 2023
DEVICE: Depth and Visual Concepts Aware Transformer for OCR-based Image Captioning
Dongsheng Xu
Qingbao Huang
Shuang Feng
Yiru Cai
Feng Shuang
Yi Cai
ViT
VLM
20
1
0
03 Feb 2023
SceneGATE: Scene-Graph based co-Attention networks for TExt visual question answering
Feiqi Cao
Siwen Luo
F. Núñez
Zean Wen
Josiah Poon
Caren Han
GNN
20
4
0
16 Dec 2022
Text-Aware Dual Routing Network for Visual Question Answering
Luoqian Jiang
Yifan He
Jian Chen
16
0
0
17 Nov 2022
Toward 3D Spatial Reasoning for Human-like Text-based Visual Question Answering
Hao Li
Jinfa Huang
Peng Jin
Guoli Song
Qi Wu
Jie Chen
33
21
0
21 Sep 2022
MUST-VQA: MUltilingual Scene-text VQA
Emanuele Vivoli
Ali Furkan Biten
Andrés Mafla
Dimosthenis Karatzas
Lluís Gómez
34
6
0
14 Sep 2022
TAG: Boosting Text-VQA via Text-aware Visual Question-answer Generation
Jun Wang
M. Gao
Yuqian Hu
Ramprasaath R. Selvaraju
Chetan Ramaiah
Ran Xu
J. JáJá
Larry S. Davis
ViT
19
17
0
03 Aug 2022
One for All: One-stage Referring Expression Comprehension with Dynamic Reasoning
Zhipeng Zhang
Zhimin Wei
Zhongzhen Huang
Rui Niu
Peng Wang
ObjD
LRM
9
9
0
31 Jul 2022
Towards Multimodal Vision-Language Models Generating Non-Generic Text
Wes Robbins
Zanyar Zohourianshahzadi
Jugal Kalita
14
1
0
09 Jul 2022
ViSTA: Vision and Scene Text Aggregation for Cross-Modal Retrieval
Mengjun Cheng
Yipeng Sun
Long Wang
Xiongwei Zhu
Kun Yao
...
Guoli Song
Junyu Han
Jingtuo Liu
Errui Ding
Jingdong Wang
22
60
0
31 Mar 2022
Towards Escaping from Language Bias and OCR Error: Semantics-Centered Text Visual Question Answering
Chengyang Fang
Gangyan Zeng
Yu Zhou
Daiqing Wu
Can Ma
Dayong Hu
Weiping Wang
4
8
0
24 Mar 2022
LaTr: Layout-Aware Transformer for Scene-Text VQA
Ali Furkan Biten
Ron Litman
Yusheng Xie
Srikar Appalaraju
R. Manmatha
ViT
26
100
0
23 Dec 2021
ICDAR 2021 Competition on Document Visual Question Answering
Rubèn Pérez Tito
Minesh Mathew
C. V. Jawahar
Ernest Valveny
Dimosthenis Karatzas
35
23
0
10 Nov 2021
Localize, Group, and Select: Boosting Text-VQA by Scene Text Modeling
Xiaopeng Lu
Zhenhua Fan
Yansen Wang
Jean Oh
Carolyn Rose
21
27
0
20 Aug 2021
Question-controlled Text-aware Image Captioning
Anwen Hu
Shizhe Chen
Qin Jin
19
15
0
04 Aug 2021
From Show to Tell: A Survey on Deep Learning-based Image Captioning
Matteo Stefanini
Marcella Cornia
Lorenzo Baraldi
S. Cascianelli
G. Fiameni
Rita Cucchiara
3DV
VLM
MLLM
55
254
0
14 Jul 2021
COCO-Text: Dataset and Benchmark for Text Detection and Recognition in Natural Images
Andreas Veit
Tomas Matera
Lukáš Neumann
Jiří Matas
Serge J. Belongie
188
515
0
26 Jan 2016