ResearchTrend.AI

Localize, Group, and Select: Boosting Text-VQA by Scene Text Modeling

20 August 2021
Xiaopeng Lu
Zhenhua Fan
Yansen Wang
Jean Oh
Carolyn Rose
arXiv: 2108.08965

Papers citing "Localize, Group, and Select: Boosting Text-VQA by Scene Text Modeling"

Multimodal Misinformation Detection by Learning from Synthetic Data with Multimodal LLMs
Fengzhu Zeng
Wenqian Li
Wei Gao
Yan Pang
34
2
0
29 Sep 2024
Adversarial Training with OCR Modality Perturbation for Scene-Text Visual Question Answering
Zhixuan Shen
Haonan Luo
Sijia Li
Tianrui Li
19
0
0
14 Mar 2024
Multiple-Question Multiple-Answer Text-VQA
Peng Tang
Srikar Appalaraju
R. Manmatha
Yusheng Xie
Vijay Mahadevan
44
5
0
15 Nov 2023
Exploring Sparse Spatial Relation in Graph Inference for Text-Based VQA
Sheng Zhou
Dan Guo
Jia Li
Xun Yang
M. Wang
8
5
0
13 Oct 2023
Separate and Locate: Rethink the Text in Text-based Visual Question Answering
Chengyang Fang
Jiangnan Li
Liang Li
Can Ma
Dayong Hu
9
12
0
31 Aug 2023
Making the V in Text-VQA Matter
Shamanthak Hegde
Soumya Jahagirdar
Shankar Gangisetty
CoGe
20
4
0
01 Aug 2023
DocFormerv2: Local Features for Document Understanding
Srikar Appalaraju
Peng Tang
Qi Dong
Nishant Sankaran
Yichu Zhou
R. Manmatha
19
39
0
02 Jun 2023
Locate Then Generate: Bridging Vision and Language with Bounding Box for Scene-Text VQA
Yongxin Zhu
Z. Liu
Yukang Liang
Xin Li
Hao Liu
Changcun Bao
Linli Xu
16
6
0
04 Apr 2023
Towards Models that Can See and Read
Roy Ganz
Oren Nuriel
Aviad Aberdam
Yair Kittenplon
Shai Mazor
Ron Litman
14
13
0
18 Jan 2023
SceneGATE: Scene-Graph based co-Attention networks for TExt visual question answering
Feiqi Cao
Siwen Luo
F. Núñez
Zean Wen
Josiah Poon
Caren Han
GNN
11
4
0
16 Dec 2022
Toward 3D Spatial Reasoning for Human-like Text-based Visual Question Answering
Hao Li
Jinfa Huang
Peng Jin
Guoli Song
Qi Wu
Jie Chen
27
20
0
21 Sep 2022
PreSTU: Pre-Training for Scene-Text Understanding
Jihyung Kil
Soravit Changpinyo
Xi Chen
Hexiang Hu
Sebastian Goodman
Wei-Lun Chao
Radu Soricut
VLM
125
29
0
12 Sep 2022
TAG: Boosting Text-VQA via Text-aware Visual Question-answer Generation
Jun Wang
M. Gao
Yuqian Hu
Ramprasaath R. Selvaraju
Chetan Ramaiah
Ran Xu
J. JáJá
Larry S. Davis
ViT
12
17
0
03 Aug 2022
LaTr: Layout-Aware Transformer for Scene-Text VQA
Ali Furkan Biten
Ron Litman
Yusheng Xie
Srikar Appalaraju
R. Manmatha
ViT
19
100
0
23 Dec 2021
COCO-Text: Dataset and Benchmark for Text Detection and Recognition in Natural Images
Andreas Veit
Tomas Matera
Lukáš Neumann
Jiří Matas
Serge J. Belongie
175
515
0
26 Jan 2016