ResearchTrend.AI
arXiv: 2006.00923
Multimodal grid features and cell pointers for Scene Text Visual Question Answering

1 June 2020
Lluís Gómez
Ali Furkan Biten
Rubèn Pérez Tito
Andrés Mafla
Marçal Rusiñol
Ernest Valveny
Dimosthenis Karatzas

Papers citing "Multimodal grid features and cell pointers for Scene Text Visual Question Answering"

Hierarchical multimodal transformers for Multi-Page DocVQA
Rubèn Pérez Tito
Dimosthenis Karatzas
Ernest Valveny
94
61
0
07 Dec 2022
MUST-VQA: MUltilingual Scene-text VQA
Emanuele Vivoli
Ali Furkan Biten
Andrés Mafla
Dimosthenis Karatzas
Lluís Gómez
113
6
0
14 Sep 2022
OCR-IDL: OCR Annotations for Industry Document Library Dataset
Ali Furkan Biten
Rubèn Pérez Tito
Lluís Gómez
Ernest Valveny
Dimosthenis Karatzas
77
30
0
25 Feb 2022
LaTr: Layout-Aware Transformer for Scene-Text VQA
Ali Furkan Biten
Ron Litman
Yusheng Xie
Srikar Appalaraju
R. Manmatha
ViT
127
102
0
23 Dec 2021
DocVQA: A Dataset for VQA on Document Images
Minesh Mathew
Dimosthenis Karatzas
C. V. Jawahar
172
748
0
01 Jul 2020