ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2202.12985
  4. Cited By
OCR-IDL: OCR Annotations for Industry Document Library Dataset

OCR-IDL: OCR Annotations for Industry Document Library Dataset

25 February 2022
Ali Furkan Biten
Rubèn Pérez Tito
Lluís Gómez
Ernest Valveny
Dimosthenis Karatzas
ArXivPDFHTML

Papers citing "OCR-IDL: OCR Annotations for Industry Document Library Dataset"

6 / 6 papers shown
Title
M2-omni: Advancing Omni-MLLM for Comprehensive Modality Support with Competitive Performance
M2-omni: Advancing Omni-MLLM for Comprehensive Modality Support with Competitive Performance
Qingpei Guo
Kaiyou Song
Zipeng Feng
Ziping Ma
Qinglong Zhang
...
Yunxiao Sun
Tai-WeiChang
Jingdong Chen
Ming Yang
Jun Zhou
MLLM
VLM
82
3
0
26 Feb 2025
VisFocus: Prompt-Guided Vision Encoders for OCR-Free Dense Document Understanding
VisFocus: Prompt-Guided Vision Encoders for OCR-Free Dense Document Understanding
Ofir Abramovich
Niv Nayman
Sharon Fogel
I. Lavi
Ron Litman
Shahar Tsiper
Royee Tichauer
Srikar Appalaraju
Shai Mazor
R. Manmatha
VLM
33
3
0
17 Jul 2024
Graph Neural Networks and Representation Embedding for Table Extraction
  in PDF Documents
Graph Neural Networks and Representation Embedding for Table Extraction in PDF Documents
Andrea Gemelli
Emanuele Vivoli
S. Marinai
LMTD
16
9
0
23 Aug 2022
DocEnTr: An End-to-End Document Image Enhancement Transformer
DocEnTr: An End-to-End Document Image Enhancement Transformer
Mohamed Ali Souibgui
Sanket Biswas
Sana Khamekhem Jemni
Yousri Kessentini
Alicia Fornés
Josep Lladós
Umapada Pal
ViT
45
45
0
25 Jan 2022
LaTr: Layout-Aware Transformer for Scene-Text VQA
LaTr: Layout-Aware Transformer for Scene-Text VQA
Ali Furkan Biten
Ron Litman
Yusheng Xie
Srikar Appalaraju
R. Manmatha
ViT
22
100
0
23 Dec 2021
LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document
  Understanding
LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding
Yang Xu
Yiheng Xu
Tengchao Lv
Lei Cui
Furu Wei
...
D. Florêncio
Cha Zhang
Wanxiang Che
Min Zhang
Lidong Zhou
ViT
MLLM
145
498
0
29 Dec 2020
1