ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1502.07058
  4. Cited By
Evaluation of Deep Convolutional Nets for Document Image Classification
  and Retrieval

Evaluation of Deep Convolutional Nets for Document Image Classification and Retrieval

25 February 2015
Adam W. Harley
Alex Ufkes
Konstantinos G. Derpanis
ArXivPDFHTML

Papers citing "Evaluation of Deep Convolutional Nets for Document Image Classification and Retrieval"

50 / 187 papers shown
Title
Relation-Rich Visual Document Generator for Visual Information Extraction
Relation-Rich Visual Document Generator for Visual Information Extraction
Zi-Han Jiang
Chien-Wei Lin
Wei-Hua Li
Hsuan-Tung Liu
Yi-Ren Yeh
Chu-Song Chen
35
0
0
14 Apr 2025
Marten: Visual Question Answering with Mask Generation for Multi-modal Document Understanding
Marten: Visual Question Answering with Mask Generation for Multi-modal Document Understanding
Zining Wang
Tongkun Guan
Pei Fu
Chen Duan
Qianyi Jiang
Zhentao Guo
Shan Guo
Junfeng Luo
Wei-Ming Shen
Xiaokang Yang
MLLM
VLM
71
1
0
18 Mar 2025
A Token-level Text Image Foundation Model for Document Understanding
A Token-level Text Image Foundation Model for Document Understanding
Tongkun Guan
Zining Wang
Pei Fu
Zhengtao Guo
Wei-Ming Shen
...
Chen Duan
Hao Sun
Qianyi Jiang
Junfeng Luo
Xiaokang Yang
VLM
45
1
0
04 Mar 2025
Multimodal Large Language Models for Text-rich Image Understanding: A Comprehensive Review
Multimodal Large Language Models for Text-rich Image Understanding: A Comprehensive Review
Pei Fu
Tongkun Guan
Zining Wang
Zhentao Guo
Chen Duan
...
Boming Chen
Jiayao Ma
Qianyi Jiang
Kai Zhou
Junfeng Luo
VLM
62
0
0
23 Feb 2025
Granite Vision: a lightweight, open-source multimodal model for enterprise Intelligence
Granite Vision: a lightweight, open-source multimodal model for enterprise Intelligence
Granite Vision Team
Leonid Karlinsky
Assaf Arbelle
Abraham Daniels
A. Nassar
...
Sriram Raghavan
T. Syeda-Mahmood
Peter W. J. Staar
Tal Drory
Rogerio Feris
VLM
AI4TS
114
0
0
14 Feb 2025
Label Errors in the Tobacco3482 Dataset
Label Errors in the Tobacco3482 Dataset
Gordon Lim
Stefan Larson
Kevin Leach
91
0
0
17 Dec 2024
Training LayoutLM from Scratch for Efficient Named-Entity Recognition in
  the Insurance Domain
Training LayoutLM from Scratch for Efficient Named-Entity Recognition in the Insurance Domain
Benno Uthayasooriyar
A. Ly
Franck Vermet
Caio Corro
71
0
0
12 Dec 2024
DocSum: Domain-Adaptive Pre-training for Document Abstractive
  Summarization
DocSum: Domain-Adaptive Pre-training for Document Abstractive Summarization
Phan Phuong Mai Chau
Souhail Bakkali
Antoine Doucet
74
0
0
11 Dec 2024
MMDocBench: Benchmarking Large Vision-Language Models for Fine-Grained
  Visual Document Understanding
MMDocBench: Benchmarking Large Vision-Language Models for Fine-Grained Visual Document Understanding
Fengbin Zhu
Ziyang Liu
Xiang Yao Ng
Haohui Wu
Luu Anh Tuan
Fuli Feng
Chao Wang
Huanbo Luan
Tat-Seng Chua
VLM
35
3
0
25 Oct 2024
"What is the value of {templates}?" Rethinking Document Information
  Extraction Datasets for LLMs
"What is the value of {templates}?" Rethinking Document Information Extraction Datasets for LLMs
Ran Zmigrod
Pranav Shetty
Mathieu Sibue
Zhiqiang Ma
Armineh Nourbakhsh
Xiaomo Liu
Manuela Veloso
28
0
0
20 Oct 2024
DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse
  Synthetic Data and Global-to-Local Adaptive Perception
DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception
Zhiyuan Zhao
Hengrui Kang
Bin Wang
Conghui He
35
11
0
16 Oct 2024
Towards an Improved Metric for Evaluating Disentangled Representations
Towards an Improved Metric for Evaluating Disentangled Representations
Sahib Julka
Yashu Wang
Michael Granitzer
34
0
0
04 Oct 2024
Modeling Layout Reading Order as Ordering Relations for Visually-rich
  Document Understanding
Modeling Layout Reading Order as Ordering Relations for Visually-rich Document Understanding
Chong Zhang
Yi Tu
Yixi Zhao
Chenshu Yuan
Huan Chen
...
Mingxu Chai
Ya Guo
Huijia Zhu
Qi Zhang
Tao Gui
43
2
0
29 Sep 2024
μgat: Improving Single-Page Document Parsing by Providing Multi-Page
  Context
μgat: Improving Single-Page Document Parsing by Providing Multi-Page Context
Fabio Quattrini
Carmine Zaccagnino
Silvia Cascianelli
Laura Righi
Rita Cucchiara
44
1
0
28 Aug 2024
SynthDoc: Bilingual Documents Synthesis for Visual Document
  Understanding
SynthDoc: Bilingual Documents Synthesis for Visual Document Understanding
Chuanghao Ding
Xuejing Liu
Wei Tang
Juan Li
Xiaoliang Wang
Rui Zhao
Cam-Tu Nguyen
Fei Tan
28
0
0
27 Aug 2024
Deep Learning based Visually Rich Document Content Understanding: A
  Survey
Deep Learning based Visually Rich Document Content Understanding: A Survey
Muhammad Ali
Jean Lee
Salman Khan
47
6
0
02 Aug 2024
ProcTag: Process Tagging for Assessing the Efficacy of Document Instruction Data
ProcTag: Process Tagging for Assessing the Efficacy of Document Instruction Data
Yufan Shen
Chuwei Luo
Zhaoqing Zhu
Yang Chen
Qi Zheng
Zhi Yu
Jiajun Bu
Cong Yao
48
2
0
17 Jul 2024
VisFocus: Prompt-Guided Vision Encoders for OCR-Free Dense Document Understanding
VisFocus: Prompt-Guided Vision Encoders for OCR-Free Dense Document Understanding
Ofir Abramovich
Niv Nayman
Sharon Fogel
I. Lavi
Ron Litman
Shahar Tsiper
Royee Tichauer
Srikar Appalaraju
Shai Mazor
R. Manmatha
VLM
33
3
0
17 Jul 2024
Hierarchical Multi-modal Transformer for Cross-modal Long Document
  Classification
Hierarchical Multi-modal Transformer for Cross-modal Long Document Classification
Tengfei Liu
Yongli Hu
Junbin Gao
Yanfeng Sun
Baocai Yin
28
0
0
14 Jul 2024
DocXplain: A Novel Model-Agnostic Explainability Method for Document
  Image Classification
DocXplain: A Novel Model-Agnostic Explainability Method for Document Image Classification
S. Saifullah
S. Agne
Andreas Dengel
Sheraz Ahmed
42
0
0
04 Jul 2024
A Bounding Box is Worth One Token: Interleaving Layout and Text in a Large Language Model for Document Understanding
A Bounding Box is Worth One Token: Interleaving Layout and Text in a Large Language Model for Document Understanding
Jinghui Lu
Haiyang Yu
Yanjie Wang
Yongjie Ye
Jingqun Tang
...
Qi Liu
Hao Feng
David W. Romero
Hao Liu
Can Huang
50
19
0
02 Jul 2024
DocKylin: A Large Multimodal Model for Visual Document Understanding
  with Efficient Visual Slimming
DocKylin: A Large Multimodal Model for Visual Document Understanding with Efficient Visual Slimming
Jiaxin Zhang
Wentao Yang
Songxuan Lai
Zecheng Xie
Lianwen Jin
37
15
0
27 Jun 2024
M3T: A New Benchmark Dataset for Multi-Modal Document-Level Machine
  Translation
M3T: A New Benchmark Dataset for Multi-Modal Document-Level Machine Translation
Benjamin Hsu
Xiaoyu Liu
Huayang Li
Yoshinari Fujinuma
Maria Nadejde
Xing Niu
Yair Kittenplon
Ron Litman
R. Pappagari
52
4
0
12 Jun 2024
DistilDoc: Knowledge Distillation for Visually-Rich Document Applications
DistilDoc: Knowledge Distillation for Visually-Rich Document Applications
Jordy Van Landeghem
Subhajit Maity
Ayan Banerjee
Matthew Blaschko
Marie-Francine Moens
Josep Lladós
Sanket Biswas
50
2
0
12 Jun 2024
TRINS: Towards Multimodal Language Models that Can Read
TRINS: Towards Multimodal Language Models that Can Read
Ruiyi Zhang
Yanzhe Zhang
Jian Chen
Yufan Zhou
Jiuxiang Gu
Changyou Chen
Tong Sun
VLM
39
6
0
10 Jun 2024
UnSupDLA: Towards Unsupervised Document Layout Analysis
UnSupDLA: Towards Unsupervised Document Layout Analysis
Talha Uddin Sheikh
Tahira Shehzadi
K. Hashmi
Didier Stricker
Muhammad Zeshan Afzal
34
2
0
10 Jun 2024
Reconstructing training data from document understanding models
Reconstructing training data from document understanding models
Jérémie Dentan
Arnaud Paran
A. Shabou
AAML
SyDa
49
1
0
05 Jun 2024
XFormParser: A Simple and Effective Multimodal Multilingual
  Semi-structured Form Parser
XFormParser: A Simple and Effective Multimodal Multilingual Semi-structured Form Parser
Xianfu Cheng
Hang Zhang
Jian Yang
Xiang Li
Weixiao Zhou
...
Fei Liu
Wei Zhang
Tao Sun
Tongliang Li
Zhoujun Li
52
2
0
27 May 2024
Leveraging Semantic Segmentation Masks with Embeddings for Fine-Grained
  Form Classification
Leveraging Semantic Segmentation Masks with Embeddings for Fine-Grained Form Classification
Taylor Archibald
Tony R. Martinez
AI4TS
25
0
0
23 May 2024
Multimodal Adaptive Inference for Document Image Classification with
  Anytime Early Exiting
Multimodal Adaptive Inference for Document Image Classification with Anytime Early Exiting
Omar Hamed
Souhail Bakkali
Marie-Francine Moens
Matthew Blaschko
Jordy Van Landeghem
27
1
0
21 May 2024
CICA: Content-Injected Contrastive Alignment for Zero-Shot Document
  Image Classification
CICA: Content-Injected Contrastive Alignment for Zero-Shot Document Image Classification
Sankalp Sinha
Muhammad Gul Zain Ali Khan
Talha Uddin Sheikh
Didier Stricker
Muhammad Zeshan Afzal
VLM
19
1
0
06 May 2024
GeoContrastNet: Contrastive Key-Value Edge Learning for
  Language-Agnostic Document Understanding
GeoContrastNet: Contrastive Key-Value Edge Learning for Language-Agnostic Document Understanding
Nil Biescas
Carlos Boned Riera
Josep Lladós
Sanket Biswas
42
1
0
06 May 2024
CREPE: Coordinate-Aware End-to-End Document Parser
CREPE: Coordinate-Aware End-to-End Document Parser
Yamato Okamoto
Youngmin Baek
Geewook Kim
Ryota Nakao
Donghyun Kim
Moonbin Yim
Seunghyun Park
Bado Lee
35
1
0
01 May 2024
Machine Unlearning for Document Classification
Machine Unlearning for Document Classification
Lei Kang
Mohamed Ali Souibgui
Fei Yang
Lluís Gómez
Ernest Valveny
Dimosthenis Karatzas
MU
AILaw
34
4
0
29 Apr 2024
HRVDA: High-Resolution Visual Document Assistant
HRVDA: High-Resolution Visual Document Assistant
Chaohu Liu
Kun Yin
Haoyu Cao
Xinghua Jiang
Xin Li
Yinsong Liu
Deqiang Jiang
Xing Sun
Linli Xu
VLM
43
24
0
10 Apr 2024
LayoutLLM: Layout Instruction Tuning with Large Language Models for
  Document Understanding
LayoutLLM: Layout Instruction Tuning with Large Language Models for Document Understanding
Chuwei Luo
Yufan Shen
Zhaoqing Zhu
Qi Zheng
Zhi Yu
Cong Yao
37
39
0
08 Apr 2024
BuDDIE: A Business Document Dataset for Multi-task Information
  Extraction
BuDDIE: A Business Document Dataset for Multi-task Information Extraction
Ran Zmigrod
Dongsheng Wang
Mathieu Sibue
Yulong Pei
Petr Babkin
...
Antony Papadimitriou
William Watson
Zhiqiang Ma
Armineh Nourbakhsh
Sameena Shah
27
4
0
05 Apr 2024
Can AI Models Appreciate Document Aesthetics? An Exploration of
  Legibility and Layout Quality in Relation to Prediction Confidence
Can AI Models Appreciate Document Aesthetics? An Exploration of Legibility and Layout Quality in Relation to Prediction Confidence
Hsiu-Wei Yang
Abhinav Agrawal
Pavlos Fragkogiannis
Shubham Nitin Mulay
35
1
0
27 Mar 2024
Visually Guided Generative Text-Layout Pre-training for Document
  Intelligence
Visually Guided Generative Text-Layout Pre-training for Document Intelligence
Zhiming Mao
Haoli Bai
Lu Hou
Jiansheng Wei
Xin Jiang
Qun Liu
Kam-Fai Wong
32
8
0
25 Mar 2024
RoDLA: Benchmarking the Robustness of Document Layout Analysis Models
RoDLA: Benchmarking the Robustness of Document Layout Analysis Models
Yufan Chen
Jiaming Zhang
Kunyu Peng
Junwei Zheng
Ruiping Liu
Philip Torr
Rainer Stiefelhagen
OOD
29
5
0
21 Mar 2024
LayoutLLM: Large Language Model Instruction Tuning for Visually Rich
  Document Understanding
LayoutLLM: Large Language Model Instruction Tuning for Visually Rich Document Understanding
Masato Fujitake
MLLM
27
15
0
21 Mar 2024
mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document
  Understanding
mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding
Anwen Hu
Haiyang Xu
Jiabo Ye
Mingshi Yan
Liang Zhang
...
Chen Li
Ji Zhang
Qin Jin
Fei Huang
Jingren Zhou
VLM
47
105
0
19 Mar 2024
The future of document indexing: GPT and Donut revolutionize table of
  content processing
The future of document indexing: GPT and Donut revolutionize table of content processing
Degaga Wolde Feyisa
Haylemicheal Berihun
Amanuel Zewdu
Mahsa Najimoghadam
Marzieh Zare
29
0
0
12 Mar 2024
Transformers and Language Models in Form Understanding: A Comprehensive
  Review of Scanned Document Analysis
Transformers and Language Models in Form Understanding: A Comprehensive Review of Scanned Document Analysis
Abdelrahman Abdallah
Daniel Eberharter
Zoe Pfister
Adam Jatowt
40
12
0
06 Mar 2024
InstructDoc: A Dataset for Zero-Shot Generalization of Visual Document
  Understanding with Instructions
InstructDoc: A Dataset for Zero-Shot Generalization of Visual Document Understanding with Instructions
Ryota Tanaka
Taichi Iki
Kyosuke Nishida
Kuniko Saito
Jun Suzuki
VLM
21
23
0
24 Jan 2024
DocLLM: A layout-aware generative language model for multimodal document
  understanding
DocLLM: A layout-aware generative language model for multimodal document understanding
Dongsheng Wang
Natraj Raman
Mathieu Sibue
Zhiqiang Ma
Petr Babkin
Simerjot Kaur
Yulong Pei
Armineh Nourbakhsh
Xiaomo Liu
VLM
22
52
0
31 Dec 2023
WordScape: a Pipeline to extract multilingual, visually rich Documents
  with Layout Annotations from Web Crawl Data
WordScape: a Pipeline to extract multilingual, visually rich Documents with Layout Annotations from Web Crawl Data
Maurice Weber
Carlo Siebenschuh
Rory Butler
Anton Alexandrov
Valdemar Thanner
...
Haris Jabbar
Ian Foster
Bo-wen Li
Rick L. Stevens
Ce Zhang
21
4
0
15 Dec 2023
Automatic Recognition of Learning Resource Category in a Digital Library
Automatic Recognition of Learning Resource Category in a Digital Library
S. Banerjee
Debarshi Kumar Sanyal
S. Chattopadhyay
Plaban Kumar Bhowmick
P. Das
15
1
0
28 Nov 2023
FATURA: A Multi-Layout Invoice Image Dataset for Document Analysis and
  Understanding
FATURA: A Multi-Layout Invoice Image Dataset for Document Analysis and Understanding
Mahmoud Limam
M. Dhiaf
Yousri Kessentini
23
2
0
20 Nov 2023
ETDPC: A Multimodality Framework for Classifying Pages in Electronic
  Theses and Dissertations
ETDPC: A Multimodality Framework for Classifying Pages in Electronic Theses and Dissertations
Muntabir Hasan Choudhury
Lamia Salsabil
William A. Ingram
Edward A. Fox
Jian Wu
27
0
0
07 Nov 2023
1234
Next