Corpus Conversion Service: A machine learning platform to ingest
documents at scale [Poster abstract]

15 May 2018

Peter W. J. Staar

Papers citing "Corpus Conversion Service: A machine learning platform to ingest documents at scale [Poster abstract]"

Title | Authors | Tags | Counts (unlabeled in source) | Date
OmniParser V2: Structured-Points-of-Thought for Unified Visual Text Parsing and Its Generality to Multimodal Large Language Models | Wenwen Yu, Zhibo Yang, Jianqiang Wan, Sibo Song, J. Tang, Wenqing Cheng, Y. Liu, Xiang Bai | - | 48 1 0 | 22 Feb 2025
OmniParser: A Unified Framework for Text Spotting, Key Information Extraction and Table Recognition | Jianqiang Wan, Sibo Song, Wenwen Yu, Yuliang Liu, Wenqing Cheng, Fei Huang, Xiang Bai, Cong Yao, Zhibo Yang | - | 48 26 0 | 28 Mar 2024
ICDAR 2023 Competition on Robust Layout Segmentation in Corporate Documents | Christoph Auer, A. Nassar, Maksym Lysak, Michele Dolfi, Nikolaos Livathinos, Peter W. J. Staar | OOD, 3DV | 27 6 0 | 24 May 2023
Optimized Table Tokenization for Table Structure Recognition | Maksym Lysak, Ahmed Nassar, Nikolaos Livathinos, Christoph Auer, Peter W. J. Staar | LMTD | 25 13 0 | 05 May 2023
DocLayNet: A Large Human-Annotated Dataset for Document-Layout Analysis | B. Pfitzmann, Christoph Auer, Michele Dolfi, A. Nassar, Peter W. J. Staar | - | 16 85 0 | 02 Jun 2022
Delivering Document Conversion as a Cloud Service with High Throughput and Responsiveness | Christoph Auer, Michele Dolfi, A. Carvalho, Cesar Berrospi Ramis, Peter W. J. Staar | - | 17 9 0 | 01 Jun 2022
TableFormer: Table Structure Understanding with Transformers | A. Nassar, Nikolaos Livathinos, Maksym Lysak, Peter W. J. Staar | LMTD, ViT | 11 73 0 | 02 Mar 2022
CoVA: Context-aware Visual Attention for Webpage Information Extraction | Anurendra Kumar, Keval Morabia, Jingjing Wang, A. Niekler, Martin Potthast | - | 23 11 0 | 24 Oct 2021
VILA: Improving Structured Content Extraction from Scientific PDFs Using Visual Layout Groups | Zejiang Shen, Kyle Lo, Lucy Lu Wang, Bailey Kuehl, Daniel S. Weld, Doug Downey | VLM | 16 34 0 | 01 Jun 2021
Robust PDF Document Conversion Using Recurrent Neural Networks | Nikolaos Livathinos, Cesar Berrospi, Maksym Lysak, Viktor Kuropiatnyk, Ahmed Nassar, A. Carvalho, Michele Dolfi, Christoph Auer, K. Dinkla, Peter W. J. Staar | - | 20 22 0 | 18 Feb 2021
Understanding in Artificial Intelligence | S. Maetschke, D. M. Iraola, Pieter Barnard, Elaheh Shafieibavani, Peter Zhong, Ying Xu, Antonio Jimeno Yepes | ELM, VLM | 11 0 0 | 17 Jan 2021
Extracting Procedural Knowledge from Technical Documents | Shivali Agarwal, Shubham Atreja, V. Agarwal | - | 19 4 0 | 20 Oct 2020
Cross-Domain Document Object Detection: Benchmark Suite and Method | K. Li, Curtis Wigington, Chris Tensmeyer, Handong Zhao, Nikolaos Barmpalios, Vlad I. Morariu, Varun Manjunatha, Tong Sun, Y. Fu | - | 16 45 0 | 30 Mar 2020
A Machine Learning Framework for Data Ingestion in Document Images | Han Fu, Yunyu Bai, Zhuo Li, Jun Shen, Jianling Sun | - | 19 1 0 | 11 Feb 2020
Image-based table recognition: data, model, and evaluation | Xu Zhong, Elaheh Shafieibavani, Antonio Jimeno Yepes | LMTD | 16 212 0 | 25 Nov 2019
Fine-Grained Object Detection over Scientific Document Images with Region Embeddings | Ankur Goswami, Joshua McGrath, S. Peters, Theodoros Rekatsinas | ObjD | 16 3 0 | 28 Oct 2019
PubLayNet: largest dataset ever for document layout analysis | Xu Zhong, Jianbin Tang, Antonio Jimeno Yepes | - | 13 448 0 | 16 Aug 2019