Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2405.14295
Cited By
Focus Anywhere for Fine-grained Multi-page Document Understanding
23 May 2024
Chenglong Liu
Haoran Wei
Jinyue Chen
Lingyu Kong
Zheng Ge
Zining Zhu
Liang Zhao
Jian‐Yuan Sun
Chunrui Han
Xiangyu Zhang
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Focus Anywhere for Fine-grained Multi-page Document Understanding"
13 / 13 papers shown
Title
GDI-Bench: A Benchmark for General Document Intelligence with Vision and Reasoning Decoupling
Siqi Li
Yufan Shen
Xiangnan Chen
Jiayi Chen
Hengwei Ju
...
Licheng Wen
Botian Shi
Y. Liu
Xinyu Cai
Yu Qiao
VLM
ELM
89
0
0
30 Apr 2025
Perception-R1: Pioneering Perception Policy with Reinforcement Learning
En Yu
Kangheng Lin
Liang Zhao
Jisheng Yin
Yana Wei
...
Zheng Ge
Xiangyu Zhang
Daxin Jiang
Jingyu Wang
Wenbing Tao
VLM
OffRL
LRM
37
0
0
10 Apr 2025
\Éclair -- Extracting Content and Layout with Integrated Reading Order for Documents
Ilia Karmanov
A. Deshmukh
Lukas Voegtle
Philipp Fischer
Kateryna Chumachenko
...
Jarno Seppänen
Jupinder Parmar
Joseph Jennings
Andrew Tao
Karan Sapra
68
0
0
06 Feb 2025
OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations
Linke Ouyang
Yuan Qu
Hongbin Zhou
Jiawei Zhu
Rui Zhang
...
Chao Xu
Bo Zhang
Botian Shi
Zhongying Tu
Conghui He
99
5
0
10 Dec 2024
Law of Vision Representation in MLLMs
Shijia Yang
Bohan Zhai
Quanzeng You
Jianbo Yuan
Hongxia Yang
Chenfeng Xu
40
9
0
29 Aug 2024
DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation
Yuang Peng
Yuxin Cui
Haomiao Tang
Zekun Qi
Runpei Dong
Jing Bai
Chunrui Han
Zheng Ge
Xiangyu Zhang
Shu-Tao Xia
EGVM
75
31
0
24 Jun 2024
Small Language Model Meets with Reinforced Vision Vocabulary
Haoran Wei
Lingyu Kong
Jinyue Chen
Liang Zhao
Zheng Ge
En Yu
Jian‐Yuan Sun
Chunrui Han
Xiangyu Zhang
VLM
57
40
0
23 Jan 2024
Vary: Scaling up the Vision Vocabulary for Large Vision-Language Models
Haoran Wei
Lingyu Kong
Jinyue Chen
Liang Zhao
Zheng Ge
Jinrong Yang
Jian‐Yuan Sun
Chunrui Han
Xiangyu Zhang
MLLM
VLM
66
74
0
11 Dec 2023
UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model
Jiabo Ye
Anwen Hu
Haiyang Xu
Qinghao Ye
Mingshi Yan
...
Ji Zhang
Qin Jin
Liang He
Xin Lin
Feiyan Huang
VLM
MLLM
121
84
0
08 Oct 2023
mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality
Qinghao Ye
Haiyang Xu
Guohai Xu
Jiabo Ye
Ming Yan
...
Junfeng Tian
Qiang Qi
Ji Zhang
Feiyan Huang
Jingren Zhou
VLM
MLLM
206
900
0
27 Apr 2023
GLM-130B: An Open Bilingual Pre-trained Model
Aohan Zeng
Xiao Liu
Zhengxiao Du
Zihan Wang
Hanyu Lai
...
Jidong Zhai
Wenguang Chen
Peng-Zhen Zhang
Yuxiao Dong
Jie Tang
BDL
LRM
248
1,073
0
05 Oct 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
308
11,915
0
04 Mar 2022
COCO-Text: Dataset and Benchmark for Text Detection and Recognition in Natural Images
Andreas Veit
Tomas Matera
Lukás Neumann
Jirí Matas
Serge J. Belongie
185
515
0
26 Jan 2016
1