ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2311.13194
  4. Cited By
Towards Improving Document Understanding: An Exploration on
  Text-Grounding via MLLMs

Towards Improving Document Understanding: An Exploration on Text-Grounding via MLLMs

22 November 2023
Yonghui Wang
Wen-gang Zhou
Hao Feng
Keyi Zhou
Houqiang Li
ArXivPDFHTML

Papers citing "Towards Improving Document Understanding: An Exploration on Text-Grounding via MLLMs"

23 / 23 papers shown
Title
Towards Visual Text Grounding of Multimodal Large Language Model
Towards Visual Text Grounding of Multimodal Large Language Model
Ming Li
Ruiyi Zhang
Jian Chen
Jiuxiang Gu
Yufan Zhou
Franck Dernoncourt
Wanrong Zhu
Tianyi Zhou
Tong Sun
26
2
0
07 Apr 2025
Baichuan-Omni-1.5 Technical Report
Yadong Li
J. Liu
Tao Zhang
Tao Zhang
S. Chen
...
Jianhua Xu
Haoze Sun
Mingan Lin
Zenan Zhou
Weipeng Chen
AuLLM
64
10
0
28 Jan 2025
Visual Large Language Models for Generalized and Specialized Applications
Yifan Li
Zhixin Lai
Wentao Bao
Zhen Tan
Anh Dao
Kewei Sui
Jiayi Shen
Dong Liu
Huan Liu
Yu Kong
VLM
83
10
0
06 Jan 2025
AdaptVision: Dynamic Input Scaling in MLLMs for Versatile Scene
  Understanding
AdaptVision: Dynamic Input Scaling in MLLMs for Versatile Scene Understanding
Yonghui Wang
Wengang Zhou
Hao Feng
Houqiang Li
VLM
20
0
0
30 Aug 2024
Mini-Monkey: Multi-Scale Adaptive Cropping for Multimodal Large Language
  Models
Mini-Monkey: Multi-Scale Adaptive Cropping for Multimodal Large Language Models
Mingxin Huang
Yuliang Liu
Dingkang Liang
Lianwen Jin
Xiang Bai
29
9
0
04 Aug 2024
LLaVA-Read: Enhancing Reading Ability of Multimodal Language Models
LLaVA-Read: Enhancing Reading Ability of Multimodal Language Models
Ruiyi Zhang
Yufan Zhou
Jian Chen
Jiuxiang Gu
Changyou Chen
Tongfei Sun
VLM
31
6
0
27 Jul 2024
Harmonizing Visual Text Comprehension and Generation
Harmonizing Visual Text Comprehension and Generation
Zhen Zhao
Jingqun Tang
Binghong Wu
Chunhui Lin
Shubo Wei
Hao Liu
Xin Tan
Zhizhong Zhang
Can Huang
Yuan Xie
VLM
26
21
0
23 Jul 2024
DocKylin: A Large Multimodal Model for Visual Document Understanding
  with Efficient Visual Slimming
DocKylin: A Large Multimodal Model for Visual Document Understanding with Efficient Visual Slimming
Jiaxin Zhang
Wentao Yang
Songxuan Lai
Zecheng Xie
Lianwen Jin
32
15
0
27 Jun 2024
TextSquare: Scaling up Text-Centric Visual Instruction Tuning
TextSquare: Scaling up Text-Centric Visual Instruction Tuning
Jingqun Tang
Chunhui Lin
Zhen Zhao
Shubo Wei
Binghong Wu
...
Yuliang Liu
Hao Liu
Yuan Xie
Xiang Bai
Can Huang
LRM
VLM
MLLM
44
26
0
19 Apr 2024
TextCoT: Zoom In for Enhanced Multimodal Text-Rich Image Understanding
TextCoT: Zoom In for Enhanced Multimodal Text-Rich Image Understanding
Bozhi Luan
Hao Feng
Hong Chen
Yonghui Wang
Wen-gang Zhou
Houqiang Li
MLLM
21
10
0
15 Apr 2024
InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model
  Handling Resolutions from 336 Pixels to 4K HD
InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD
Xiao-wen Dong
Pan Zhang
Yuhang Zang
Yuhang Cao
Bin Wang
...
Xingcheng Zhang
Jifeng Dai
Yuxin Qiao
Dahua Lin
Jiaqi Wang
VLM
MLLM
31
107
0
09 Apr 2024
Plug-and-Play Grounding of Reasoning in Multimodal Large Language Models
Plug-and-Play Grounding of Reasoning in Multimodal Large Language Models
Jiaxing Chen
Yuxuan Liu
Dehu Li
Xiang An
Weimo Deng
Ziyong Feng
Yongle Zhao
Yin Xie
LRM
38
12
0
28 Mar 2024
TextMonkey: An OCR-Free Large Multimodal Model for Understanding
  Document
TextMonkey: An OCR-Free Large Multimodal Model for Understanding Document
Yuliang Liu
Biao Yang
Qiang Liu
Zhang Li
Zhiyin Ma
Shuo Zhang
Xiang Bai
MLLM
VLM
30
87
0
07 Mar 2024
mPLUG-PaperOwl: Scientific Diagram Analysis with the Multimodal Large
  Language Model
mPLUG-PaperOwl: Scientific Diagram Analysis with the Multimodal Large Language Model
Anwen Hu
Yaya Shi
Haiyang Xu
Jiabo Ye
Qinghao Ye
Mingshi Yan
Chenliang Li
Qi Qian
Ji Zhang
Fei Huang
MLLM
28
15
0
30 Nov 2023
DocPedia: Unleashing the Power of Large Multimodal Model in the
  Frequency Domain for Versatile Document Understanding
DocPedia: Unleashing the Power of Large Multimodal Model in the Frequency Domain for Versatile Document Understanding
Hao Feng
Qi Liu
Hao Liu
Wen-gang Zhou
Houqiang Li
Can Huang
VLM
15
33
0
20 Nov 2023
UReader: Universal OCR-free Visually-situated Language Understanding
  with Multimodal Large Language Model
UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model
Jiabo Ye
Anwen Hu
Haiyang Xu
Qinghao Ye
Mingshi Yan
...
Ji Zhang
Qin Jin
Liang He
Xin Lin
Feiyan Huang
VLM
MLLM
118
83
0
08 Oct 2023
mPLUG-Owl: Modularization Empowers Large Language Models with
  Multimodality
mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality
Qinghao Ye
Haiyang Xu
Guohai Xu
Jiabo Ye
Ming Yan
...
Junfeng Tian
Qiang Qi
Ji Zhang
Feiyan Huang
Jingren Zhou
VLM
MLLM
203
883
0
27 Apr 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image
  Encoders and Large Language Models
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
244
4,186
0
30 Jan 2023
Training language models to follow instructions with human feedback
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified
  Vision-Language Understanding and Generation
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
S. Hoi
MLLM
BDL
VLM
CLIP
380
4,010
0
28 Jan 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
315
8,261
0
28 Jan 2022
LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document
  Understanding
LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding
Yang Xu
Yiheng Xu
Tengchao Lv
Lei Cui
Furu Wei
...
D. Florêncio
Cha Zhang
Wanxiang Che
Min Zhang
Lidong Zhou
ViT
MLLM
137
492
0
29 Dec 2020
FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents
FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents
Guillaume Jaume
H. K. Ekenel
Jean-Philippe Thiran
109
259
0
27 May 2019
1