ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2306.17107
  4. Cited By
LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image
  Understanding

LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding

29 June 2023
Yanzhe Zhang
Ruiyi Zhang
Jiuxiang Gu
Yufan Zhou
Nedim Lipka
Diyi Yang
Tongfei Sun
    VLM
    MLLM
ArXivPDFHTML

Papers citing "LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding"

32 / 32 papers shown
Title
Evaluating Menu OCR and Translation: A Benchmark for Aligning Human and Automated Evaluations in Large Vision-Language Models
Evaluating Menu OCR and Translation: A Benchmark for Aligning Human and Automated Evaluations in Large Vision-Language Models
Zhanglin Wu
Tengfei Song
Ning Xie
Weidong Zhang
Mengli Zhu
...
Pengfei Li
C. Li
Junhao Zhu
Hao-Yu Yang
Shiliang Sun
26
1
0
16 Apr 2025
WeGen: A Unified Model for Interactive Multimodal Generation as We Chat
Zhipeng Huang
Shaobin Zhuang
Canmiao Fu
Binxin Yang
Ying Zhang
Chong Sun
Zhizheng Zhang
Yali Wang
Chen Li
Zheng-Jun Zha
DiffM
69
1
0
03 Mar 2025
M2-omni: Advancing Omni-MLLM for Comprehensive Modality Support with Competitive Performance
M2-omni: Advancing Omni-MLLM for Comprehensive Modality Support with Competitive Performance
Qingpei Guo
Kaiyou Song
Zipeng Feng
Ziping Ma
Qinglong Zhang
...
Yunxiao Sun
Tai-WeiChang
Jingdong Chen
Ming Yang
Jun Zhou
MLLM
VLM
77
3
0
26 Feb 2025
Baichuan-Omni-1.5 Technical Report
Yadong Li
J. Liu
Tao Zhang
Tao Zhang
S. Chen
...
Jianhua Xu
Haoze Sun
Mingan Lin
Zenan Zhou
Weipeng Chen
AuLLM
67
10
0
28 Jan 2025
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks
Jiannan Wu
Muyan Zhong
Sen Xing
Zeqiang Lai
Zhaoyang Liu
...
Lewei Lu
Tong Lu
Ping Luo
Yu Qiao
Jifeng Dai
MLLM
VLM
LRM
91
45
0
03 Jan 2025
Chimera: Improving Generalist Model with Domain-Specific Experts
Chimera: Improving Generalist Model with Domain-Specific Experts
Tianshuo Peng
M. Li
Hongbin Zhou
Renqiu Xia
Renrui Zhang
...
Aojun Zhou
Botian Shi
Tao Chen
Bo Zhang
Xiangyu Yue
84
4
0
08 Dec 2024
Scaling Large Motion Models with Million-Level Human Motions
Scaling Large Motion Models with Million-Level Human Motions
Ye Wang
Sipeng Zheng
Bin Cao
Qianshan Wei
Qin Jin
Qin Jin
Zongqing Lu
VGen
40
0
0
04 Oct 2024
MIO: A Foundation Model on Multimodal Tokens
MIO: A Foundation Model on Multimodal Tokens
Zekun Wang
King Zhu
Chunpu Xu
Wangchunshu Zhou
Jiaheng Liu
...
Yuanxing Zhang
Ge Zhang
Ke Xu
Jie Fu
Wenhao Huang
MLLM
AuLLM
48
11
0
26 Sep 2024
Visual Agents as Fast and Slow Thinkers
Visual Agents as Fast and Slow Thinkers
Guangyan Sun
Mingyu Jin
Zhenting Wang
Cheng-Long Wang
Siqi Ma
Qifan Wang
Ying Nian Wu
Ying Nian Wu
Dongfang Liu
Dongfang Liu
LLMAG
LRM
74
11
0
16 Aug 2024
VisFocus: Prompt-Guided Vision Encoders for OCR-Free Dense Document Understanding
VisFocus: Prompt-Guided Vision Encoders for OCR-Free Dense Document Understanding
Ofir Abramovich
Niv Nayman
Sharon Fogel
I. Lavi
Ron Litman
Shahar Tsiper
Royee Tichauer
Srikar Appalaraju
Shai Mazor
R. Manmatha
VLM
25
3
0
17 Jul 2024
ProcTag: Process Tagging for Assessing the Efficacy of Document Instruction Data
ProcTag: Process Tagging for Assessing the Efficacy of Document Instruction Data
Yufan Shen
Chuwei Luo
Zhaoqing Zhu
Yang Chen
Qi Zheng
Zhi Yu
Jiajun Bu
Cong Yao
30
2
0
17 Jul 2024
Reflective Instruction Tuning: Mitigating Hallucinations in Large
  Vision-Language Models
Reflective Instruction Tuning: Mitigating Hallucinations in Large Vision-Language Models
Jinrui Zhang
Teng Wang
Haigang Zhang
Ping Lu
Feng Zheng
MLLM
LRM
VLM
25
3
0
16 Jul 2024
Solving Token Gradient Conflict in Mixture-of-Experts for Large Vision-Language Model
Solving Token Gradient Conflict in Mixture-of-Experts for Large Vision-Language Model
Longrong Yang
Dong Shen
Chaoxiang Cai
Fan Yang
Size Li
Di Zhang
Xi Li
MoE
39
2
0
28 Jun 2024
Curriculum Learning with Quality-Driven Data Selection
Curriculum Learning with Quality-Driven Data Selection
Biao Wu
Fang Meng
Ling-Hao Chen
19
1
0
27 Jun 2024
On Efficient Language and Vision Assistants for Visually-Situated
  Natural Language Understanding: What Matters in Reading and Reasoning
On Efficient Language and Vision Assistants for Visually-Situated Natural Language Understanding: What Matters in Reading and Reasoning
Geewook Kim
Minjoon Seo
VLM
27
2
0
17 Jun 2024
Dr-LLaVA: Visual Instruction Tuning with Symbolic Clinical Grounding
Dr-LLaVA: Visual Instruction Tuning with Symbolic Clinical Grounding
Shenghuan Sun
Gregory M. Goldgof
Alexander Schubert
Zhiqing Sun
Thomas Hartvigsen
A. Butte
Ahmed Alaa
LM&MA
27
4
0
29 May 2024
MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering
MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering
Jingqun Tang
Qi Liu
Yongjie Ye
Jinghui Lu
Shubo Wei
...
Yanjie Wang
Yuliang Liu
Hao Liu
Xiang Bai
Can Huang
32
21
0
20 May 2024
Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts
Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts
Yunxin Li
Shenyuan Jiang
Baotian Hu
Longyue Wang
Wanqi Zhong
Wenhan Luo
Lin Ma
Min-Ling Zhang
MoE
30
27
0
18 May 2024
TextSquare: Scaling up Text-Centric Visual Instruction Tuning
TextSquare: Scaling up Text-Centric Visual Instruction Tuning
Jingqun Tang
Chunhui Lin
Zhen Zhao
Shubo Wei
Binghong Wu
...
Yuliang Liu
Hao Liu
Yuan Xie
Xiang Bai
Can Huang
LRM
VLM
MLLM
61
28
0
19 Apr 2024
Yi: Open Foundation Models by 01.AI
Yi: Open Foundation Models by 01.AI
01. AI
Alex Young
01.AI Alex Young
Bei Chen
Chao Li
...
Yue Wang
Yuxuan Cai
Zhenyu Gu
Zhiyuan Liu
Zonghong Dai
OSLM
LRM
115
490
0
07 Mar 2024
On the Essence and Prospect: An Investigation of Alignment Approaches
  for Big Models
On the Essence and Prospect: An Investigation of Alignment Approaches for Big Models
Xinpeng Wang
Shitong Duan
Xiaoyuan Yi
Jing Yao
Shanlin Zhou
Zhihua Wei
Peng Zhang
Dongkuan Xu
Maosong Sun
Xing Xie
OffRL
33
16
0
07 Mar 2024
CoTBal: Comprehensive Task Balancing for Multi-Task Visual Instruction Tuning
CoTBal: Comprehensive Task Balancing for Multi-Task Visual Instruction Tuning
Yanqi Dai
Dong Jing
Nanyi Fei
Zhiwu Lu
Nanyi Fei
Guoxing Yang
Zhiwu Lu
43
2
0
07 Mar 2024
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models
Chris Liu
Renrui Zhang
Longtian Qiu
Siyuan Huang
Weifeng Lin
...
Hao Shao
Pan Lu
Hongsheng Li
Yu Qiao
Peng Gao
MLLM
120
106
0
08 Feb 2024
StoryGPT-V: Large Language Models as Consistent Story Visualizers
StoryGPT-V: Large Language Models as Consistent Story Visualizers
Xiaoqian Shen
Mohamed Elhoseiny
VLM
87
10
0
04 Dec 2023
Ranni: Taming Text-to-Image Diffusion for Accurate Instruction Following
Ranni: Taming Text-to-Image Diffusion for Accurate Instruction Following
Yutong Feng
Biao Gong
Di Chen
Yujun Shen
Yu Liu
Jingren Zhou
DiffM
21
43
0
28 Nov 2023
Towards Improving Document Understanding: An Exploration on
  Text-Grounding via MLLMs
Towards Improving Document Understanding: An Exploration on Text-Grounding via MLLMs
Yonghui Wang
Wen-gang Zhou
Hao Feng
Keyi Zhou
Houqiang Li
50
18
0
22 Nov 2023
UniDoc: A Universal Large Multimodal Model for Simultaneous Text
  Detection, Recognition, Spotting and Understanding
UniDoc: A Universal Large Multimodal Model for Simultaneous Text Detection, Recognition, Spotting and Understanding
Hao Feng
Zijian Wang
Jingqun Tang
Jinghui Lu
Wen-gang Zhou
Houqiang Li
Can Huang
MLLM
VLM
26
46
0
19 Aug 2023
Foundational Models Defining a New Era in Vision: A Survey and Outlook
Foundational Models Defining a New Era in Vision: A Survey and Outlook
Muhammad Awais
Muzammal Naseer
Salman Khan
Rao Muhammad Anwer
Hisham Cholakkal
M. Shah
Ming Yang
F. Khan
VLM
13
116
0
25 Jul 2023
mPLUG-Owl: Modularization Empowers Large Language Models with
  Multimodality
mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality
Qinghao Ye
Haiyang Xu
Guohai Xu
Jiabo Ye
Ming Yan
...
Junfeng Tian
Qiang Qi
Ji Zhang
Feiyan Huang
Jingren Zhou
VLM
MLLM
203
883
0
27 Apr 2023
Learn to Explain: Multimodal Reasoning via Thought Chains for Science
  Question Answering
Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering
Pan Lu
Swaroop Mishra
Tony Xia
Liang Qiu
Kai-Wei Chang
Song-Chun Zhu
Oyvind Tafjord
Peter Clark
A. Kalyan
ELM
ReLM
LRM
207
1,089
0
20 Sep 2022
Training language models to follow instructions with human feedback
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize
  Long-Tail Visual Concepts
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts
Soravit Changpinyo
P. Sharma
Nan Ding
Radu Soricut
VLM
273
1,077
0
17 Feb 2021
1