Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2504.04974
Cited By
Towards Visual Text Grounding of Multimodal Large Language Model
7 April 2025
Ming Li
Ruiyi Zhang
Jian Chen
Jiuxiang Gu
Yufan Zhou
Franck Dernoncourt
Wanrong Zhu
Tianyi Zhou
Tong Sun
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Towards Visual Text Grounding of Multimodal Large Language Model"
2 / 2 papers shown
Title
ColorBench: Can VLMs See and Understand the Colorful World? A Comprehensive Benchmark for Color Perception, Reasoning, and Robustness
Yijun Liang
Ming Li
Chenrui Fan
Ziyue Li
Dang Nguyen
Kwesi Cobbina
Shweta Bhardwaj
Jiuhai Chen
Fuxiao Liu
Tianyi Zhou
VLM
CoGe
39
0
0
10 Apr 2025
SoTA with Less: MCTS-Guided Sample Selection for Data-Efficient Visual Reasoning Self-Improvement
X. Wang
Z. Yang
Chao Feng
Hongjin Lu
Linjie Li
Chung-Ching Lin
Kevin Qinghong Lin
Furong Huang
Lijuan Wang
OODD
ReLM
VLM
LRM
59
1
0
10 Apr 2025
1