2406.06462
VCR: A Task for Pixel-Level Complex Reasoning in Vision Language Models via Restoring Occluded Text
10 June 2024
Tianyu Zhang
Suyuchen Wang
Lu Li
Ge Zhang
Perouz Taslakian
Sai Rajeswar
Jie Fu
Bang Liu
Yoshua Bengio
Papers citing
"VCR: A Task for Pixel-Level Complex Reasoning in Vision Language Models via Restoring Occluded Text"
3 / 3 papers shown
Title
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
Jinguo Zhu
Weiyun Wang
Zhe Chen
Ziwei Liu
Shenglong Ye
...
Dahua Lin
Yu Qiao
Jifeng Dai
Wenhai Wang
Wei Wang
MLLM
VLM
217
132
1
14 Apr 2025
Advancing Tool-Augmented Large Language Models: Integrating Insights from Errors in Inference Trees
Sijia Chen
Yibo Wang
Yi-Feng Wu
Qing-Guo Chen
Zhao Xu
Weihua Luo
Kaifu Zhang
Lijun Zhang
LLMAG
LRM
113
18
0
11 Jun 2024
Yi: Open Foundation Models by 01.AI
01.AI
Alex Young
Bei Chen
Chao Li
...
Yue Wang
Yuxuan Cai
Zhenyu Gu
Zhiyuan Liu
Zonghong Dai
OSLM
LRM
311
574
0
07 Mar 2024