2311.17647
Cited By
Text as Images: Can Multimodal Large Language Models Follow Printed Instructions in Pixels?
29 November 2023
Xiujun Li
Yujie Lu
Zhe Gan
Jianfeng Gao
William Yang Wang
Yejin Choi
VLM
MLLM
Papers citing "Text as Images: Can Multimodal Large Language Models Follow Printed Instructions in Pixels?"
3 / 3 papers shown
Title
mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality
Qinghao Ye
Haiyang Xu
Guohai Xu
Jiabo Ye
Ming Yan
...
Junfeng Tian
Qiang Qi
Ji Zhang
Feiyan Huang
Jingren Zhou
VLM
MLLM
198
883
0
27 Apr 2023
Instruction Tuning with GPT-4
Baolin Peng
Chunyuan Li
Pengcheng He
Michel Galley
Jianfeng Gao
SyDa
ALM
LM&MA
154
576
0
06 Apr 2023
Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding
Kenton Lee
Mandar Joshi
Iulia Turc
Hexiang Hu
Fangyu Liu
Julian Martin Eisenschlos
Urvashi Khandelwal
Peter Shaw
Ming-Wei Chang
Kristina Toutanova
CLIP
VLM
148
259
0
07 Oct 2022