Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2306.06306
Cited By
DocumentCLIP: Linking Figures and Main Body Text in Reflowed Documents
9 June 2023
Fuxiao Liu
Hao Tan
Chris Tensmeyer
CLIP
VLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"DocumentCLIP: Linking Figures and Main Body Text in Reflowed Documents"
10 / 10 papers shown
Title
JMMMU: A Japanese Massive Multi-discipline Multimodal Understanding Benchmark for Culture-aware Evaluation
Shota Onohara
Atsuyuki Miyai
Yuki Imajuku
Kazuki Egashira
Jeonghun Baek
Xiang Yue
Graham Neubig
Kiyoharu Aizawa
OSLM
83
1
0
22 Oct 2024
Generalized Out-of-Distribution Detection and Beyond in Vision Language Model Era: A Survey
Atsuyuki Miyai
Jingkang Yang
Jingyang Zhang
Yifei Ming
Sisir Dhakal
...
Yixuan Li
Hai Li
Ziwei Liu
Toshihiko Yamasaki
Kiyoharu Aizawa
36
9
0
31 Jul 2024
MMC: Advancing Multimodal Chart Understanding with Large-scale Instruction Tuning
Fuxiao Liu
Xiaoyang Wang
Wenlin Yao
Jianshu Chen
Kaiqiang Song
Sangwoo Cho
Yaser Yacoob
Dong Yu
21
99
0
15 Nov 2023
HallusionBench: An Advanced Diagnostic Suite for Entangled Language Hallucination and Visual Illusion in Large Vision-Language Models
Tianrui Guan
Fuxiao Liu
Xiyang Wu
Ruiqi Xian
Zongxia Li
...
Lichang Chen
Furong Huang
Yaser Yacoob
Dinesh Manocha
Tianyi Zhou
VLM
MLLM
23
155
0
23 Oct 2023
Anticipating Driving Behavior through Deep Learning-Based Policy Prediction
Alexander Liu
8
0
0
20 Jul 2023
Towards Understanding In-Context Learning with Contrastive Demonstrations and Saliency Maps
Fuxiao Liu
Paiheng Xu
Zongxi Li
Yue Feng
Hyemi Song
13
31
0
11 Jul 2023
Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning
Fuxiao Liu
Kevin Qinghong Lin
Linjie Li
Jianfeng Wang
Yaser Yacoob
Lijuan Wang
VLM
MLLM
11
239
0
26 Jun 2023
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
S. Hoi
MLLM
BDL
VLM
CLIP
388
4,110
0
28 Jan 2022
VinVL: Revisiting Visual Representations in Vision-Language Models
Pengchuan Zhang
Xiujun Li
Xiaowei Hu
Jianwei Yang
Lei Zhang
Lijuan Wang
Yejin Choi
Jianfeng Gao
ObjD
VLM
252
157
0
02 Jan 2021
LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding
Yang Xu
Yiheng Xu
Tengchao Lv
Lei Cui
Furu Wei
...
D. Florêncio
Cha Zhang
Wanxiang Che
Min Zhang
Lidong Zhou
ViT
MLLM
145
498
0
29 Dec 2020
1