arXiv:2203.10244
ChartQA: A Benchmark for Question Answering about Charts with Visual and Logical Reasoning
19 March 2022
Ahmed Masry
Do Xuan Long
J. Tan
Shafiq Joty
Enamul Hoque
AIMat
Papers citing "ChartQA: A Benchmark for Question Answering about Charts with Visual and Logical Reasoning"
32 / 82 papers shown
Title
VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents
S. Yu
C. Tang
Bokai Xu
Junbo Cui
Junhao Ran
...
Zhenghao Liu
Shuo Wang
Xu Han
Zhiyuan Liu
Maosong Sun
VLM
100
30
0
14 Oct 2024
Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training
Gen Luo
Xue Yang
Wenhan Dou
Zhaokai Wang
Jifeng Dai
Yu Qiao
Xizhou Zhu
VLM
MLLM
87
26
0
10 Oct 2024
MM-Ego: Towards Building Egocentric Multimodal LLMs for Video QA
Hanrong Ye
Haotian Zhang
Erik Daxberger
Lin Chen
Zongyu Lin
...
Haoxuan You
Dan Xu
Zhe Gan
Jiasen Lu
Yinfei Yang
EgoV
MLLM
102
12
0
09 Oct 2024
Text2Chart31: Instruction Tuning for Chart Generation with Automatic Feedback
Fatemeh Pesaran Zadeh
Juyeon Kim
Jin-Hwa Kim
Gunhee Kim
ALM
75
3
0
05 Oct 2024
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
Wenhao Chai
Enxin Song
Y. Du
Chenlin Meng
Vashisht Madhavan
Omer Bar-Tal
Jeng-Neng Hwang
Saining Xie
Christopher D. Manning
3DV
113
29
0
04 Oct 2024
EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions
Kai Chen
Yunhao Gou
Runhui Huang
Zhili Liu
Daxin Tan
...
Qun Liu
Jun Yao
Lu Hou
Hang Xu
AuLLM
MLLM
VLM
98
25
0
26 Sep 2024
EvoChart: A Benchmark and a Self-Training Approach Towards Real-World Chart Understanding
Muye Huang
Han Lai
Xinyu Zhang
Wenjun Wu
Jie Ma
Lingling Zhang
Jun Liu
65
6
0
03 Sep 2024
MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?
Yi-Fan Zhang
Huanyu Zhang
Haochen Tian
Chaoyou Fu
Shuangqing Zhang
...
Qingsong Wen
Zhang Zhang
Liwen Wang
Rong Jin
Tieniu Tan
OffRL
90
44
0
23 Aug 2024
EE-MLLM: A Data-Efficient and Compute-Efficient Multimodal Large Language Model
Feipeng Ma
Yizhou Zhou
Hebei Li
Zilong He
Siying Wu
Fengyun Rao
Yueyi Zhang
Xiaoyan Sun
113
8
0
21 Aug 2024
VisFocus: Prompt-Guided Vision Encoders for OCR-Free Dense Document Understanding
Ofir Abramovich
Niv Nayman
Sharon Fogel
I. Lavi
Ron Litman
Shahar Tsiper
Royee Tichauer
Srikar Appalaraju
Shai Mazor
R. Manmatha
VLM
56
3
0
17 Jul 2024
VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models
Haodong Duan
Junming Yang
Xinyu Fang
Lin Chen
...
Yuhang Zang
Pan Zhang
Jiaqi Wang
Dahua Lin
Kai Chen
LM&MA
VLM
97
142
0
16 Jul 2024
NTSEBENCH: Cognitive Reasoning Benchmark for Vision Language Models
Pranshu Pandya
Agney S Talwarr
Vatsal Gupta
Tushar Kataria
Dan Roth
Vivek Gupta
LRM
87
2
0
15 Jul 2024
SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers
Shraman Pramanick
Rama Chellappa
Subhashini Venugopalan
54
16
0
12 Jul 2024
A Bounding Box is Worth One Token: Interleaving Layout and Text in a Large Language Model for Document Understanding
Jinghui Lu
Haiyang Yu
Yanjie Wang
Yongjie Ye
Jingqun Tang
...
Qi Liu
Hao Feng
Han Wang
Hao Liu
Can Huang
102
23
0
02 Jul 2024
Solving Token Gradient Conflict in Mixture-of-Experts for Large Vision-Language Model
Longrong Yang
Dong Shen
Chaoxiang Cai
Fan Yang
Size Li
Tingting Gao
Xi Li
MoE
87
2
0
28 Jun 2024
ChartMimic: Evaluating LMM's Cross-Modal Reasoning Capability via Chart-to-Code Generation
Cheng Yang
Chufan Shi
Yaxin Liu
Bo Shui
Junjie Wang
...
Yuxiang Zhang
Gongye Liu
Xiaomei Nie
Deng Cai
Yujiu Yang
MLLM
LRM
67
25
0
14 Jun 2024
MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering
Jingqun Tang
Qi-dong Liu
Yongjie Ye
Jinghui Lu
Shubo Wei
...
Yanjie Wang
Yuliang Liu
Hao Liu
Xiang Bai
Can Huang
106
28
0
20 May 2024
An empirical study of LLaMA3 quantization: from LLMs to MLLMs
Wei Huang
Xingyu Zheng
Xudong Ma
Haotong Qin
Chengtao Lv
Hong Chen
Jie Luo
Xiaojuan Qi
Xianglong Liu
Michele Magno
MQ
83
41
0
22 Apr 2024
TextSquare: Scaling up Text-Centric Visual Instruction Tuning
Jingqun Tang
Chunhui Lin
Zhen Zhao
Shubo Wei
Binghong Wu
...
Yuliang Liu
Hao Liu
Yuan Xie
Xiang Bai
Can Huang
LRM
VLM
MLLM
104
30
0
19 Apr 2024
Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want
Weifeng Lin
Xinyu Wei
Ruichuan An
Peng Gao
Bocheng Zou
Yulin Luo
Siyuan Huang
Shanghang Zhang
Hongsheng Li
VLM
119
36
0
29 Mar 2024
Enhancing Vision-Language Pre-training with Rich Supervisions
Yuan Gao
Kunyu Shi
Pengkai Zhu
Edouard Belval
Oren Nuriel
Srikar Appalaraju
Shabnam Ghadar
Vijay Mahadevan
Zhuowen Tu
Stefano Soatto
VLM
CLIP
92
12
0
05 Mar 2024
ChartX & ChartVLM: A Versatile Benchmark and Foundation Model for Complicated Chart Reasoning
Renqiu Xia
Bo Zhang
Hancheng Ye
Xiangchao Yan
Qi Liu
...
Min Dou
Botian Shi
Junchi Yan
Yu Qiao
LRM
88
61
0
19 Feb 2024
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models
Chris Liu
Renrui Zhang
Longtian Qiu
Siyuan Huang
Weifeng Lin
...
Hao Shao
Pan Lu
Hongsheng Li
Yu Qiao
Peng Gao
MLLM
151
112
0
08 Feb 2024
ScreenQA: Large-Scale Question-Answer Pairs over Mobile App Screenshots
Yu-Chung Hsiao
Fedir Zubach
Maria Wang
Jindong Chen
Victor Carbune
Jason Lin
Yun Zhu
RALM
170
27
0
16 Sep 2022
FeTaQA: Free-form Table Question Answering
Linyong Nan
Chia-Hsuan Hsieh
Ziming Mao
Xi Lin
Neha Verma
...
Isabel Trindade
Renusree Bandaru
Jacob Cunningham
Caiming Xiong
Dragomir R. Radev
LMTD
86
150
0
01 Apr 2021
PAQ: 65 Million Probably-Asked Questions and What You Can Do With Them
Patrick Lewis
Yuxiang Wu
Linqing Liu
Pasquale Minervini
Heinrich Küttler
Aleksandra Piktus
Pontus Stenetorp
Sebastian Riedel
RALM
86
234
0
13 Feb 2021
What Is Wrong With Scene Text Recognition Model Comparisons? Dataset and Model Analysis
Jeonghun Baek
Geewook Kim
Junyeop Lee
Sungrae Park
Dongyoon Han
Sangdoo Yun
Seong Joon Oh
Hwalsuk Lee
414
477
0
03 Apr 2019
Analysing Mathematical Reasoning Abilities of Neural Models
D. Saxton
Edward Grefenstette
Felix Hill
Pushmeet Kohli
LRM
125
420
0
02 Apr 2019
DVQA: Understanding Data Visualizations via Question Answering
Kushal Kafle
Brian L. Price
Scott D. Cohen
Christopher Kanan
AIMat
49
379
0
24 Jan 2018
Graph-Structured Representations for Visual Question Answering
Damien Teney
Lingqiao Liu
Anton Van Den Hengel
GNN
NAI
84
419
0
19 Sep 2016
Layer Normalization
Jimmy Lei Ba
J. Kiros
Geoffrey E. Hinton
251
10,412
0
21 Jul 2016
SQuAD: 100,000+ Questions for Machine Comprehension of Text
Pranav Rajpurkar
Jian Zhang
Konstantin Lopyrev
Percy Liang
RALM
153
8,067
0
16 Jun 2016