
MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering

20 May 2024

Mohamad Fitri Faiz Bin Mahmood

Papers cited by "MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering"

32 / 82 papers shown

mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding

Jiabo Ye

...

Ji Zhang

Qin Jin

Fei Huang

Jingren Zhou

VLM

327

204

19 Mar 2024

DeepSeek-VL: Towards Real-World Vision-Language Understanding

...

Chengqi Deng

473

658

08 Mar 2024

Yi: Open Foundation Models by 01.AI

...

842

773

07 Mar 2024

TextMonkey: An OCR-Free Large Multimodal Model for Understanding Document

Yuliang Liu

318

153

07 Mar 2024

InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks

Weijie Su

...

Ping Luo

Yu Qiao

649

2,249

21 Dec 2023

Multi-modal In-Context Learning Makes an Ego-evolving Scene Text Recognizer

Computer Vision and Pattern Recognition (CVPR), 2023

Yuan Xie

409

22 Nov 2023

DocPedia: Unleashing the Power of Large Multimodal Model in the Frequency Domain for Versatile Document Understanding

Hao Feng

Qi Liu

345

20 Nov 2023

The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision)

Nasim Shakouri Mahmoudabadi

Lijuan Wang

LM&MA

371

831

29 Sep 2023

Qwen Technical Report

Jinze Bai

Shuai Bai

Yunfei Chu

Zeyu Cui

Kai Dang

...

Zhenru Zhang

Chang Zhou

Jingren Zhou

Xiaohuan Zhou

Tianhang Zhu

OSLM

840

3,143

28 Sep 2023

Large Multilingual Models Pivot Zero-Shot Multimodal Learning across Languages

International Conference on Learning Representations (ICLR), 2023

Jinyi Hu

...

Yankai Lin

Jiao Xue

Dahai Li

Zhiyuan Liu

Maosong Sun

MLLM VLM

279

23 Aug 2023

UniDoc: A Universal Large Multimodal Model for Simultaneous Text Detection, Recognition, Spotting and Understanding

Hao Feng

392

19 Aug 2023

mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding

Jiabo Ye

...

Ji Zhang

246

157

04 Jul 2023

LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding

Jiuxiang Gu

Diyi Yang

320

289

29 Jun 2023

Visual Instruction Tuning

Neural Information Processing Systems (NeurIPS), 2023

1.2K

7,615

17 Apr 2023

GPT-4 Technical Report

...

4.7K

21,366

15 Mar 2023

MUST-VQA: MUltilingual Scene-text VQA

Emanuele Vivoli

258

14 Sep 2022

ChartQA: A Benchmark for Question Answering about Charts with Visual and Logical Reasoning

Findings (Findings), 2022

459

1,149

19 Mar 2022

Visually Grounded Reasoning across Languages and Cultures

Siva Reddy

483

202

28 Sep 2021

xGQA: Cross-Lingual Visual Question Answering

362

13 Sep 2021

Towards Developing a Multilingual and Code-Mixed Visual Question Answering System by Knowledge Distillation

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021

H. Khan

D. Gupta

Asif Ekbal

166

10 Sep 2021

Human-Adversarial Visual Question Answering

Neural Information Processing Systems (NeurIPS), 2021

Sasha Sheng

Amanpreet Singh

Vedanuj Goswami

Jose Alberto Lopez Magana

Wojciech Galuba

Devi Parikh

Douwe Kiela

OOD EgoV AAML

128

04 Jun 2021

InfographicVQA

IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2021

381

370

26 Apr 2021

DocVQA: A Dataset for VQA on Document Images

Minesh Mathew

Dimosthenis Karatzas

C. V. Jawahar

741

1,129

01 Jul 2020

Large-Scale Adversarial Training for Vision-and-Language Representation Learning

Neural Information Processing Systems (NeurIPS), 2020

373

537

11 Jun 2020

ICDAR2019 Robust Reading Challenge on Multi-lingual Scene Text Detection and Recognition -- RRC-MLT-2019

IEEE International Conference on Document Analysis and Recognition (ICDAR), 2019

Nibal Nayef

Yash J. Patel

M. Busta

Pinaki Nath Chowdhury

...

259

274

01 Jul 2019

OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge

Computer Vision and Pattern Recognition (CVPR), 2019

682

1,392

31 May 2019

Scene Text Visual Question Answering

IEEE International Conference on Computer Vision (ICCV), 2019

453

450

31 May 2019

Towards VQA Models That Can Read

Amanpreet Singh

Devi Parikh

638

1,723

18 Apr 2019

Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations

...

Fei-Fei Li

2.0K

6,245

23 Feb 2016

Visual7W: Grounded Question Answering in Images

Yuke Zhu

Oliver Groth

Michael S. Bernstein

Li Fei-Fei

534

966

11 Nov 2015

Are You Talking to a Machine? Dataset and Methods for Multilingual Image Question Answering

Neural Information Processing Systems (NeurIPS), 2015

Jie Zhou

328

521

21 May 2015

VQA: Visual Question Answering

Devi Parikh

1.0K

6,128

03 May 2015