On the General Value of Evidence, and Bilingual Scene-Text Visual Question Answering

Computer Vision and Pattern Recognition (CVPR), 2020
24 February 2020
Xinyu Wang
Yuliang Liu
Chunhua Shen
Chun Chet Ng
Canjie Luo
Lianwen Jin
C. Chan
Anton Van Den Hengel
Liangwei Wang

Papers citing "On the General Value of Evidence, and Bilingual Scene-Text Visual Question Answering"

50 / 74 papers shown
LocateAnything3D: Vision-Language 3D Detection with Chain-of-Sight
Yunze Man
S. S. Wang
Guowen Zhang
Johan Bjorck
Zhiqi Li
Liang-Yan Gui
Jim Fan
Jan Kautz
Yu Wang
Zhiding Yu
179
2
0
25 Nov 2025
MGA-VQA: Secure and Interpretable Graph-Augmented Visual Question Answering with Memory-Guided Protection Against Unauthorized Knowledge Use
Ahmad Mohammadshirazi
Pinaki Prasad Guha Neogi
Dheeraj Kulshrestha
R. Ramnath
145
0
0
22 Nov 2025
NVIDIA Nemotron Nano V2 VL
Nvidia
Amala Sanjay Deshmukh
Kateryna Chumachenko
Tuomas Rintamaki
Matthieu Le
...
Krzysztof Pawelec
Michael Evans
Katherine Luna
Jie Lou
Erick Galinkin
VLM
407
5
0
06 Nov 2025
FineVision: Open Data Is All You Need
Luis Wiedmann
Orr Zohar
Amir Mahla
Xiaohan Wang
Rui Li
Thibaud Frere
Leandro von Werra
Aritra Roy Gosthipaty
Andrés Marafioti
VLM
245
20
0
20 Oct 2025
Vision Language Models Are Not (Yet) Spelling Correctors
Junhong Liang
Bojun Zhang
VLM
104
0
0
22 Sep 2025
MindVL: Towards Efficient and Effective Training of Multimodal Large Language Models on Ascend NPUs
Feilong Chen
Y. Liu
Yi Huang
Hao Wang
Miren Tian
Ya-Qi Yu
Minghui Liao
Jihao Wu
MLLM, VLM
464
2
0
15 Sep 2025
Mind the (Language) Gap: Towards Probing Numerical and Cross-Lingual Limits of LVLMs
Somraj Gautam
Abhirama Subramanyam Penamakuri
Abhishek Bhandari
Gaurav Harit
LMTD, LRM
334
2
0
24 Aug 2025
ExpVG: Investigating the Design Space of Visual Grounding in Multimodal Large Language Model
Weitai Kang
Weiming Zhuang
Zhizhong Li
Yan Yan
Lingjuan Lyu
177
1
0
11 Aug 2025
Gather and Trace: Rethinking Video TextVQA from an Instance-oriented Perspective
Zhifei Yang
Gangyan Zeng
Daiqing Wu
Huawen Shen
B. Li
Can Ma
Xiaojun Bi
232
2
0
06 Aug 2025
MLLM-CTBench: A Benchmark for Continual Instruction Tuning with Reasoning Process Diagnosis
Haiyun Guo
ZhiYan Hou
Yu Chen
Jinghan He
Yandu Sun
Yuzhe Zhou
Shujing Guo
Kuan Zhu
Jinqiao Wang
CLL, LRM
192
0
0
31 Jul 2025
SATORI-R1: Incentivizing Multimodal Reasoning through Explicit Visual Anchoring
Chuming Shen
Wei Wei
Xiaoye Qu
Yu Cheng
LRM
549
8
0
25 May 2025
One RL to See Them All: Visual Triple Unified Reinforcement Learning
Yan Ma
Linge Du
Xuyang Shen
Shaoxiang Chen
Pengfei Li
Qibing Ren
Lizhuang Ma
Yuchao Dai
Pengfei Liu
Junjie Yan
OffRL, LRM
516
0
0
23 May 2025
PsOCR: Benchmarking Large Multimodal Models for Optical Character Recognition in Low-resource Pashto Language
Ijazul Haq
Yingjie Zhang
Irfan Ali Khan
380
0
0
15 May 2025
Adaptive Markup Language Generation for Contextually-Grounded Visual Document Understanding
Computer Vision and Pattern Recognition (CVPR), 2025
Han Xiao
Yina Xie
Guanxin Tan
Yinghao Chen
R. Hu
...
Shiyang Feng
Yafei Wen
Xiaoxin Chen
Shuai Ren
Hongsheng Li
VLM
286
4
0
08 May 2025
Capybara-OMNI: An Efficient Paradigm for Building Omni-Modal Language Models
Xingguang Ji
Jiakang Wang
Hongzhi Zhang
Jingyuan Zhang
Haonan Zhou
Chenxi Sun
Wenshu Fan
Qi Wang
Fuzheng Zhang
MLLM, VLM
389
1
0
10 Apr 2025
Data Metabolism: An Efficient Data Design Schema For Vision Language Model
Jingyuan Zhang
Hongzhi Zhang
Zhou Haonan
Chenxi Sun
Xingguang Ji
Jiakang Wang
Fanheng Kong
Wenshu Fan
Qi Wang
Fuzheng Zhang
VLM
406
2
0
10 Apr 2025
Marten: Visual Question Answering with Mask Generation for Multi-modal Document Understanding
Computer Vision and Pattern Recognition (CVPR), 2025
Zining Wang
Tongkun Guan
Pei Fu
Chen Duan
Qianyi Jiang
Zhentao Guo
Shan Guo
Junfeng Luo
Wei Shen
Yunbo Wang
MLLM, VLM
315
15
0
18 Mar 2025
PP-DocBee: Improving Multimodal Document Understanding Through a Bag of Tricks
Feng Ni
Kui Huang
Yao Lu
Wenyu Lv
Guanzhong Wang
Zeyu Chen
Wenshu Fan
VLM
481
2
0
06 Mar 2025
Are Large Vision Language Models Good Game Players?
International Conference on Learning Representations (ICLR), 2025
Xinyu Wang
Bohan Zhuang
Qi Wu
MLLM, ELM, LRM
306
16
0
04 Mar 2025
HoVLE: Unleashing the Power of Monolithic Vision-Language Models with Holistic Vision-Language Embedding
Computer Vision and Pattern Recognition (CVPR), 2024
Chenxin Tao
Shiqian Su
X. Zhu
Chenyu Zhang
Zhe Chen
...
Wenhai Wang
Lewei Lu
Gao Huang
Yu Qiao
Jifeng Dai
MLLM, VLM
591
7
0
20 Dec 2024
LLaVA-UHD v2: an MLLM Integrating High-Resolution Semantic Pyramid via Hierarchical Window Transformer
Yipeng Zhang
Yi Liu
Zonghao Guo
Yidan Zhang
Xuesong Yang
...
Xingtai Lv
Zhiyuan Liu
Tat-Seng Chua
Maosong Sun
MLLM, VLM
418
3
0
18 Dec 2024
PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models
Computer Vision and Pattern Recognition (CVPR), 2024
Chenyu Yang
Xuan Dong
X. Zhu
Weijie Su
Jiahao Wang
H. Tian
Zheyu Chen
Wenhai Wang
Lewei Lu
Jifeng Dai
VLM
291
16
0
12 Dec 2024
DLaVA: Document Language and Vision Assistant for Answer Localization with Enhanced Interpretability and Trustworthiness
Ahmad Mohammadshirazi
Pinaki Prasad Guha Neogi
Ser-Nam Lim
R. Ramnath
507
7
0
29 Nov 2024
BlueLM-V-3B: Algorithm and System Co-Design for Multimodal Large Language Models on Mobile Devices
Computer Vision and Pattern Recognition (CVPR), 2024
Xudong Lu
Yinghao Chen
Cheng Chen
Hui Tan
Boheng Chen
...
Aojun Zhou
Yafei Wen
Xiaoxin Chen
Shuai Ren
Jiaming Song
237
22
0
16 Nov 2024
SimpsonsVQA: Enhancing Inquiry-Based Learning with a Tailored Dataset
Ngoc Dung Huynh
Mohamed Reda Bouadjenek
Sunil Aryal
Imran Razzak
Hakim Hacid
263
0
0
30 Oct 2024
NVLM: Open Frontier-Class Multimodal LLMs
Wenliang Dai
Nayeon Lee
Wei Ping
Zhuoling Yang
Zihan Liu
Jon Barker
Tuomas Rintamaki
Mohammad Shoeybi
Bryan Catanzaro
Ming-Yu Liu
MLLM, VLM, LRM
343
127
0
17 Sep 2024
ACTRESS: Active Retraining for Semi-supervised Visual Grounding
Weitai Kang
Mengxue Qu
Yunchao Wei
Yan Yan
377
8
0
03 Jul 2024
Visual Grounding with Attention-Driven Constraint Balancing
Weitai Kang
Luowei Zhou
Junyi Wu
Changchang Sun
Yan Yan
322
10
0
03 Jul 2024
SegVG: Transferring Object Bounding Box to Segmentation for Visual Grounding
Weitai Kang
Gaowen Liu
Mubarak Shah
Yan Yan
ObjD
462
22
0
03 Jul 2024
Tri-VQA: Triangular Reasoning Medical Visual Question Answering for Multi-Attribute Analysis
Lin Fan
Xun Gong
Cenyang Zheng
Yafei Ou
226
3
0
21 Jun 2024
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
Qingyun Li
Zhe Chen
Weiyun Wang
Wenhai Wang
Shenglong Ye
...
Dahua Lin
Yu Qiao
Botian Shi
Conghui He
Jifeng Dai
VLM, OffRL
331
54
0
12 Jun 2024
VCR: A Task for Pixel-Level Complex Reasoning in Vision Language Models via Restoring Occluded Text
International Conference on Learning Representations (ICLR), 2024
Tianyu Zhang
Suyuchen Wang
Lu Li
Ge Zhang
Perouz Taslakian
Sai Rajeswar
Jie Fu
Bang Liu
Yoshua Bengio
337
5
0
10 Jun 2024
The Evolution of Multimodal Model Architectures
S. Wadekar
Abhishek Chaurasia
Vasu Sharma
Eugenio Culurciello
405
29
0
28 May 2024
Exploring the Capabilities of Large Multimodal Models on Dense Text
IEEE International Conference on Document Analysis and Recognition (ICDAR), 2024
Shuo Zhang
Biao Yang
Zhang Li
Zhiyin Ma
Yuliang Liu
Xiang Bai
VLM
242
13
0
09 May 2024
Instruction Makes a Difference
Tosin Adewumi
Nudrat Habib
Lama Alkhaled
Elisa Barney
VLM, MLLM
357
2
0
01 Feb 2024
MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer
Changyao Tian
Xizhou Zhu
Yuwen Xiong
Weiyun Wang
Zhe Chen
...
Tong Lu
Jie Zhou
Jiaming Song
Yu Qiao
Jifeng Dai
AuLLM
304
73
0
18 Jan 2024
ModaVerse: Efficiently Transforming Modalities with LLMs
Computer Vision and Pattern Recognition (CVPR), 2024
Xinyu Wang
Bohan Zhuang
Qi Wu
284
24
0
12 Jan 2024
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
Zhe Chen
Jiannan Wu
Wenhai Wang
Weijie Su
Guo Chen
...
Bin Li
Ping Luo
Tong Lu
Yu Qiao
Jifeng Dai
VLM, MLLM
785
2,575
0
21 Dec 2023
Multi-Clue Reasoning with Memory Augmentation for Knowledge-based Visual Question Answering
Chengxiang Yin
Zhengping Che
Kun Wu
Zhiyuan Xu
Jian Tang
227
1
0
20 Dec 2023
Monkey: Image Resolution and Text Label Are Important Things for Large Multi-modal Models
Computer Vision and Pattern Recognition (CVPR), 2023
Zhang Li
Biao Yang
Qiang Liu
Zhiyin Ma
Shuo Zhang
Jingxu Yang
Yabo Sun
Yuliang Liu
Xiang Bai
MLLM
621
417
0
11 Nov 2023
Exploring Sparse Spatial Relation in Graph Inference for Text-Based VQA
Sheng Zhou
Dan Guo
Jia Li
Xun Yang
Ming Wang
331
24
0
13 Oct 2023
Separate and Locate: Rethink the Text in Text-based Visual Question Answering
ACM Multimedia (ACM MM), 2023
Chengyang Fang
Jiangnan Li
Liang Li
Can Ma
Dayong Hu
367
19
0
31 Aug 2023
BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions
AAAI Conference on Artificial Intelligence (AAAI), 2023
Wenbo Hu
Y. Xu
Jian Wang
W. Li
Zhe Chen
Zhuowen Tu
MLLM, VLM
458
201
0
19 Aug 2023
Advancing Visual Grounding with Scene Knowledge: Benchmark and Method
Computer Vision and Pattern Recognition (CVPR), 2023
Zhihong Chen
Ruifei Zhang
Yibing Song
Xiang Wan
Guanbin Li
257
32
0
21 Jul 2023
On the Hidden Mystery of OCR in Large Multimodal Models
Science China Information Sciences (Sci China Inf Sci), 2023
Yuliang Liu
Zhang Li
Mingxin Huang
Chunyuan Li
Dezhi Peng
Mingyu Liu
Lianwen Jin
Xiang Bai
VLM, MLLM
502
117
0
13 May 2023
MPMQA: Multimodal Question Answering on Product Manuals
AAAI Conference on Artificial Intelligence (AAAI), 2023
Liangfu Zhang
Anwen Hu
Jing Zhang
Shuo Hu
Qin Jin
231
15
0
19 Apr 2023
PDFVQA: A New Dataset for Real-World VQA on PDF Documents
Yihao Ding
Siwen Luo
Hyunsuk Chung
S. Han
488
30
0
13 Apr 2023
Fully and Weakly Supervised Referring Expression Segmentation with End-to-End Learning
Hui Li
Mingjie Sun
Jimin Xiao
Eng Gee Lim
Yao-Min Zhao
249
30
0
17 Dec 2022
Hierarchical multimodal transformers for Multi-Page DocVQA
Pattern Recognition (Pattern Recogn.), 2022
Rubèn Pérez Tito
Dimosthenis Karatzas
Ernest Valveny
296
102
0
07 Dec 2022
VLG: General Video Recognition with Web Textual Knowledge
International Journal of Computer Vision (IJCV), 2022
Jintao Lin
Zhaoyang Liu
Wenhai Wang
Wayne Wu
Limin Wang
377
4
0
03 Dec 2022