ResearchTrend.AI
LOVA3: Learning to Visual Question Answering, Asking and Assessment


21 February 2025
Henry Hengyuan Zhao
Pan Zhou
Difei Gao
Zechen Bai
Mike Zheng Shou
arXiv: 2405.14974

Papers citing "LOVA3: Learning to Visual Question Answering, Asking and Assessment"

18 / 18 papers shown
GraphicBench: A Planning Benchmark for Graphic Design with Language Agents
Dayeon Ki
Tianyi Zhou
Marine Carpuat
Gang Wu
Puneet Mathur
Viswanathan Swaminathan
LLMAG
LM&Ro
48
0
0
15 Apr 2025
Multifaceted Evaluation of Audio-Visual Capability for MLLMs: Effectiveness, Efficiency, Generalizability and Robustness
Yusheng Zhao
Junyu Luo
Xiao Luo
Weizhi Zhang
Zhiping Xiao
Wei Ju
Philip S. Yu
Ming Zhang
AuLLM
27
0
0
03 Apr 2025
FakeReasoning: Towards Generalizable Forgery Detection and Reasoning
Y. Gao
Dongliang Chang
Bingyao Yu
Haotian Qin
Lei Chen
Kongming Liang
Zhanyu Ma
44
0
0
27 Mar 2025
VisFactor: Benchmarking Fundamental Visual Cognition in Multimodal Large Language Models
Jen-Tse Huang
Dasen Dai
Jen-Yuan Huang
Youliang Yuan
Xiaoyuan Liu
Wenxuan Wang
Wenxiang Jiao
Pinjia He
Zhaopeng Tu
LRM
41
0
0
23 Feb 2025
InterFeedback: Unveiling Interactive Intelligence of Large Multimodal Models via Human Feedback
Henry Hengyuan Zhao
Wenqi Pei
Yifei Tao
Haiyang Mei
Mike Zheng Shou
33
0
0
20 Feb 2025
Baichuan-Omni-1.5 Technical Report
Yadong Li
J. Liu
Tao Zhang
Tao Zhang
S. Chen
...
Jianhua Xu
Haoze Sun
Mingan Lin
Zenan Zhou
Weipeng Chen
AuLLM
64
10
0
28 Jan 2025
A Survey on Multimodal Benchmarks: In the Era of Large AI Models
Lin Li
Guikun Chen
Hanrong Shi
Jun Xiao
Long Chen
31
8
0
21 Sep 2024
MileBench: Benchmarking MLLMs in Long Context
Dingjie Song
Shunian Chen
Guiming Hardy Chen
Fei Yu
Xiang Wan
Benyou Wang
VLM
45
34
0
29 Apr 2024
Hallucination of Multimodal Large Language Models: A Survey
Zechen Bai
Pichao Wang
Tianjun Xiao
Tong He
Zongbo Han
Zheng Zhang
Mike Zheng Shou
VLM
LRM
68
136
0
29 Apr 2024
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models
Chris Liu
Renrui Zhang
Longtian Qiu
Siyuan Huang
Weifeng Lin
...
Hao Shao
Pan Lu
Hongsheng Li
Yu Qiao
Peng Gao
MLLM
116
106
0
08 Feb 2024
An Evaluation of GPT-4V and Gemini in Online VQA
Mengchen Liu
Chongyan Chen
Danna Gurari
MLLM
37
7
0
17 Dec 2023
MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning
Jun Chen
Deyao Zhu
Xiaoqian Shen
Xiang Li
Zechun Liu
Pengchuan Zhang
Raghuraman Krishnamoorthi
Vikas Chandra
Yunyang Xiong
Mohamed Elhoseiny
MLLM
150
280
0
14 Oct 2023
Self-Evaluation Guided Beam Search for Reasoning
Yuxi Xie
Kenji Kawaguchi
Yiran Zhao
Xu Zhao
Min-Yen Kan
Junxian He
Qizhe Xie
LRM
153
128
0
01 May 2023
mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality
Qinghao Ye
Haiyang Xu
Guohai Xu
Jiabo Ye
Ming Yan
...
Junfeng Tian
Qi Qian
Ji Zhang
Fei Huang
Jingren Zhou
VLM
MLLM
198
883
0
27 Apr 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
244
4,186
0
30 Jan 2023
Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering
Pan Lu
Swaroop Mishra
Tony Xia
Liang Qiu
Kai-Wei Chang
Song-Chun Zhu
Oyvind Tafjord
Peter Clark
A. Kalyan
ELM
ReLM
LRM
198
1,089
0
20 Sep 2022
Guiding Visual Question Generation
Nihir Vedd
Zixu Wang
Marek Rei
Yishu Miao
Lucia Specia
50
23
0
15 Oct 2021
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding
Akira Fukui
Dong Huk Park
Daylen Yang
Anna Rohrbach
Trevor Darrell
Marcus Rohrbach
136
1,403
0
06 Jun 2016