ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2403.00231
  4. Cited By
Multimodal ArXiv: A Dataset for Improving Scientific Comprehension of
  Large Vision-Language Models

Multimodal ArXiv: A Dataset for Improving Scientific Comprehension of Large Vision-Language Models

1 March 2024
Lei Li
Yuqi Wang
Runxin Xu
Peiyi Wang
Xiachong Feng
Lingpeng Kong
Qi Liu
ArXivPDFHTML

Papers citing "Multimodal ArXiv: A Dataset for Improving Scientific Comprehension of Large Vision-Language Models"

43 / 43 papers shown
Title
Prioritizing Image-Related Tokens Enhances Vision-Language Pre-Training
Prioritizing Image-Related Tokens Enhances Vision-Language Pre-Training
Y. Chen
Hao Peng
Tong Zhang
Heng Ji
VLM
18
0
0
13 May 2025
Stealing Creator's Workflow: A Creator-Inspired Agentic Framework with Iterative Feedback Loop for Improved Scientific Short-form Generation
Stealing Creator's Workflow: A Creator-Inspired Agentic Framework with Iterative Feedback Loop for Improved Scientific Short-form Generation
J. Park
Maanas Taneja
Qianwen Wang
Dongyeop Kang
VGen
65
0
0
26 Apr 2025
FUSION: Fully Integration of Vision-Language Representations for Deep Cross-Modal Understanding
FUSION: Fully Integration of Vision-Language Representations for Deep Cross-Modal Understanding
Zheng Liu
Mengjie Liu
J. Chen
Jingwei Xu
Bin Cui
Conghui He
Wentao Zhang
MLLM
57
0
0
14 Apr 2025
SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models
SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models
Hardy Chen
Haoqin Tu
Fali Wang
Hui Liu
X. Tang
Xinya Du
Yuyin Zhou
Cihang Xie
ReLM
VLM
OffRL
LRM
63
7
0
10 Apr 2025
Data Metabolism: An Efficient Data Design Schema For Vision Language Model
Data Metabolism: An Efficient Data Design Schema For Vision Language Model
Jingyuan Zhang
Hongzhi Zhang
Zhou Haonan
Chenxi Sun
Xingguang Ji
Jiakang Wang
Fanheng Kong
Y. Liu
Qi Wang
Fuzheng Zhang
VLM
53
1
0
10 Apr 2025
Skip-Vision: Efficient and Scalable Acceleration of Vision-Language Models via Adaptive Token Skipping
Skip-Vision: Efficient and Scalable Acceleration of Vision-Language Models via Adaptive Token Skipping
Weili Zeng
Ziyuan Huang
Kaixiang Ji
Yichao Yan
VLM
42
1
0
26 Mar 2025
SlowFast-LLaVA-1.5: A Family of Token-Efficient Video Large Language Models for Long-Form Video Understanding
SlowFast-LLaVA-1.5: A Family of Token-Efficient Video Large Language Models for Long-Form Video Understanding
Mingze Xu
Mingfei Gao
Shiyu Li
Jiasen Lu
Zhe Gan
Zhengfeng Lai
Meng Cao
Kai Kang
Y. Yang
Afshin Dehghan
51
1
0
24 Mar 2025
MMCR: Benchmarking Cross-Source Reasoning in Scientific Papers
MMCR: Benchmarking Cross-Source Reasoning in Scientific Papers
Yang Tian
Zheng Lu
Mingqi Gao
Zheng Liu
Bo Zhao
LRM
42
0
0
21 Mar 2025
MAPS: A Multi-Agent Framework Based on Big Seven Personality and Socratic Guidance for Multimodal Scientific Problem Solving
MAPS: A Multi-Agent Framework Based on Big Seven Personality and Socratic Guidance for Multimodal Scientific Problem Solving
Jian Zhang
Zhiyuan Wang
Z. Wang
Xinyu Zhang
Fangzhi Xu
Qika Lin
Rui Mao
Erik Cambria
Jun Liu
LLMAG
54
1
0
21 Mar 2025
Federated Continual Instruction Tuning
Federated Continual Instruction Tuning
Haiyang Guo
Fanhu Zeng
Fei Zhu
Wenzhuo Liu
Da-Han Wang
Jian Xu
Xu-Yao Zhang
Cheng-Lin Liu
CLL
FedML
63
1
0
17 Mar 2025
TikZero: Zero-Shot Text-Guided Graphics Program Synthesis
TikZero: Zero-Shot Text-Guided Graphics Program Synthesis
Jonas Belouadi
Eddy Ilg
M. Keuper
Hideki Tanaka
Masao Utiyama
Raj Dabre
Steffen Eger
Simone Paolo Ponzetto
50
0
0
14 Mar 2025
M2-omni: Advancing Omni-MLLM for Comprehensive Modality Support with Competitive Performance
M2-omni: Advancing Omni-MLLM for Comprehensive Modality Support with Competitive Performance
Qingpei Guo
Kaiyou Song
Zipeng Feng
Ziping Ma
Qinglong Zhang
...
Yunxiao Sun
Tai-WeiChang
Jingdong Chen
Ming Yang
Jun Zhou
MLLM
VLM
82
3
0
26 Feb 2025
Megrez-Omni Technical Report
Boxun Li
Yadong Li
Z. Li
Congyi Liu
Weilin Liu
...
Dong Zhou
Yueqing Zhuang
Shengen Yan
Guohao Dai
Y. Wang
44
0
0
19 Feb 2025
Position: Multimodal Large Language Models Can Significantly Advance Scientific Reasoning
Position: Multimodal Large Language Models Can Significantly Advance Scientific Reasoning
Yibo Yan
Shen Wang
Jiahao Huo
Jingheng Ye
Zhendong Chu
Xuming Hu
Philip S. Yu
Carla P. Gomes
B. Selman
Qingsong Wen
LRM
119
9
0
05 Feb 2025
Baichuan-Omni-1.5 Technical Report
Yadong Li
J. Liu
Tao Zhang
Tao Zhang
S. Chen
...
Jianhua Xu
Haoze Sun
Mingan Lin
Zenan Zhou
Weipeng Chen
AuLLM
70
10
0
28 Jan 2025
GME: Improving Universal Multimodal Retrieval by Multimodal LLMs
GME: Improving Universal Multimodal Retrieval by Multimodal LLMs
Xin Zhang
Yanzhao Zhang
Wen Xie
Mingxin Li
Ziqi Dai
Dingkun Long
Pengjun Xie
Meishan Zhang
Wenjie Li
M. Zhang
114
7
0
22 Dec 2024
Dynamic-VLM: Simple Dynamic Visual Token Compression for VideoLLM
Dynamic-VLM: Simple Dynamic Visual Token Compression for VideoLLM
H. Wang
Yuxiang Nie
Yongjie Ye
Deng GuanYu
Yanjie Wang
Shuai Li
Haiyang Yu
Jinghui Lu
Can Huang
VLM
MLLM
79
1
0
12 Dec 2024
Chimera: Improving Generalist Model with Domain-Specific Experts
Chimera: Improving Generalist Model with Domain-Specific Experts
Tianshuo Peng
M. Li
Hongbin Zhou
Renqiu Xia
Renrui Zhang
...
Aojun Zhou
Botian Shi
Tao Chen
Bo Zhang
Xiangyu Yue
86
4
0
08 Dec 2024
EgoPlan-Bench2: A Benchmark for Multimodal Large Language Model Planning in Real-World Scenarios
EgoPlan-Bench2: A Benchmark for Multimodal Large Language Model Planning in Real-World Scenarios
Lu Qiu
Yuying Ge
Yi Chen
Yixiao Ge
Ying Shan
Xihui Liu
LLMAG
LRM
94
5
0
05 Dec 2024
ScImage: How Good Are Multimodal Large Language Models at Scientific
  Text-to-Image Generation?
ScImage: How Good Are Multimodal Large Language Models at Scientific Text-to-Image Generation?
Leixin Zhang
Steffen Eger
Yinjie Cheng
Weihe Zhai
Jonas Belouadi
Christoph Leiter
Simone Paolo Ponzetto
Fahimeh Moafian
Zhixue Zhao
MLLM
76
1
0
03 Dec 2024
VLsI: Verbalized Layers-to-Interactions from Large to Small Vision
  Language Models
VLsI: Verbalized Layers-to-Interactions from Large to Small Vision Language Models
Byung-Kwan Lee
Ryo Hachiuma
Yu-Chiang Frank Wang
Y. Ro
Yueh-Hua Wu
VLM
81
0
0
02 Dec 2024
HourVideo: 1-Hour Video-Language Understanding
HourVideo: 1-Hour Video-Language Understanding
Keshigeyan Chandrasegaran
Agrim Gupta
Lea M. Hadzic
Taran Kota
Jimming He
Cristobal Eyzaguirre
Zane Durante
Manling Li
Jiajun Wu
L. Fei-Fei
VLM
41
31
0
07 Nov 2024
SV-RAG: LoRA-Contextualizing Adaptation of MLLMs for Long Document Understanding
SV-RAG: LoRA-Contextualizing Adaptation of MLLMs for Long Document Understanding
Jian Chen
R. Zhang
Yufan Zhou
Tong Yu
Franck Dernoncourt
J. Gu
Ryan Rossi
Changyou Chen
Tong Sun
34
0
0
02 Nov 2024
Janus: Decoupling Visual Encoding for Unified Multimodal Understanding
  and Generation
Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation
Chengyue Wu
Xiaokang Chen
Z. F. Wu
Yiyang Ma
Xingchao Liu
...
Wen Liu
Zhenda Xie
Xingkai Yu
Chong Ruan
Ping Luo
AI4TS
52
72
0
17 Oct 2024
VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents
VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents
S. Yu
C. Tang
Bokai Xu
Junbo Cui
Junhao Ran
...
Zhenghao Liu
Shuo Wang
Xu Han
Zhiyuan Liu
Maosong Sun
VLM
37
22
0
14 Oct 2024
Leopard: A Vision Language Model For Text-Rich Multi-Image Tasks
Leopard: A Vision Language Model For Text-Rich Multi-Image Tasks
Mengzhao Jia
Wenhao Yu
Kaixin Ma
Tianqing Fang
Zhihan Zhang
Siru Ouyang
Hongming Zhang
Meng-Long Jiang
Dong Yu
VLM
29
5
0
02 Oct 2024
MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning
MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning
Haotian Zhang
Mingfei Gao
Zhe Gan
Philipp Dufter
Nina Wenzel
...
Haoxuan You
Zirui Wang
Afshin Dehghan
Peter Grasch
Yinfei Yang
VLM
MLLM
36
32
1
30 Sep 2024
NVLM: Open Frontier-Class Multimodal LLMs
NVLM: Open Frontier-Class Multimodal LLMs
Wenliang Dai
Nayeon Lee
Boxin Wang
Zhuoling Yang
Zihan Liu
Jon Barker
Tuomas Rintamaki
M. Shoeybi
Bryan Catanzaro
Wei Ping
MLLM
VLM
LRM
40
51
0
17 Sep 2024
$VILA^2$: VILA Augmented VILA
VILA2VILA^2VILA2: VILA Augmented VILA
Yunhao Fang
Ligeng Zhu
Yao Lu
Yan Wang
Pavlo Molchanov
Jang Hyun Cho
Marco Pavone
Song Han
Hongxu Yin
VLM
39
7
0
24 Jul 2024
SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers
SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers
Shraman Pramanick
Rama Chellappa
Subhashini Venugopalan
43
13
0
12 Jul 2024
ColPali: Efficient Document Retrieval with Vision Language Models
ColPali: Efficient Document Retrieval with Vision Language Models
Manuel Faysse
Hugues Sibille
Tony Wu
Bilel Omrani
Gautier Viaud
C´eline Hudelot
Pierre Colombo
VLM
60
21
0
27 Jun 2024
Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs
Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs
Shengbang Tong
Ellis L Brown
Penghao Wu
Sanghyun Woo
Manoj Middepogu
...
Xichen Pan
Austin Wang
Rob Fergus
Yann LeCun
Saining Xie
3DV
MLLM
37
278
0
24 Jun 2024
DocGenome: An Open Large-scale Scientific Document Benchmark for
  Training and Testing Multi-modal Large Language Models
DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Language Models
Renqiu Xia
Song Mao
Xiangchao Yan
Hongbin Zhou
Bo Zhang
...
Yongwei Wang
Bin Wang
Junchi Yan
Fei Wu
Yu Qiao
40
10
0
17 Jun 2024
MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal
  Dataset with One Trillion Tokens
MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens
Anas Awadalla
Le Xue
Oscar Lo
Manli Shu
Hannah Lee
...
Silvio Savarese
Caiming Xiong
Ran Xu
Yejin Choi
Ludwig Schmidt
67
24
0
17 Jun 2024
Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models
Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models
Yi-Fan Zhang
Qingsong Wen
Chaoyou Fu
Xue Wang
Zhang Zhang
L. Wang
Rong Jin
34
40
0
12 Jun 2024
Wings: Learning Multimodal LLMs without Text-only Forgetting
Wings: Learning Multimodal LLMs without Text-only Forgetting
Yi-Kai Zhang
Shiyin Lu
Yang Li
Yanqing Ma
Qing-Guo Chen
Zhao Xu
Weihua Luo
Kaifu Zhang
De-Chuan Zhan
Han-Jia Ye
VLM
33
6
0
05 Jun 2024
Dragonfly: Multi-Resolution Zoom Supercharges Large Visual-Language
  Model
Dragonfly: Multi-Resolution Zoom Supercharges Large Visual-Language Model
Kezhen Chen
Rahul Thapa
Rahul Chalamala
Ben Athiwaratkun
S. Song
James Y. Zou
VLM
50
5
0
03 Jun 2024
Video-MME: The First-Ever Comprehensive Evaluation Benchmark of
  Multi-modal LLMs in Video Analysis
Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
Chaoyou Fu
Yuhan Dai
Yondong Luo
Lei Li
Shuhuai Ren
...
Tong Bill Xu
Xiawu Zheng
Enhong Chen
Rongrong Ji
Xing Sun
VLM
MLLM
48
297
0
31 May 2024
SciFIBench: Benchmarking Large Multimodal Models for Scientific Figure
  Interpretation
SciFIBench: Benchmarking Large Multimodal Models for Scientific Figure Interpretation
Jonathan Roberts
Kai Han
N. Houlsby
Samuel Albanie
40
12
0
14 May 2024
From Pixels to Insights: A Survey on Automatic Chart Understanding in
  the Era of Large Foundation Models
From Pixels to Insights: A Survey on Automatic Chart Understanding in the Era of Large Foundation Models
Kung-Hsiang Huang
Hou Pong Chan
Yi Ren Fung
Haoyi Qiu
Mingyang Zhou
Shafiq R. Joty
Shih-Fu Chang
Heng Ji
AI4TS
64
14
0
18 Mar 2024
Silkie: Preference Distillation for Large Visual Language Models
Silkie: Preference Distillation for Large Visual Language Models
Lei Li
Zhihui Xie
Mukai Li
Shunian Chen
Peiyi Wang
Liang Chen
Yazheng Yang
Benyou Wang
Lingpeng Kong
MLLM
101
68
0
17 Dec 2023
Learn to Explain: Multimodal Reasoning via Thought Chains for Science
  Question Answering
Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering
Pan Lu
Swaroop Mishra
Tony Xia
Liang Qiu
Kai-Wei Chang
Song-Chun Zhu
Oyvind Tafjord
Peter Clark
A. Kalyan
ELM
ReLM
LRM
209
1,101
0
20 Sep 2022
Training language models to follow instructions with human feedback
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
303
11,881
0
04 Mar 2022
1