MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Diversity

22 July 2024
Yangzhou Liu
Yue Cao
Zhangwei Gao
Weiyun Wang
Zhe Chen
Wenhai Wang
Hao Tian
Lewei Lu
Xizhou Zhu
Tong Lu
Yu Qiao
Jifeng Dai
VLM MLLM

Papers citing "MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Diversity"

30 / 30 papers shown
LocateAnything3D: Vision-Language 3D Detection with Chain-of-Sight
Yunze Man
S. S. Wang
Guowen Zhang
Johan Bjorck
Zhiqi Li
Liang-Yan Gui
Jim Fan
Jan Kautz
Yu Wang
Zhiding Yu
121
0
0
25 Nov 2025
VidEmo: Affective-Tree Reasoning for Emotion-Centric Video Foundation Models
Zhicheng Zhang
Weicheng Wang
Yongjie Zhu
Wenyu Qin
Pengfei Wan
Di Zhang
Jufeng Yang
116
0
0
04 Nov 2025
Automated Repeatable Adversary Threat Emulation with Effects Language (EL)
Suresh Damodaran
Paul D. Rowe
AAML
132
9
0
07 Oct 2025
PATIMT-Bench: A Multi-Scenario Benchmark for Position-Aware Text Image Machine Translation in Large Vision-Language Models
Wanru Zhuang
Wenbo Li
Zhibin Lan
Xu Han
P. Li
Jinsong Su
VLM
132
0
0
14 Sep 2025
GM-PRM: A Generative Multimodal Process Reward Model for Multimodal Mathematical Reasoning
Jianghangfan Zhang
Yibo Yan
Kening Zheng
Xin Zou
Song Dai
Xuming Hu
LRM
264
3
0
06 Aug 2025
QGuard:Question-based Zero-shot Guard for Multi-modal LLM Safety
Taegyeong Lee
Jeonghwa Yoo
Hyoungseo Cho
Soo Yong Kim
Yunho Maeng
AAML
273
2
0
14 Jun 2025
Visual Embodied Brain: Let Multimodal Large Language Models See, Think, and Control in Spaces
Gen Luo
Ganlin Yang
Ziyang Gong
Guanzhou Chen
Haonan Duan
...
Wenhai Wang
Jifeng Dai
Yu Qiao
Rongrong Ji
X. Zhu
LM&Ro
203
19
0
30 May 2025
PlanGPT-VL: Enhancing Urban Planning with Domain-Specific Vision-Language Models
He Zhu
Junyou Su
Minxin Chen
Wen Wang
Yijie Deng
Guanhua Chen
Wenjia Zhang
459
1
0
20 May 2025
Prioritizing Image-Related Tokens Enhances Vision-Language Pre-Training
Yiran Chen
Yuan Yao
Tong Zhang
Heng Ji
VLM
347
1
0
13 May 2025
Mavors: Multi-granularity Video Representation for Multimodal Large Language Model
Yang Shi
Jiaheng Liu
Yushuo Guan
Zhikai Wu
Yujiao Shi
...
Bohan Zeng
Wei Zhang
Fuzheng Zhang
Wenjing Yang
Di Zhang
VGen VLM
379
11
0
14 Apr 2025
Do We Really Need Curated Malicious Data for Safety Alignment in Multi-modal Large Language Models?
Computer Vision and Pattern Recognition (CVPR), 2025
Yanbo Wang
Jiyang Guan
Jian Liang
Ran He
365
4
0
14 Apr 2025
Data Metabolism: An Efficient Data Design Schema For Vision Language Model
Jingyuan Zhang
Hongzhi Zhang
Zhou Haonan
Chenxi Sun
Xingguang Ji
Jiakang Wang
Fanheng Kong
Wenshu Fan
Qi Wang
Fuzheng Zhang
VLM
381
2
0
10 Apr 2025
MM-IFEngine: Towards Multimodal Instruction Following
Shengyuan Ding
Shenxi Wu
Xiangyu Zhao
Yuhang Zang
Haodong Duan
Xiaoyi Dong
Pan Zhang
Yuhang Cao
Dahua Lin
Jiaqi Wang
OffRL
506
18
0
10 Apr 2025
Capybara-OMNI: An Efficient Paradigm for Building Omni-Modal Language Models
Xingguang Ji
Jiakang Wang
Hongzhi Zhang
Jingyuan Zhang
Haonan Zhou
Chenxi Sun
Wenshu Fan
Qi Wang
Fuzheng Zhang
MLLM VLM
302
1
0
10 Apr 2025
OCC-MLLM-CoT-Alpha: Towards Multi-stage Occlusion Recognition Based on Large Language Models via 3D-Aware Supervision and Chain-of-Thoughts Guidance
Chaoyi Wang
Baoqing Li
Xinhan Di
MLLM LRM
242
0
0
07 Apr 2025
HiMTok: Learning Hierarchical Mask Tokens for Image Segmentation with Large Multimodal Model
Tao Wang
Changxu Cheng
Lingfeng Wang
Senda Chen
Wuyue Zhao
VLM
329
8
0
17 Mar 2025
VisualPRM: An Effective Process Reward Model for Multimodal Reasoning
Weiyun Wang
Zhangwei Gao
Lawrence Yunliang Chen
Zhe Chen
Jinguo Zhu
...
Lewei Lu
Haodong Duan
Yu Qiao
Jifeng Dai
Wenhai Wang
LRM
342
86
0
13 Mar 2025
MindGYM: What Matters in Question Synthesis for Thinking-Centric Fine-Tuning?
Tianhao Liao
Daoyuan Chen
Zhenqing Ling
Yaliang Li
Ying Shen
LRM ReLM SyDa
178
0
0
12 Mar 2025
Oasis: One Image is All You Need for Multimodal Instruction Data Synthesis
Letian Zhang
Quan Cui
Bingchen Zhao
Cheng Yang
MLLM SyDa
438
5
0
11 Mar 2025
PP-DocBee: Improving Multimodal Document Understanding Through a Bag of Tricks
Feng Ni
Kui Huang
Yao Lu
Wenyu Lv
Guanzhong Wang
Zeyu Chen
Wenshu Fan
VLM
448
2
0
06 Mar 2025
SAE-V: Interpreting Multimodal Models for Enhanced Alignment
Hantao Lou
Changye Li
Yalan Qin
Yaodong Yang
350
6
0
22 Feb 2025
Megrez-Omni Technical Report
Boxun Li
Yadong Li
Hui Yuan
Congyi Liu
Weilin Liu
...
Dong Zhou
Yueqing Zhuang
Shengen Yan
Guohao Dai
Longji Xu
231
1
0
19 Feb 2025
Baichuan-Omni-1.5 Technical Report
Yadong Li
Qingbin Liu
Tao Zhang
Tao Zhang
Tian Jin
...
Jianhua Xu
Haoze Sun
Mingan Lin
Guosheng Dong
Xin Wu
AuLLM
328
63
0
28 Jan 2025
Mirage in the Eyes: Hallucination Attack on Multi-modal Large Language Models with Only Attention Sink
Yining Wang
Mi Zhang
Junjie Sun
Chenyue Wang
Min Yang
Hui Xue
Jialing Tao
Ranjie Duan
Qingbin Liu
233
6
0
28 Jan 2025
Audio-Language Models for Audio-Centric Tasks: A survey
Yi Su
Jisheng Bai
Qisheng Xu
Kele Xu
Yong Dou
AuLLM
336
14
0
28 Jan 2025
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks
Neural Information Processing Systems (NeurIPS), 2024
Jiannan Wu
Muyan Zhong
Sen Xing
Zeqiang Lai
Zhaoyang Liu
...
Lewei Lu
Tong Lu
Ping Luo
Yu Qiao
Jifeng Dai
MLLM VLM LRM
787
118
0
03 Jan 2025
Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization
Weiyun Wang
Zhe Chen
Wenhai Wang
Yue Cao
Yangzhou Liu
...
Jinguo Zhu
X. Zhu
Lewei Lu
Yu Qiao
Jifeng Dai
LRM
514
179
1
15 Nov 2024
Mini-InternVL: A Flexible-Transfer Pocket Multimodal Model with 5% Parameters and 90% Performance
Zhangwei Gao
Zhe Chen
Erfei Cui
Yiming Ren
Weiyun Wang
...
Lewei Lu
Tong Lu
Yu Qiao
Jifeng Dai
Wenhai Wang
VLM
395
87
0
21 Oct 2024
From Generalist to Specialist: Adapting Vision Language Models via Task-Specific Visual Instruction Tuning
Yang Bai
Yang Zhou
Jun Zhou
Rick Siow Mong Goh
Daniel Ting
Yong Liu
VLM
227
2
0
09 Oct 2024
Valley: Video Assistant with Large Language model Enhanced abilitY
Ruipu Luo
Ziwang Zhao
Min Yang
Junwei Dong
Da Li
Pengcheng Lu
Tao Wang
Linmei Hu
Ming-Hui Qiu
MLLM
515
252
0
12 Jun 2023