2402.11690
Cited By
Vision-Flan: Scaling Human-Labeled Tasks in Visual Instruction Tuning
18 February 2024
Zhiyang Xu
Chao Feng
Rulin Shao
Trevor Ashby
Ying Shen
Dingnan Jin
Yu Cheng
Qifan Wang
Lifu Huang
MLLM
VLM
HuggingFace (10 upvotes)
Papers citing "Vision-Flan: Scaling Human-Labeled Tasks in Visual Instruction Tuning"
43 / 43 papers shown
Data Selection for Fine-tuning Vision Language Models via Cross Modal Alignment Trajectories
Nilay Naharas
Dang Nguyen
Nesihan Bulut
M. Bateni
Vahab Mirrokni
Baharan Mirzasoleiman
68
0
0
01 Oct 2025
MindVL: Towards Efficient and Effective Training of Multimodal Large Language Models on Ascend NPUs
Feilong Chen
Y. Liu
Yi Huang
Hao Wang
Miren Tian
Ya-Qi Yu
Minghui Liao
Jihao Wu
MLLM
VLM
187
0
0
15 Sep 2025
MambaMia: A State-Space-Model-Based Compression for Efficient Video Understanding in Large Multimodal Models
Geewook Kim
Minjoon Seo
119
1
0
16 Jun 2025
Pisces: An Auto-regressive Foundation Model for Image Understanding and Generation
Zhiyang Xu
Jiuhai Chen
Zhaojiang Lin
Xichen Pan
Lifu Huang
...
Di Jin
Michihiro Yasunaga
Lili Yu
Xi Lin
Shaoliang Nie
243
4
0
12 Jun 2025
Revolutionizing Clinical Trials: A Manifesto for AI-Driven Transformation
M. Schaar
Richard W. Peck
E. McKinney
Jim Weatherall
Stuart Bailey
...
Rafik Salama
Christina Gunther
Francesca Frau
Antoine Pugeat
Ramon Hernandez
MedIm
168
0
0
10 Jun 2025
Learning What Matters: Prioritized Concept Learning via Relative Error-driven Sample Selection
Shivam Chandhok
Qian Yang
Oscar Manas
Kanishk Jain
Leonid Sigal
Aishwarya Agrawal
159
1
0
01 Jun 2025
Seeing Beyond the Scene: Enhancing Vision-Language Models with Interactional Reasoning
Dayong Liang
Changmeng Zheng
Zhiyuan Wen
Yi Cai
Xiao Wei
Qing Li
LRM
152
0
0
14 May 2025
Toward Generalizable Evaluation in the LLM Era: A Survey Beyond Benchmarks
Yixin Cao
Shibo Hong
Xuzhao Li
Jiahao Ying
Yubo Ma
...
Juanzi Li
Aixin Sun
Qi Zhang
Tat-Seng Chua
Tianwei Zhang
ALM
ELM
417
16
0
26 Apr 2025
MM-IFEngine: Towards Multimodal Instruction Following
Shengyuan Ding
Shenxi Wu
Xiangyu Zhao
Yuhang Zang
Haodong Duan
Xiaoyi Dong
Pan Zhang
Yuhang Cao
Dahua Lin
Jiaqi Wang
OffRL
407
15
0
10 Apr 2025
Capybara-OMNI: An Efficient Paradigm for Building Omni-Modal Language Models
Xingguang Ji
Jiakang Wang
Hongzhi Zhang
Jingyuan Zhang
Haonan Zhou
Chenxi Sun
Wenshu Fan
Qi Wang
Fuzheng Zhang
MLLM
VLM
222
1
0
10 Apr 2025
Data Metabolism: An Efficient Data Design Schema For Vision Language Model
Jingyuan Zhang
Hongzhi Zhang
Zhou Haonan
Chenxi Sun
Xingguang Ji
Jiakang Wang
Fanheng Kong
Wenshu Fan
Qi Wang
Fuzheng Zhang
VLM
253
2
0
10 Apr 2025
Breaking Language Barriers in Visual Language Models via Multilingual Textual Regularization
Iñigo Pikabea
Iñaki Lacunza
Oriol Pareras
Carlos Escolano
Aitor Gonzalez-Agirre
Javier Hernando
Marta Villegas
VLM
312
1
0
28 Mar 2025
Instruction-Oriented Preference Alignment for Enhancing Multi-Modal Comprehension Capability of MLLMs
Zitian Wang
Yue Liao
Kang Rong
Fengyun Rao
Yibo Yang
Si Liu
209
0
0
26 Mar 2025
SlowFast-LLaVA-1.5: A Family of Token-Efficient Video Large Language Models for Long-Form Video Understanding
Mingze Xu
Mingfei Gao
Shiyu Li
Jiasen Lu
Zhe Gan
Zhengfeng Lai
Meng Cao
Kai Kang
Yue Yang
Afshin Dehghan
330
12
0
24 Mar 2025
VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search
Yiming Jia
Junlong Li
Xiang Yue
Bo Li
Ping Nie
Dayou Du
Lei Ma
LRM
349
13
0
13 Mar 2025
Filter Images First, Generate Instructions Later: Pre-Instruction Data Selection for Visual Instruction Tuning
Computer Vision and Pattern Recognition (CVPR), 2025
Bardia Safaei
Faizan Siddiqui
Jiacong Xu
Vishal M. Patel
Shao-Yuan Lo
VLM
877
7
0
10 Mar 2025
Re-Imagining Multimodal Instruction Tuning: A Representation View
International Conference on Learning Representations (ICLR), 2025
Yiyang Liu
James Liang
Ruixiang Tang
Yugyung Lee
Majid Rabbani
...
Raghuveer M. Rao
Lifu Huang
Dongfang Liu
Qifan Wang
Cheng Han
834
8
0
02 Mar 2025
Picking the Cream of the Crop: Visual-Centric Data Selection with Collaborative Agents
Zhenyu Liu
Yunxin Li
Baotian Hu
Tong Lu
Yaowei Wang
Min Zhang
241
0
0
27 Feb 2025
Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion
Computer Vision and Pattern Recognition (CVPR), 2024
Jiuhai Chen
Jianwei Yang
Haiping Wu
Dianqi Li
Jianfeng Gao
Tianyi Zhou
Bin Xiao
VLM
213
15
0
05 Dec 2024
On Domain-Adaptive Post-Training for Multimodal Large Language Models
Daixuan Cheng
Shaohan Huang
Ziyu Zhu
Xintong Zhang
Wayne Xin Zhao
Zhongzhi Luan
Bo Dai
Zhenliang Zhang
VLM
315
6
0
29 Nov 2024
VILA-M3: Enhancing Vision-Language Models with Medical Expert Knowledge
Computer Vision and Pattern Recognition (CVPR), 2024
Vishwesh Nath
Wenqi Li
Dong Yang
Andriy Myronenko
Mingxin Zheng
...
Holger Roth
Daguang Xu
Baris Turkbey
VLM
415
17
0
19 Nov 2024
Exploring the Adversarial Vulnerabilities of Vision-Language-Action Models in Robotics
Taowen Wang
Cheng Han
James Liang
Wenhao Yang
Dongfang Liu
Luna Xinyu Zhang
Qifan Wang
Jiebo Luo
Ruixiang Tang
AAML
414
22
0
18 Nov 2024
UniHGKR: Unified Instruction-aware Heterogeneous Knowledge Retrievers
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Dehai Min
Zhiyang Xu
Guilin Qi
Lifu Huang
Chenyu You
RALM
336
3
0
26 Oct 2024
COAT: Compressing Optimizer states and Activation for Memory-Efficient FP8 Training
International Conference on Learning Representations (ICLR), 2024
Haocheng Xi
Han Cai
Ligeng Zhu
Yaojie Lu
Kurt Keutzer
Jianfei Chen
Song Han
MQ
327
14
0
25 Oct 2024
Adapt-∞: Scalable Continual Multimodal Instruction Tuning via Dynamic Data Selection
International Conference on Learning Representations (ICLR), 2024
A. Maharana
Jaehong Yoon
Tianlong Chen
Joey Tianyi Zhou
198
4
0
14 Oct 2024
How Does Vision-Language Adaptation Impact the Safety of Vision Language Models?
International Conference on Learning Representations (ICLR), 2024
Seongyun Lee
Geewook Kim
Jiyeon Kim
Hyunji Lee
Hoyeon Chang
Sue Hyun Park
Minjoon Seo
186
3
0
10 Oct 2024
M²PT: Multimodal Prompt Tuning for Zero-shot Instruction Learning
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Taowen Wang
Yiyang Liu
James Liang
Junhan Zhao
Yiming Cui
...
Zenglin Xu
Cheng Han
Lifu Huang
Qifan Wang
Dongfang Liu
MLLM
VLM
LRM
211
30
0
24 Sep 2024
LLaVA-OneVision: Easy Visual Task Transfer
Bo Li
Yuanhan Zhang
Dong Guo
Renrui Zhang
Feng Li
Hao Zhang
Kaichen Zhang
Yanwei Li
Ziwei Liu
Chunyuan Li
MLLM
SyDa
VLM
337
1,489
0
06 Aug 2024
MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Diversity
Yangzhou Liu
Yue Cao
Zhangwei Gao
Weiyun Wang
Zhe Chen
...
Lewei Lu
Xizhou Zhu
Tong Lu
Yu Qiao
Jifeng Dai
VLM
MLLM
212
37
0
22 Jul 2024
Advancing Chart Question Answering with Robust Chart Component Recognition
Hanwen Zheng
Sijia Wang
Chris Thomas
Lifu Huang
174
1
0
19 Jul 2024
NeuroBind: Towards Unified Multimodal Representations for Neural Signals
Fengyu Yang
Chao Feng
Daniel Wang
Tianye Wang
Ziyao Zeng
...
Hyoungseob Park
Pengliang Ji
Han Zhao
Yuanning Li
Alex Wong
205
13
0
19 Jul 2024
The Synergy between Data and Multi-Modal Large Language Models: A Survey from Co-Development Perspective
Zhen Qin
Daoyuan Chen
Wenhao Zhang
Liuyi Yao
Yilun Huang
Bolin Ding
Yaliang Li
Shuiguang Deng
235
10
0
11 Jul 2024
A Single Transformer for Scalable Vision-Language Modeling
Yangyi Chen
Xingyao Wang
Yuan Yao
Heng Ji
LRM
205
25
0
08 Jul 2024
Modality-Specialized Synergizers for Interleaved Vision-Language Generalists
Zhiyang Xu
Minqian Liu
Ying Shen
Joy Rimchala
Jiaxin Zhang
Qifan Wang
Yu Cheng
Lifu Huang
VLM
132
7
0
04 Jul 2024
From Efficient Multimodal Models to World Models: A Survey
Xinji Mai
Zeng Tao
Junxiong Lin
Haoran Wang
Yang Chang
Yanlan Kang
Yan Wang
Wenqiang Zhang
205
12
0
27 Jun 2024
On Efficient Language and Vision Assistants for Visually-Situated Natural Language Understanding: What Matters in Reading and Reasoning
Geewook Kim
Minjoon Seo
VLM
151
4
0
17 Jun 2024
Concept-skill Transferability-based Data Selection for Large Vision-Language Models
Jaewoo Lee
Boyang Li
Sung Ju Hwang
VLM
192
20
0
16 Jun 2024
From Pixels to Prose: A Large Dataset of Dense Image Captions
Vasu Singla
Kaiyu Yue
Sukriti Paul
Reza Shirkavand
Mayuka Jayawardhana
Alireza Ganjdanesh
Heng Huang
A. Bhatele
Gowthami Somepalli
Tom Goldstein
3DV
VLM
213
37
0
14 Jun 2024
ConvLLaVA: Hierarchical Backbones as Visual Encoder for Large Multimodal Models
Chunjiang Ge
Sijie Cheng
Xiangqi Jin
Jiale Yuan
Yuan Gao
Jun Song
Shiji Song
Gao Huang
Bo Zheng
MLLM
VLM
154
21
0
24 May 2024
SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation
Yuying Ge
Sijie Zhao
Jinguo Zhu
Yixiao Ge
Kun Yi
Lin Song
Chen Li
Xiaohan Ding
Ying Shan
VLM
262
212
0
22 Apr 2024
Pensieve: Retrospect-then-Compare Mitigates Visual Hallucination
Dingchen Yang
Bowen Cao
Guang Chen
Changjun Jiang
158
13
0
21 Mar 2024
Binding Touch to Everything: Learning Unified Multimodal Tactile Representations
Fengyu Yang
Chao Feng
Ziyang Chen
Hyoungseob Park
Daniel Wang
...
Ziyao Zeng
Xien Chen
Rit Gangopadhyay
Andrew Owens
Alex Wong
215
96
0
31 Jan 2024
Instruction Tuning for Large Language Models: A Survey
Shengyu Zhang
Linfeng Dong
Xiaoya Li
Sen Zhang
Xiaofei Sun
...
Jiwei Li
Runyi Hu
Tianwei Zhang
Leilei Gan
Guoyin Wang
LM&MA
509
729
0
21 Aug 2023