Title and authors |
---|
Prism: A Framework for Decoupling and Assessing the Capabilities of VLMs. Yuxuan Qiao, Haodong Duan, Xinyu Fang, Junming Yang, Lin Chen, Songyang Zhang, Jiaqi Wang, Dahua Lin, Kai Chen |
MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens. Anas Awadalla, Le Xue, Oscar Lo, Manli Shu, Hannah Lee, ..., Silvio Savarese, Caiming Xiong, Ran Xu, Yejin Choi, Ludwig Schmidt |
MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding. Fei Wang, Xingyu Fu, James Y. Huang, Zekun Li, Qin Liu, ..., Kai-Wei Chang, Dan Roth, Sheng Zhang, Hoifung Poon, Muhao Chen |
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text. Qingyun Li, Zhe Chen, Weiyun Wang, Wenhai Wang, Shenglong Ye, ..., Dahua Lin, Yu Qiao, Botian Shi, Conghui He, Jifeng Dai |
LVBench: An Extreme Long Video Understanding Benchmark. Weihan Wang, Zehai He, Wenyi Hong, Yean Cheng, Xiaohan Zhang, ..., Shiyu Huang, Bin Xu, Yuxiao Dong, Ming Ding, Jie Tang |