Title |
---|
MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs. Ziyu Liu, Tao Chu, Yuhang Zang, Xilin Wei, Xiaoyi Dong, ..., Zijian Liang, Yuanjun Xiong, Yu Qiao, Dahua Lin, Jiaqi Wang |
What If We Recaption Billions of Web Images with LLaMA-3? Xianhang Li, Haoqin Tu, Mude Hui, Zeyu Wang, Bingchen Zhao, ..., Jieru Mei, Qing Liu, Huangjie Zheng, Yuyin Zhou, Cihang Xie |
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text. Qingyun Li, Zhe Chen, Weiyun Wang, Wenhai Wang, Shenglong Ye, ..., Dahua Lin, Yu Qiao, Botian Shi, Conghui He, Jifeng Dai |
Dense Connector for MLLMs. Huanjin Yao, Wenhao Wu, Taojiannan Yang, Yuxin Song, Mengxi Zhang, Haocheng Feng, Yifan Sun, Zhiheng Li, Wanli Ouyang, Jingdong Wang |