Title |
---|
![]() Leopard: A Vision Language Model For Text-Rich Multi-Image Tasks Mengzhao Jia Wenhao Yu Kaixin Ma Tianqing Fang Zhihan Zhang Siru Ouyang Hongming Zhang Meng-Long Jiang Dong Yu |
![]() MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning Haotian Zhang Mingfei Gao Zhe Gan Philipp Dufter Nina Wenzel ...Haoxuan You Zirui Wang Afshin Dehghan Peter Grasch Yinfei Yang |
![]() MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal
Dataset with One Trillion Tokens Anas Awadalla Le Xue Oscar Lo Manli Shu Hannah Lee ...Silvio Savarese Caiming Xiong Ran Xu Yejin Choi Ludwig Schmidt |