Title |
---|
![]() Leopard: A Vision Language Model For Text-Rich Multi-Image Tasks Mengzhao Jia Wenhao Yu Kaixin Ma Tianqing Fang Zhihan Zhang Siru Ouyang Hongming Zhang Meng-Long Jiang Dong Yu |
![]() MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning Haotian Zhang Mingfei Gao Zhe Gan Philipp Dufter Nina Wenzel ...Haoxuan You Zirui Wang Afshin Dehghan Peter Grasch Yinfei Yang |