Title |
---|
![]() Leopard: A Vision Language Model For Text-Rich Multi-Image Tasks Mengzhao Jia Wenhao Yu Kaixin Ma Tianqing Fang Zhihan Zhang Siru Ouyang Hongming Zhang Meng-Long Jiang Dong Yu |
![]() ScreenQA: Large-Scale Question-Answer Pairs over Mobile App Screenshots Yu-Chung Hsiao Fedir Zubach Maria Wang Jindong Chen Victor Carbune Jason Lin Maria Wang Yun Zhu Jindong Chen |