Title |
---|
![]() Mementos: A Comprehensive Benchmark for Multimodal Large Language Model
Reasoning over Image Sequences Xiyao Wang Yuhang Zhou Xiaoyu Liu Hongjin Lu Yuancheng Xu ...Taixi Lu Gedas Bertasius Mohit Bansal Huaxiu Yao Furong Huang |
![]() A Survey on Image-text Multimodal Models Ruifeng Guo Jingxuan Wei Linzhuang Sun Khai Le-Duc Guiyong Chang Dawei Liu Sibo Zhang Zhengbing Yao Mingjun Xu Liping Bu |
![]() Muse: Text-To-Image Generation via Masked Generative Transformers Huiwen Chang Han Zhang Jarred Barber AJ Maschinot José Lezama ...Kevin Patrick Murphy William T. Freeman Michael Rubinstein Yuanzhen Li Dilip Krishnan |