Title |
---|
![]() MIO: A Foundation Model on Multimodal Tokens Zekun Wang King Zhu Chunpu Xu Wangchunshu Zhou Jiaheng Liu ...Yuanxing Zhang Ge Zhang Ke Xu Jie Fu Wenhao Huang |
![]() EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model Yuxuan Zhang Tianheng Cheng Lianghui Zhu Lei Liu Heng Liu Longjin Ran Xiaoxin Chen Xiaoxin Chen Wenyu Liu Xinggang Wang |
![]() DEEM: Diffusion Models Serve as the Eyes of Large Language Models for Image Perception Run Luo Yunshui Li Longze Chen Wanwei He Ting-En Lin ...Zikai Song Xiaobo Xia Tongliang Liu Min Yang Binyuan Hui |
![]() Muse: Text-To-Image Generation via Masked Generative Transformers Huiwen Chang Han Zhang Jarred Barber AJ Maschinot José Lezama ...Kevin Patrick Murphy William T. Freeman Michael Rubinstein Yuanzhen Li Dilip Krishnan |