Title |
---|
![]() Aria: An Open Multimodal Native Mixture-of-Experts Model Dongxu Li Yudong Liu Haoning Wu Yue Wang Zhiqi Shen ...Lihuan Zhang Hanshu Yan Guoyin Wang Bei Chen Junnan Li |
![]() Probing Mechanical Reasoning in Large Vision Language Models Haoran Sun Qingying Gao Haiyun Lyu Dezhi Luo Yijiang Li Hokin Deng |
![]() Vision Language Models See What You Want but not What You See Qingying Gao Yijiang Li Haiyun Lyu Haoran Sun Dezhi Luo Hokin Deng |
![]() EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model Yuxuan Zhang Tianheng Cheng Lianghui Zhu Lei Liu Heng Liu Longjin Ran Xiaoxin Chen Xiaoxin Chen Wenyu Liu Xinggang Wang |