Title |
---|
EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model. Yuxuan Zhang, Tianheng Cheng, Lianghui Zhu, Lei Liu, Heng Liu, Longjin Ran, Xiaoxin Chen, Wenyu Liu, Xinggang Wang |
FreeBind: Free Lunch in Unified Multimodal Space via Knowledge Fusion. Zehan Wang, Ziang Zhang, Xize Cheng, Rongjie Huang, Luping Liu, ..., Haifeng Huang, Yang Zhao, Tao Jin, Peng Gao, Zhou Zhao |
Siamese Vision Transformers are Scalable Audio-visual Learners. Yan-Bo Lin, Gedas Bertasius |