Title |
---|
![]() MIO: A Foundation Model on Multimodal Tokens Zekun Wang King Zhu Chunpu Xu Wangchunshu Zhou Jiaheng Liu ...Yuanxing Zhang Ge Zhang Ke Xu Jie Fu Wenhao Huang |
![]() End-to-end Open-vocabulary Video Visual Relationship Detection using Multi-modal Prompting Yongqi Wang Xinxiao Wu Shuo Yang Jiebo Luo |
![]() PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm Haoyi Zhu Honghui Yang Xiaoyang Wu Di Huang Sha Zhang ...Hengshuang Zhao Chunhua Shen Yu Qiao Tong He Wanli Ouyang |
![]() AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model Avamarie Brueggeman Andrea Madotto Zhaojiang Lin Tushar Nagarajan Matt Smith ...Peyman Heidari Yue Liu Kavya Srinet Babak Damavandi Anuj Kumar |