Title |
---|
End-to-end Open-vocabulary Video Visual Relationship Detection using Multi-modal Prompting. Yongqi Wang, Xinxiao Wu, Shuo Yang, Jiebo Luo |
NAVERO: Unlocking Fine-Grained Semantics for Video-Language Compositionality. Chaofan Tao, Gukyeong Kwon, Varad Gunjal, Hao Yang, Zhaowei Cai, Yonatan Dukler, Ashwin Swaminathan, R. Manmatha, Colin Jon Taylor, Stefano Soatto |