Title |
---|
NAVERO: Unlocking Fine-Grained Semantics for Video-Language Compositionality. Chaofan Tao, Gukyeong Kwon, Varad Gunjal, Hao Yang, Zhaowei Cai, Yonatan Dukler, Ashwin Swaminathan, R. Manmatha, Colin Jon Taylor, Stefano Soatto |
PIN: A Knowledge-Intensive Dataset for Paired and Interleaved Multimodal Documents. Junjie Wang, Yin Zhang, Yatai Ji, Yuxiang Zhang, Chunyang Jiang, ..., Bei Chen, Qunshu Lin, Minghao Liu, Ge Zhang, Wenhu Chen |