An Empirical Study of Frame Selection for Text-to-Video Retrieval. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023 |
Self-Chained Image-Language Model for Video Localization and Question
Answering. Neural Information Processing Systems (NeurIPS), 2023 |
CONE: An Efficient COarse-to-fiNE Alignment Framework for Long Video
Temporal Grounding. Annual Meeting of the Association for Computational Linguistics (ACL), 2022 |