Pix2seq: A Language Modeling Framework for Object Detection | International Conference on Learning Representations (ICLR), 2021 |
Natural Language Video Localization with Learnable Moment Proposals | Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021 |
COVR: A test-bed for Visually Grounded Compositional Generalization with real images | Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021 |