Video and Text Matching with Conditioned Embeddings

21 October 2021

Lior Wolf

Papers citing "Video and Text Matching with Conditioned Embeddings"

3 / 3 papers shown

Title
STA-V2A: Video-to-Audio Generation with Semantic and Temporal Alignment Yong Ren Chenxing Li Manjie Xu Wei Liang Yu Gu Rilin Chen Dong Yu VGen DiffM 41 6 0 13 Sep 2024
Multi-modal Transformer for Video Retrieval Valentin Gabeur Chen Sun Alahari Karteek Cordelia Schmid ViT 398 532 0 21 Jul 2020
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding Akira Fukui Dong Huk Park Daylen Yang Anna Rohrbach Trevor Darrell Marcus Rohrbach 141 1,458 0 06 Jun 2016