VITR: Augmenting Vision Transformers with Relation-Focused Learning for Cross-Modal Information Retrieval

13 February 2023 · arXiv: 2302.06350
Yansong Gong, Georgina Cosma, Axel Finke
ViT

Papers citing "VITR: Augmenting Vision Transformers with Relation-Focused Learning for Cross-Modal Information Retrieval"

5 / 5 papers shown

GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis
Ming Tao, Bingkun Bao, Hao Tang, Changsheng Xu
DiffM, VLM
30 Jan 2023

DiMBERT: Learning Vision-Language Grounded Representations with Disentangled Multimodal-Attention
Fenglin Liu, Xian Wu, Shen Ge, Xuancheng Ren, Wei Fan, Xu Sun, Yuexian Zou
VLM
28 Oct 2022

Improving Visual-Semantic Embeddings by Learning Semantically-Enhanced Hard Negatives for Cross-modal Information Retrieval
Yan Gong, Georgina Cosma
10 Oct 2022

CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval
Huaishao Luo, Lei Ji, Ming Zhong, Yang Chen, Wen Lei, Nan Duan, Tianrui Li
CLIP, VLM
18 Apr 2021

A Straightforward Framework For Video Retrieval Using CLIP
Jesús Andrés Portillo-Quintero, J. C. Ortíz-Bayliss, Hugo Terashima-Marín
CLIP
24 Feb 2021