Progressive Learning for Image Retrieval with Hybrid-Modality Queries

Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2022

Yida Zhao

Yuqing Song

Qin Jin

188

24 Apr 2022

Training and challenging models for text-guided fashion image retrieval

Eric Dodds

Jack Culpepper

Gaurav Srivastava

145

23 Apr 2022

Unified Pretraining Framework for Document Understanding

Neural Information Processing Systems (NeurIPS), 2022

Jiuxiang Gu

272

111

22 Apr 2022

A Multi-level Alignment Training Scheme for Video-and-Language Grounding

Govind Thattai

216

22 Apr 2022

Making the Most of Text Semantics to Improve Biomedical Vision–Language Processing

European Conference on Computer Vision (ECCV), 2022

Benedikt Boecking

Naoto Usuyama

Shruthi Bannur

Daniel Coelho De Castro

...

486

358

21 Apr 2022

Imagination-Augmented Natural Language Understanding

North American Chapter of the Association for Computational Linguistics (NAACL), 2022

216

18 Apr 2022

End-to-end Dense Video Captioning as Sequence Generation

International Conference on Computational Linguistics (COLING), 2022

216

18 Apr 2022

Towards Lightweight Transformer via Group-wise Transformation for Vision-and-Language Tasks

IEEE Transactions on Image Processing (IEEE TIP), 2022

Liujuan Cao

Yongjian Wu

Feiyue Huang

Rongrong Ji

ViT

153

16 Apr 2022

COTS: Collaborative Two-Stream Vision-Language Pre-Training Model for Cross-Modal Retrieval

Computer Vision and Pattern Recognition (CVPR), 2022

254

15 Apr 2022

Vision-and-Language Pretrained Models: A Survey

International Joint Conference on Artificial Intelligence (IJCAI), 2022

422

15 Apr 2022

Reasoning with Multi-Structure Commonsense Knowledge in Visual Dialog

164

10 Apr 2022

Self-Supervised Audio-and-Text Pre-training with Extremely Low-Resource Parallel Data

AAAI Conference on Artificial Intelligence (AAAI), 2022

164

10 Apr 2022

Temporal Alignment Networks for Long-term Video

Computer Vision and Pattern Recognition (CVPR), 2022

169

104

06 Apr 2022

SimVQA: Exploring Simulated Environments for Visual Question Answering

Computer Vision and Pattern Recognition (CVPR), 2022

Paola Cascante-Bonilla

209

31 Mar 2022

ViSTA: Vision and Scene Text Aggregation for Cross-Modal Retrieval

Computer Vision and Pattern Recognition (CVPR), 2022

...

Errui Ding

Jingdong Wang

277

31 Mar 2022

TubeDETR: Spatio-Temporal Video Grounding with Transformers

Computer Vision and Pattern Recognition (CVPR), 2022

341

121

30 Mar 2022

Image-text Retrieval: A Survey on Recent Research and Development

International Joint Conference on Artificial Intelligence (IJCAI), 2022

Min Zhang

336

108

28 Mar 2022

Large-scale Bilingual Language-Image Contrastive Learning

ByungSoo Ko

Geonmo Gu

VLM

257

28 Mar 2022

Modality Competition: What Makes Joint Training of Multi-modal Network Fail in Deep Learning? (Provably)

International Conference on Machine Learning (ICML), 2022

Yu Huang

Junyang Lin

Chang Zhou

Hongxia Yang

Longbo Huang

171

144

23 Mar 2022

Local-Global Context Aware Transformer for Language-Guided Video Segmentation

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022

322

100

18 Mar 2022

Deep Unsupervised Hashing with Latent Semantic Components

AAAI Conference on Artificial Intelligence (AAAI), 2022

Wenzhe Zhao

Hongfa Wang

238

17 Mar 2022

UNIMO-2: End-to-End Unified Vision-Language Grounded Learning

Findings (Findings), 2022

145

17 Mar 2022

The Principle of Diversity: Training Stronger Vision Transformers Calls for Reducing All Levels of Redundancy

Computer Vision and Pattern Recognition (CVPR), 2022

Tianlong Chen

Zhenyu Zhang

Yu Cheng

Ahmed Hassan Awadallah

Zinan Lin

ViT

256

12 Mar 2022

LoopITR: Combining Dual and Cross Encoder Architectures for Image-Text Retrieval

229

10 Mar 2022

Visual-Language Navigation Pretraining via Prompt-based Environmental Self-exploration

Annual Meeting of the Association for Computational Linguistics (ACL), 2022

Xiwen Liang

Fengda Zhu

Lingling Li

Hang Xu

Xiaodan Liang

LM&Ro VLM

119

08 Mar 2022

Language Matters: A Weakly Supervised Vision-Language Pre-training Approach for Scene Text Detection and Spotting

European Conference on Computer Vision (ECCV), 2022

265

08 Mar 2022

Where Does the Performance Improvement Come From? -- A Reproducibility Concern about Image-Text Retrieval

Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2022

Liang Ding

Yibing Zhan

236

08 Mar 2022

Find a Way Forward: a Language-Guided Semantic Map Navigator

Zehao Wang

Tinne Tuytelaars

144

07 Mar 2022

Vision-Language Intelligence: Tasks, Representation Learning, and Large Models

Lei Zhang

204

03 Mar 2022

Unsupervised Vision-and-Language Pre-training via Retrieval-based Multi-Granular Alignment

Computer Vision and Pattern Recognition (CVPR), 2022

Amanpreet Singh

158

01 Mar 2022

Multi-modal Alignment using Representation Codebook

Computer Vision and Pattern Recognition (CVPR), 2022

486

28 Feb 2022

COMPASS: Contrastive Multimodal Pretraining for Autonomous Systems

IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2022

188

20 Feb 2022

A Survey of Vision-Language Pre-Trained Models

International Joint Conference on Artificial Intelligence (IJCAI), 2022

396

241

18 Feb 2022

AMS_ADRN at SemEval-2022 Task 5: A Suitable Image-text Multimodal Joint Modeling Method for Multi-task Misogyny Identification

International Workshop on Semantic Evaluation (SemEval), 2022

Da Li

Ming Yi

Yukai He

141

18 Feb 2022