Learning Two-Branch Neural Networks for Image-Text Matching Tasks

11 April 2017

Yin Li

Papers citing "Learning Two-Branch Neural Networks for Image-Text Matching Tasks"

50 / 189 papers shown

Annotating Satellite Images of Forests with Keywords from a Specialized Corpus in the Context of Change Detection

International Conference on Content-Based Multimedia Indexing (CBMI), 2023

Nathalie Neptune

Josiane Mothe

102

16 Sep 2025

Visual Grounding from Event Cameras

170

11 Sep 2025

Prototype-Aware Multimodal Alignment for Open-Vocabulary Visual Grounding

206

08 Sep 2025

LLaVA-RE: Binary Image-Text Relevancy Evaluation with Multimodal Large Language Model

124

07 Aug 2025

Talk2Event: Grounded Understanding of Dynamic Scenes from Event Cameras

384

23 Jul 2025

Audio-3DVG: Unified Audio -- Point Cloud Fusion for 3D Visual Grounding

287

01 Jul 2025

AerialVG: A Challenging Benchmark for Aerial Visual Grounding by Exploring Positional Relations

627

10 Apr 2025

ELIP: Enhanced Visual-Language Foundation Models for Image Retrieval

1.2K

21 Feb 2025

Towards Visual Grounding: A Survey

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024

1.1K

28 Dec 2024

Linguistics-Vision Monotonic Consistent Network for Sign Language Production

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024

378

22 Dec 2024

Joint Top-Down and Bottom-Up Frameworks for 3D Visual Grounding

International Conference on Pattern Recognition (ICPR), 2024

Yang Liu

Daizong Liu

Wei Hu

3DPC

438

21 Oct 2024

ResVG: Enhancing Relation and Semantic Understanding in Multiple Instances for Visual Grounding

ACM Multimedia (MM), 2024

Minghang Zheng

Jiahua Zhang

Qingchao Chen

Yuxin Peng

Yang Liu

ObjD

342

29 Aug 2024

Language-driven Grasp Detection with Mask-guided Attention

IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2024

Ngan Le

Anh Nguyen

235

29 Jul 2024

Object-Aware Query Perturbation for Cross-Modal Image-Text Retrieval

365

17 Jul 2024

SegVG: Transferring Object Bounding Box to Segmentation for Visual Grounding

Weitai Kang

Gaowen Liu

Mubarak Shah

Yan Yan

ObjD

463

03 Jul 2024

FILS: Self-Supervised Video Feature Prediction In Semantic Language Space

Mona Ahmadian

Frank Guerin

Andrew Gilbert

355

05 Jun 2024

Hire: Hybrid-modal Interaction with Multiple Relational Enhancements for Image-Text Matching

Xuri Ge

Joemon M. Jose

262

05 Jun 2024

3SHNet: Boosting Image-Sentence Retrieval via Visual Semantic-Spatial Self-Highlighting

324

26 Apr 2024

N-Modal Contrastive Losses with Applications to Social Media Data in Trimodal Space

William Theisen

Walter J. Scheirer

259

18 Mar 2024

LuoJiaHOG: A Hierarchy Oriented Geo-aware Image Caption Dataset for Remote Sensing Image-Text Retrival

251

16 Mar 2024

REPAIR: Rank Correlation and Noisy Pair Half-replacing with Memory for Noisy Correspondence

IEEE Transactions on Multimedia (IEEE TMM), 2024

201

13 Mar 2024

How to Understand "Support"? An Implicit-enhanced Causal Inference Approach for Weakly-supervised Phrase Grounding

257

29 Feb 2024

Bridging Modality Gap for Visual Grounding with Effecitve Cross-modal Distillation

Chinese Conference on Pattern Recognition and Computer Vision (PRCV), 2023

347

29 Dec 2023

Context Disentangling and Prototype Inheriting for Robust Visual Grounding

Wei Tang

305

19 Dec 2023

Weakly-Supervised 3D Visual Grounding based on Visual Language Alignment

IEEE Transactions on Multimedia (IEEE TMM), 2023

632

15 Dec 2023

Negative Pre-aware for Noisy Cross-modal Matching

AAAI Conference on Artificial Intelligence (AAAI), 2023

Xu-Yao Zhang

Hao Li

Mang Ye

391

10 Dec 2023

GPT-4 Enhanced Multimodal Grounding for Autonomous Driving: Leveraging Cross-Modal Attention with Large Language Models

Haicheng Liao

Chengzhong Xu

310

06 Dec 2023

Enriching Phrases with Coupled Pixel and Object Contexts for Panoptic Narrative Grounding

International Joint Conference on Artificial Intelligence (IJCAI), 2023

Junshi Huang

353

02 Nov 2023

RIO: A Benchmark for Reasoning Intention-Oriented Objects in Open Environments

Neural Information Processing Systems (NeurIPS), 2023

Jingkuan Song

273

26 Oct 2023

NICE: Improving Panoptic Narrative Detection and Segmentation with Cascading Collaborative Learning

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023

Jiayi Ji

428

17 Oct 2023

Iterative Robust Visual Grounding with Masked Reference based Centerpoint Supervision

Xiangtai Li

301

23 Jul 2023

Efficient Token-Guided Image-Text Retrieval with Consistent Multimodal Contrastive Training

IEEE Transactions on Image Processing (IEEE TIP), 2023

Liang Wang

281

15 Jun 2023

"Let's not Quote out of Context": Unified Vision-Language Pretraining for Context Assisted Image Captioning

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

Abisek Rajakumar Kalarani

290

01 Jun 2023

Language-Guided 3D Object Detection in Point Cloud for Autonomous Driving

239

25 May 2023

Click-Feedback Retrieval

Zeyu Wang

Yuehua Wu

329

28 Apr 2023

BiCro: Noisy Correspondence Rectification for Multi-modality Data via Bi-directional Cross-modal Similarity Consistency

Computer Vision and Pattern Recognition (CVPR), 2023

Yang You

314

22 Mar 2023

Scene Graph Based Fusion Network For Image-Text Retrieval

IEEE International Conference on Multimedia and Expo (ICME), 2023

Guoliang Wang

Yanlei Shang

Yongzhe Chen

209

20 Mar 2023

Efficient Image-Text Retrieval via Keyword-Guided Pre-Screening

Min Zhang

247

14 Mar 2023

LexLIP: Lexicon-Bottlenecked Language-Image Pre-Training for Large-Scale Image-Text Retrieval

220

06 Feb 2023

Open-vocabulary Object Segmentation with Diffusion Models

IEEE International Conference on Computer Vision (ICCV), 2023

384

12 Jan 2023

Universal Multimodal Representation for Language Understanding

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023

Rui Wang

324

09 Jan 2023

Learning Multimodal Data Augmentation in Feature Space

International Conference on Learning Representations (ICLR), 2022

Anshumali Shrivastava

A. Wilson

299

29 Dec 2022

Multimodal Query-guided Object Localization

276

01 Dec 2022

DQ-DETR: Dual Query Detection Transformer for Phrase Extraction and Grounding

AAAI Conference on Artificial Intelligence (AAAI), 2022

Hang Su

Jun Zhu

Lei Zhang

ObjD

450

28 Nov 2022

SLAN: Self-Locator Aided Network for Cross-Modal Understanding

Ming-Ming Cheng

194

28 Nov 2022

Look Around and Refer: 2D Synthetic Semantics Knowledge Distillation for 3D Visual Grounding

Neural Information Processing Systems (NeurIPS), 2022

246

25 Nov 2022

YORO -- Lightweight End to End Visual Grounding

281

15 Nov 2022

Cross-modal Semantic Enhanced Interaction for Image-Sentence Retrieval

IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022

257

17 Oct 2022

ConTra: (Con)text (Tra)nsformer for Cross-Modal Video Retrieval

Asian Conference on Computer Vision (ACCV), 2022

A. Fragomeni

Michael Wray

Dima Damen

CLIP ViT

180

09 Oct 2022

Learning to embed semantic similarity for joint image-text retrieval

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2021

Noam Malali

Y. Keller

252

07 Oct 2022