Learning Semantic Concepts and Order for Image and Sentence Matching

6 December 2017
Yan Huang
Qi Wu
Liang Wang
    VLM

Papers citing "Learning Semantic Concepts and Order for Image and Sentence Matching"

50 / 93 papers shown
Category-level Text-to-Image Retrieval Improved: Bridging the Domain Gap with Diffusion Models and Vision Encoders
Faizan Farooq Khan
Vladan Stojnić
Zakaria Laskar
Mohamed Elhoseiny
Giorgos Tolias
DiffM VLM
100
0
0
29 Aug 2025
Aligning Information Capacity Between Vision and Language via Dense-to-Sparse Feature Distillation for Image-Text Matching
Yang Liu
Wentao Feng
Zhuoyao Liu
Shudong Huang
Jiancheng Lv
DiffM VLM
396
1
0
19 Mar 2025
Cross-Modal Pre-Aligned Method with Global and Local Information for Remote-Sensing Image and Text Retrieval
IEEE Transactions on Geoscience and Remote Sensing (TGRS), 2024
Zengbao Sun
Ming Zhao
Gaorui Liu
Andre Kaup
254
11
0
22 Nov 2024
GSSF: Generalized Structural Sparse Function for Deep Cross-modal Metric Learning
IEEE Transactions on Image Processing (TIP), 2024
Haiwen Diao
Ying Zhang
Shang Gao
Jiawen Zhu
Long Chen
Huchuan Lu
229
9
0
20 Oct 2024
GraspMamba: A Mamba-based Language-driven Grasp Detection Framework with Hierarchical Feature Learning
Huy Hoang Nguyen
An Vuong
Anh Nguyen
Ian Reid
Minh Nhat Vu
Mamba
253
4
0
22 Sep 2024
Towards End-to-End Explainable Facial Action Unit Recognition via Vision-Language Joint Learning
ACM Multimedia (MM), 2024
Yaming Yang
Zhe Wang
Fuhai Chen
Ziyu Guan
Weigang Lu
Joemon M. Jose
CVBM
277
11
0
01 Aug 2024
Hire: Hybrid-modal Interaction with Multiple Relational Enhancements for Image-Text Matching
Xuri Ge
Fuhai Chen
Songpei Xu
Fuxiang Tao
Jie Wang
Joemon M. Jose
234
3
0
05 Jun 2024
Transcending Fusion: A Multi-Scale Alignment Method for Remote Sensing Image-Text Retrieval
Rui Yang
Shuang Wang
Yi Han
Yuanheng Li
Dong Zhao
Dou Quan
Yanhe Guo
Licheng Jiao
248
10
0
29 May 2024
Deep Boosting Learning: A Brand-new Cooperative Approach for Image-Text Matching
Haiwen Diao
Ying Zhang
Shang Gao
Xiang Ruan
Huchuan Lu
311
4
0
28 Apr 2024
3SHNet: Boosting Image-Sentence Retrieval via Visual Semantic-Spatial Self-Highlighting
Xuri Ge
Songpei Xu
Fuhai Chen
Jie Wang
Guoxin Wang
Shan An
Joemon M. Jose
3DPC
299
22
0
26 Apr 2024
CFIR: Fast and Effective Long-Text To Image Retrieval for Large Corpora
Zijun Long
Xuri Ge
R. McCreadie
Joemon M. Jose
336
12
0
23 Feb 2024
Active Mining Sample Pair Semantics for Image-text Matching
Yongfeng Chen
Jin Liu
Zhijing Yang
Ruihan Chen
Junpeng Tan
VLM
211
0
0
09 Nov 2023
Vision-Language Dataset Distillation
Xindi Wu
Byron Zhang
Zhiwei Deng
Olga Russakovsky
DD VLM
456
15
0
15 Aug 2023
Improved Probabilistic Image-Text Representations
International Conference on Learning Representations (ICLR), 2023
Sanghyuk Chun
VLM
604
43
0
29 May 2023
Vision-Language Models in Remote Sensing: Current Progress and Future Trends
IEEE Geoscience and Remote Sensing Magazine (GRSM), 2023
Xiang Li
Congcong Wen
Yuan Hu
Zhenghang Yuan
Xiao Xiang Zhu
VLM
358
165
0
09 May 2023
Learning Bottleneck Concepts in Image Classification
Computer Vision and Pattern Recognition (CVPR), 2023
Bowen Wang
Liangzhi Li
Yuta Nakashima
Hajime Nagahara
SSL
260
66
0
20 Apr 2023
Revisiting Multimodal Representation in Contrastive Learning: From Patch and Token Embeddings to Finite Discrete Tokens
Computer Vision and Pattern Recognition (CVPR), 2023
Yuxiao Chen
Jianbo Yuan
Yu Tian
Shijie Geng
Xinyu Li
Ding Zhou
Dimitris N. Metaxas
Hongxia Yang
236
50
0
27 Mar 2023
Plug-and-Play Regulators for Image-Text Matching
IEEE Transactions on Image Processing (IEEE TIP), 2023
Haiwen Diao
Yanzhe Zhang
Wen Liu
Xiang Ruan
Huchuan Lu
222
31
0
23 Mar 2023
LIMITR: Leveraging Local Information for Medical Image-Text Representation
IEEE International Conference on Computer Vision (ICCV), 2023
Gefen Dawidowicz
Elad Hirsch
A. Tal
191
22
0
21 Mar 2023
Efficient Image-Text Retrieval via Keyword-Guided Pre-Screening
Min Cao
Yang Bai
Wenwen Qiang
Ziqiang Cao
Liqiang Nie
Min Zhang
203
4
0
14 Mar 2023
Differentiable Outlier Detection Enable Robust Deep Multimodal Analysis
Zhu Wang
Sourav Medya
Sathya Ravi
VLM
251
1
0
11 Feb 2023
HierVL: Learning Hierarchical Video-Language Embeddings
Computer Vision and Pattern Recognition (CVPR), 2023
Kumar Ashutosh
Rohit Girdhar
Lorenzo Torresani
Kristen Grauman
VLM AI4TS
447
72
0
05 Jan 2023
Scale-Semantic Joint Decoupling Network for Image-text Retrieval in Remote Sensing
Chengyu Zheng
Ning Song
Ruoyu Zhang
Lei Huang
Zhiqiang Wei
Jie Nie
197
17
0
12 Dec 2022
Improving Cross-Modal Retrieval with Set of Diverse Embeddings
Computer Vision and Pattern Recognition (CVPR), 2022
Dongwon Kim
Nam-Won Kim
Suha Kwak
531
66
0
30 Nov 2022
Dissecting Deep Metric Learning Losses for Image-Text Retrieval
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022
Hong Xuan
Xi Chen
145
2
0
21 Oct 2022
Cross-modal Semantic Enhanced Interaction for Image-Sentence Retrieval
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022
Xuri Ge
Fuhai Chen
Songpei Xu
Fuxiang Tao
J. Jose
222
36
0
17 Oct 2022
Multi-Granularity Cross-modal Alignment for Generalized Medical Visual Representation Learning
Neural Information Processing Systems (NeurIPS), 2022
Fuying Wang
Yuyin Zhou
Shujun Wang
V. Vardhanabhuti
Lequan Yu
318
218
0
12 Oct 2022
Learning to embed semantic similarity for joint image-text retrieval
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2021
Noam Malali
Y. Keller
214
12
0
07 Oct 2022
CODER: Coupled Diversity-Sensitive Momentum Contrastive Learning for Image-Text Retrieval
European Conference on Computer Vision (ECCV), 2022
Haoran Wang
Dongliang He
Wenhao Wu
Boyang Xia
Min Yang
Fu Li
YunLong Yu
Zhong Ji
Errui Ding
Jingdong Wang
199
27
0
21 Aug 2022
Boosting Video-Text Retrieval with Explicit High-Level Semantics
ACM Multimedia (ACM MM), 2022
Haoran Wang
Di Xu
Dongliang He
Fu Li
Zhong Ji
Jungong Han
Errui Ding
227
16
0
08 Aug 2022
Intra-Modal Constraint Loss For Image-Text Retrieval
IEEE International Conference on Image Processing (ICIP), 2022
Jia-nan Chen
Lu Zhang
Qiong Wang
Cong Bai
K. Kpalma
142
7
0
11 Jul 2022
HiVLP: Hierarchical Vision-Language Pre-Training for Fast Image-Text Retrieval
Feilong Chen
Xiuyi Chen
Jiaxin Shi
Duzhen Zhang
Jianlong Chang
Qi Tian
VLM CLIP
227
7
0
24 May 2022
Exploring a Fine-Grained Multiscale Method for Cross-Modal Remote Sensing Image Retrieval
IEEE Transactions on Geoscience and Remote Sensing (TGRS), 2021
Zhiqiang Yuan
Wenkai Zhang
Kun Fu
Xuan Li
Chubo Deng
Hongqi Wang
Xian Sun
260
186
0
21 Apr 2022
Remote Sensing Cross-Modal Text-Image Retrieval Based on Global and Local Information
IEEE Transactions on Geoscience and Remote Sensing (IEEE TGRS), 2022
Zhiqiang Yuan
Wenkai Zhang
Changyuan Tian
Xuee Rong
Zhengyuan Zhang
Hongqi Wang
Kun Fu
Xian Sun
241
174
0
21 Apr 2022
Visual Attention Methods in Deep Learning: An In-Depth Survey
Information Fusion (Inf. Fusion), 2022
Mohammed Hassanin
Saeed Anwar
Ibrahim Radwan
Fahad Shahbaz Khan
Lin Wang
346
249
0
16 Apr 2022
ECCV Caption: Correcting False Negatives by Collecting Machine-and-Human-verified Image-Caption Associations for MS-COCO
European Conference on Computer Vision (ECCV), 2022
Sanghyuk Chun
Wonjae Kim
Song Park
Minsuk Chang
Seong Joon Oh
VLM
1.5K
51
0
07 Apr 2022
Text2Pos: Text-to-Point-Cloud Cross-Modal Localization
Computer Vision and Pattern Recognition (CVPR), 2022
Manuel Kolmet
Qunjie Zhou
Aljosa Osep
Laura Leal-Taixe
297
40
0
28 Mar 2022
LILE: Look In-Depth before Looking Elsewhere -- A Dual Attention Network using Transformers for Cross-Modal Information Retrieval in Histopathology Archives
International Conference on Medical Imaging with Deep Learning (MIDL), 2022
Danial Maleki
H. R Tizhoosh
MedIm
221
12
0
02 Mar 2022
Auxiliary Cross-Modal Representation Learning with Triplet Loss Functions for Online Handwriting Recognition
IEEE Access, 2022
Felix Ott
David Rügamer
Lucas Heublein
B. Bischl
Christopher Mutschler
445
12
0
16 Feb 2022
Multi-Modal Knowledge Graph Construction and Application: A Survey
IEEE Transactions on Knowledge and Data Engineering (TKDE), 2022
Xiangru Zhu
Zhixu Li
Xiaodan Wang
Xueyao Jiang
Yixiang Chen
Xuwu Wang
Yanghua Xiao
N. Yuan
211
238
0
11 Feb 2022
Semantic Communications: Principles and Challenges
Zhijin Qin
Xiaoming Tao
Jianhua Lu
Wen Tong
Geoffrey Ye Li
507
418
0
30 Dec 2021
Is An Image Worth Five Sentences? A New Look into Semantics for Image-Text Matching
Ali Furkan Biten
Andrés Mafla
Lluís Gómez
Dimosthenis Karatzas
459
20
0
06 Oct 2021
Structured Multi-modal Feature Embedding and Alignment for Image-Sentence Retrieval
ACM Multimedia (ACM MM), 2021
Xuri Ge
Fuhai Chen
J. Jose
Zhilong Ji
Zhongqin Wu
Xiao-Chang Liu
173
68
0
05 Aug 2021
Semantically Self-Aligned Network for Text-to-Image Part-aware Person Re-identification
Z. Ding
Changxing Ding
Zhiyin Shao
Dacheng Tao
335
226
0
27 Jul 2021
A Deep Local and Global Scene-Graph Matching for Image-Text Retrieval
New Trends in Software Methodologies, Tools and Techniques (TSMTT), 2021
Manh-Duy Nguyen
Binh T. Nguyen
C. Gurrin
72
20
0
04 Jun 2021
Survey of Visual-Semantic Embedding Methods for Zero-Shot Image Retrieval
International Conference on Machine Learning and Applications (ICMLA), 2021
K. Ueki
260
5
0
16 May 2021
Discrete-continuous Action Space Policy Gradient-based Attention for Image-Text Matching
Computer Vision and Pattern Recognition (CVPR), 2021
Shiyang Yan
Li Yu
Yuan Xie
260
35
0
21 Apr 2021
Cross-Modal Retrieval Augmentation for Multi-Modal Classification
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Shir Gur
Natalia Neverova
C. Stauffer
Ser-Nam Lim
Douwe Kiela
A. Reiter
249
36
0
16 Apr 2021
Memory Enhanced Embedding Learning for Cross-Modal Video-Text Retrieval
Rui Zhao
Kecheng Zheng
Zhengjun Zha
Hongtao Xie
Jiebo Luo
140
3
0
29 Mar 2021
An Unsupervised Sampling Approach for Image-Sentence Matching Using Document-Level Structural Information
AAAI Conference on Artificial Intelligence (AAAI), 2021
Zejun Li
Zhongyu Wei
Zhihao Fan
Haijun Shan
Xuanjing Huang
173
5
0
21 Mar 2021