Webly Supervised Joint Embedding for Cross-Modal Image-Text Retrieval

23 August 2018
Niluthpol Chowdhury Mithun
Yikang Shen
Evangelos E. Papalexakis
Amit K. Roy-Chowdhury

Papers citing "Webly Supervised Joint Embedding for Cross-Modal Image-Text Retrieval"

25 / 25 papers shown
Semi-Supervised Image Captioning Considering Wasserstein Graph Matching
Yang Yang
288
0
0
26 Mar 2024
Open-Vocabulary Camouflaged Object Segmentation
Youwei Pang
Xiaoqi Zhao
Jiaming Zuo
Lihe Zhang
Huchuan Lu
VLM, ObjD
330
13
0
19 Nov 2023
Robust Visual Question Answering: Datasets, Methods, and Future Challenges
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Jie Ma
Pinghui Wang
Dechen Kong
Zewei Wang
Jun Liu
Hongbin Pei
Junzhou Zhao
OOD
333
45
0
21 Jul 2023
Vision-Language Models can Identify Distracted Driver Behavior from Naturalistic Videos
Md Zahid Hasan
Jiajing Chen
Jiyang Wang
Mohammed Shaiqur Rahman
Ameya Joshi
Senem Velipasalar
Chinmay Hegde
Anuj Sharma
Soumik Sarkar
VLM
351
40
0
16 Jun 2023
Look for the Change: Learning Object States and State-Modifying Actions from Untrimmed Web Videos
Computer Vision and Pattern Recognition (CVPR), 2022
Tomáš Souček
Jean-Baptiste Alayrac
Antoine Miech
Ivan Laptev
Josef Sivic
230
43
0
22 Mar 2022
Cross Modal Retrieval with Querybank Normalisation
Computer Vision and Pattern Recognition (CVPR), 2021
Simion-Vlad Bogolin
Ioana Croitoru
Hailin Jin
Yang Liu
Samuel Albanie
290
115
0
23 Dec 2021
Exploiting Cross-Modal Prediction and Relation Consistency for Semi-Supervised Image Captioning
IEEE Transactions on Cybernetics (IEEE Trans. Cybern.), 2021
Yang Yang
Haoran Wei
Hengshu Zhu
Dianhai Yu
Hui Xiong
Jian Yang
SSL
100
42
0
22 Oct 2021
Multimodal Entity Linking for Tweets
European Conference on Information Retrieval (ECIR), 2020
Omar Adjali
Romaric Besançon
Olivier Ferret
Hervé Le Borgne
Brigitte Grau
161
56
0
07 Apr 2021
Learning Transferable Visual Models From Natural Language Supervision
International Conference on Machine Learning (ICML), 2021
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP, VLM
2.0K
41,259
0
26 Feb 2021
Decoupling the Role of Data, Attention, and Losses in Multimodal Transformers
Transactions of the Association for Computational Linguistics (TACL), 2021
Lisa Anne Hendricks
John F. J. Mellor
R. Schneider
Jean-Baptiste Alayrac
Aida Nematzadeh
234
126
0
31 Jan 2021
RGB2LIDAR: Towards Solving Large-Scale Cross-Modal Visual Localization
ACM Multimedia (ACM MM), 2020
Niluthpol Chowdhury Mithun
Karan Sikka
Han-Pang Chiu
S. Samarasekera
Rakesh Kumar
211
19
0
12 Sep 2020
Learning Video Representations from Textual Web Supervision
Jonathan C. Stroud
Zhichao Lu
Chen Sun
Gaowen Liu
Rahul Sukthankar
Cordelia Schmid
David A. Ross
SSL
245
51
0
29 Jul 2020
COBE: Contextualized Object Embeddings from Narrated Instructional Video
Neural Information Processing Systems (NeurIPS), 2020
Gedas Bertasius
Lorenzo Torresani
187
27
0
14 Jul 2020
A Feature Analysis for Multimodal News Retrieval
Golsa Tahmasebzadeh
Sherzod Hakimov
Eric Müller-Budack
Ralph Ewerth
167
2
0
13 Jul 2020
Self-Supervised MultiModal Versatile Networks
Jean-Baptiste Alayrac
Adrià Recasens
R. Schneider
Relja Arandjelović
Jason Ramapuram
J. Fauw
Lucas Smaira
Sander Dieleman
Andrew Zisserman
SSL
423
400
0
29 Jun 2020
Mitigating Gender Bias in Captioning Systems
Ruixiang Tang
Mengnan Du
Yuening Li
Zirui Liu
Na Zou
Helen Zhou
FaML
538
74
0
15 Jun 2020
COBRA: Contrastive Bi-Modal Representation Algorithm
Vishaal Udandarao
A. Maiti
Deepak Srivatsav
Suryatej Reddy Vyalla
Yifang Yin
R. Shah
221
28
0
07 May 2020
Graph Structured Network for Image-Text Matching
Computer Vision and Pattern Recognition (CVPR), 2020
Chunxiao Liu
Zhendong Mao
Tianzhu Zhang
Hongtao Xie
Bin Wang
Yongdong Zhang
188
277
0
01 Apr 2020
Predicting the Popularity of Micro-videos with Multimodal Variational Encoder-Decoder Framework
IEEE Transactions on Multimedia (TMM), 2020
Yaochen Zhu
Jiayi Xie
Zhenzhong Chen
97
33
0
28 Mar 2020
IMRAM: Iterative Matching with Recurrent Attention Memory for Cross-Modal Image-Text Retrieval
Computer Vision and Pattern Recognition (CVPR), 2020
Hui Chen
Guiguang Ding
Xudong Liu
Zijia Lin
Ji Liu
Jungong Han
193
365
0
08 Mar 2020
End-to-End Learning of Visual Representations from Uncurated Instructional Videos
Computer Vision and Pattern Recognition (CVPR), 2019
Antoine Miech
Jean-Baptiste Alayrac
Lucas Smaira
Ivan Laptev
Josef Sivic
Andrew Zisserman
VGen, SSL
608
754
0
13 Dec 2019
Prediction and Description of Near-Future Activities in Video
Computer Vision and Image Understanding (CVIU), 2019
T. Mahmud
Mohammad Billah
Mahmudul Hasan
Amit K. Roy-Chowdhury
379
17
0
02 Aug 2019
HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips
IEEE International Conference on Computer Vision (ICCV), 2019
Antoine Miech
Dimitri Zhukov
Jean-Baptiste Alayrac
Makarand Tapaswi
Ivan Laptev
Josef Sivic
VGen
512
1,366
0
07 Jun 2019
Multitask Text-to-Visual Embedding with Titles and Clickthrough Data
Pranav Aggarwal
Zhe Lin
Baldo Faieta
Saeid Motiian
48
6
0
30 May 2019
Weakly Supervised Video Moment Retrieval From Text Queries
Niluthpol Chowdhury Mithun
S. Paul
Amit K. Roy-Chowdhury
284
211
0
05 Apr 2019