1808.07793
Cited By
Webly Supervised Joint Embedding for Cross-Modal Image-Text Retrieval
23 August 2018
Niluthpol Chowdhury Mithun
Yikang Shen
Evangelos E. Papalexakis
Amit K. Roy-Chowdhury
Papers citing "Webly Supervised Joint Embedding for Cross-Modal Image-Text Retrieval"
25 / 25 papers shown
Semi-Supervised Image Captioning Considering Wasserstein Graph Matching
Yang Yang
288
0
0
26 Mar 2024
Open-Vocabulary Camouflaged Object Segmentation
Youwei Pang
Xiaoqi Zhao
Jiaming Zuo
Lihe Zhang
Huchuan Lu
VLM
ObjD
330
13
0
19 Nov 2023
Robust Visual Question Answering: Datasets, Methods, and Future Challenges
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Jie Ma
Pinghui Wang
Dechen Kong
Zewei Wang
Jun Liu
Hongbin Pei
Junzhou Zhao
OOD
333
45
0
21 Jul 2023
Vision-Language Models can Identify Distracted Driver Behavior from Naturalistic Videos
Md Zahid Hasan
Jiajing Chen
Jiyang Wang
Mohammed Shaiqur Rahman
Ameya Joshi
Senem Velipasalar
Chinmay Hegde
Anuj Sharma
Soumik Sarkar
VLM
351
40
0
16 Jun 2023
Look for the Change: Learning Object States and State-Modifying Actions from Untrimmed Web Videos
Computer Vision and Pattern Recognition (CVPR), 2022
Tomáš Souček
Jean-Baptiste Alayrac
Antoine Miech
Ivan Laptev
Josef Sivic
230
43
0
22 Mar 2022
Cross Modal Retrieval with Querybank Normalisation
Computer Vision and Pattern Recognition (CVPR), 2021
Simion-Vlad Bogolin
Ioana Croitoru
Hailin Jin
Yang Liu
Samuel Albanie
290
115
0
23 Dec 2021
Exploiting Cross-Modal Prediction and Relation Consistency for Semi-Supervised Image Captioning
IEEE Transactions on Cybernetics (IEEE Trans. Cybern.), 2021
Yang Yang
Haoran Wei
Hengshu Zhu
Dianhai Yu
Hui Xiong
Jian Yang
SSL
100
42
0
22 Oct 2021
Multimodal Entity Linking for Tweets
European Conference on Information Retrieval (ECIR), 2020
Omar Adjali
Romaric Besançon
Olivier Ferret
Hervé Le Borgne
Brigitte Grau
161
56
0
07 Apr 2021
Learning Transferable Visual Models From Natural Language Supervision
International Conference on Machine Learning (ICML), 2021
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
2.0K
41,259
0
26 Feb 2021
Decoupling the Role of Data, Attention, and Losses in Multimodal Transformers
Transactions of the Association for Computational Linguistics (TACL), 2021
Lisa Anne Hendricks
John F. J. Mellor
R. Schneider
Jean-Baptiste Alayrac
Aida Nematzadeh
234
126
0
31 Jan 2021
RGB2LIDAR: Towards Solving Large-Scale Cross-Modal Visual Localization
ACM Multimedia (ACM MM), 2020
Niluthpol Chowdhury Mithun
Karan Sikka
Han-Pang Chiu
S. Samarasekera
Rakesh Kumar
211
19
0
12 Sep 2020
Learning Video Representations from Textual Web Supervision
Jonathan C. Stroud
Zhichao Lu
Chen Sun
Gaowen Liu
Rahul Sukthankar
Cordelia Schmid
David A. Ross
SSL
245
51
0
29 Jul 2020
COBE: Contextualized Object Embeddings from Narrated Instructional Video
Neural Information Processing Systems (NeurIPS), 2020
Gedas Bertasius
Lorenzo Torresani
187
27
0
14 Jul 2020
A Feature Analysis for Multimodal News Retrieval
Golsa Tahmasebzadeh
Sherzod Hakimov
Eric Müller-Budack
Ralph Ewerth
167
2
0
13 Jul 2020
Self-Supervised MultiModal Versatile Networks
Jean-Baptiste Alayrac
Adrià Recasens
R. Schneider
Relja Arandjelović
Jason Ramapuram
J. Fauw
Lucas Smaira
Sander Dieleman
Andrew Zisserman
SSL
423
400
0
29 Jun 2020
Mitigating Gender Bias in Captioning Systems
Ruixiang Tang
Mengnan Du
Yuening Li
Zirui Liu
Na Zou
Helen Zhou
FaML
538
74
0
15 Jun 2020
COBRA: Contrastive Bi-Modal Representation Algorithm
Vishaal Udandarao
A. Maiti
Deepak Srivatsav
Suryatej Reddy Vyalla
Yifang Yin
R. Shah
221
28
0
07 May 2020
Graph Structured Network for Image-Text Matching
Computer Vision and Pattern Recognition (CVPR), 2020
Chunxiao Liu
Zhendong Mao
Tianzhu Zhang
Hongtao Xie
Bin Wang
Yongdong Zhang
188
277
0
01 Apr 2020
Predicting the Popularity of Micro-videos with Multimodal Variational Encoder-Decoder Framework
IEEE Transactions on Multimedia (TMM), 2020
Yaochen Zhu
Jiayi Xie
Zhenzhong Chen
97
33
0
28 Mar 2020
IMRAM: Iterative Matching with Recurrent Attention Memory for Cross-Modal Image-Text Retrieval
Computer Vision and Pattern Recognition (CVPR), 2020
Hui Chen
Guiguang Ding
Xudong Liu
Zijia Lin
Ji Liu
Jungong Han
193
365
0
08 Mar 2020
End-to-End Learning of Visual Representations from Uncurated Instructional Videos
Computer Vision and Pattern Recognition (CVPR), 2019
Antoine Miech
Jean-Baptiste Alayrac
Lucas Smaira
Ivan Laptev
Josef Sivic
Andrew Zisserman
VGen
SSL
608
754
0
13 Dec 2019
Prediction and Description of Near-Future Activities in Video
Computer Vision and Image Understanding (CVIU), 2019
T. Mahmud
Mohammad Billah
Mahmudul Hasan
Amit K. Roy-Chowdhury
379
17
0
02 Aug 2019
HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips
IEEE International Conference on Computer Vision (ICCV), 2019
Antoine Miech
Dimitri Zhukov
Jean-Baptiste Alayrac
Makarand Tapaswi
Ivan Laptev
Josef Sivic
VGen
512
1,366
0
07 Jun 2019
Multitask Text-to-Visual Embedding with Titles and Clickthrough Data
Pranav Aggarwal
Zhe Lin
Baldo Faieta
Saeid Motiian
48
6
0
30 May 2019
Weakly Supervised Video Moment Retrieval From Text Queries
Niluthpol Chowdhury Mithun
S. Paul
Amit K. Roy-Chowdhury
284
211
0
05 Apr 2019