ResearchTrend.AI
Kaleido-BERT: Vision-Language Pre-training on Fashion Domain

30 March 2021
Mingchen Zhuge
D. Gao
Deng-Ping Fan
Linbo Jin
Ben Chen
Hao Zhou
Minghui Qiu
Ling Shao
    VLM
arXiv: 2103.16110

Papers citing "Kaleido-BERT: Vision-Language Pre-training on Fashion Domain"

Seeing the Abstract: Translating the Abstract Language for Vision Language Models
Davide Talon
Federico Girella
Ziyue Liu
Marco Cristani
Yiming Wang
VLM
42
0
0
06 May 2025
Enhancing Vision-Language Pre-training with Rich Supervisions
Yuan Gao
Kunyu Shi
Pengkai Zhu
Edouard Belval
Oren Nuriel
Srikar Appalaraju
Shabnam Ghadar
Vijay Mahadevan
Zhuowen Tu
Stefano Soatto
VLM
CLIP
62
11
0
05 Mar 2024
Scene-centric vs. Object-centric Image-Text Cross-modal Retrieval: A Reproducibility Study
Mariya Hendriksen
Svitlana Vakulenko
E. Kuiper
Maarten de Rijke
19
5
0
12 Jan 2023
Masked Vision-Language Transformer in Fashion
Ge-Peng Ji
Mingchen Zhuge
D. Gao
Deng-Ping Fan
Christos Sakaridis
Luc Van Gool
14
25
0
27 Oct 2022
Multimodal Learning with Transformers: A Survey
P. Xu
Xiatian Zhu
David A. Clifton
ViT
41
518
0
13 Jun 2022
Training and challenging models for text-guided fashion image retrieval
Eric Dodds
Jack Culpepper
Gaurav Srivastava
14
8
0
23 Apr 2022
Vision-and-Language Pretrained Models: A Survey
Siqu Long
Feiqi Cao
S. Han
Haiqing Yang
VLM
14
63
0
15 Apr 2022
UIGR: Unified Interactive Garment Retrieval
Xiaoping Han
Sen He
Li Zhang
Yi-Zhe Song
Tao Xiang
16
7
0
06 Apr 2022
Single-Stream Multi-Level Alignment for Vision-Language Pretraining
Zaid Khan
B. Vijaykumar
Xiang Yu
S. Schulter
Manmohan Chandraker
Y. Fu
CLIP
VLM
20
16
0
27 Mar 2022
Skating-Mixer: Long-Term Sport Audio-Visual Modeling with MLPs
Jingfei Xia
Mingchen Zhuge
Tiantian Geng
Shun Fan
Yuantai Wei
Zhenyu He
Feng Zheng
10
13
0
08 Mar 2022
VLP: A Survey on Vision-Language Pre-training
Feilong Chen
Duzhen Zhang
Minglun Han
Xiuyi Chen
Jing Shi
Shuang Xu
Bo Xu
VLM
79
208
0
18 Feb 2022
TransFusion: Cross-view Fusion with Transformer for 3D Human Pose Estimation
Haoyu Ma
Liangjian Chen
Deying Kong
Zhe Wang
Xingwei Liu
Hao Tang
Xiangyi Yan
Yusheng Xie
Shi-yao Lin
Xiaohui Xie
ViT
19
61
0
18 Oct 2021
M5Product: Self-harmonized Contrastive Learning for E-commercial Multi-modal Pretraining
Xiao Dong
Xunlin Zhan
Yangxin Wu
Yunchao Wei
Michael C. Kampffmeyer
Xiaoyong Wei
Minlong Lu
Yaowei Wang
Xiaodan Liang
19
36
0
09 Sep 2021
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions
Wenhai Wang
Enze Xie
Xiang Li
Deng-Ping Fan
Kaitao Song
Ding Liang
Tong Lu
Ping Luo
Ling Shao
ViT
263
3,538
0
24 Feb 2021
Salient Object Detection via Integrity Learning
Mingchen Zhuge
Deng-Ping Fan
Nian Liu
Dingwen Zhang
Dong Xu
Ling Shao
AAML
53
289
0
19 Jan 2021
Unified Vision-Language Pre-Training for Image Captioning and VQA
Luowei Zhou
Hamid Palangi
Lei Zhang
Houdong Hu
Jason J. Corso
Jianfeng Gao
MLLM
VLM
250
922
0
24 Sep 2019
Structure-measure: A New Way to Evaluate Foreground Maps
Deng-Ping Fan
Ming-Ming Cheng
Yun-Hai Liu
Tao Li
Ali Borji
63
1,320
0
02 Aug 2017
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Yonghui Wu
M. Schuster
Z. Chen
Quoc V. Le
Mohammad Norouzi
...
Alex Rudnick
Oriol Vinyals
G. Corrado
Macduff Hughes
J. Dean
AIMat
716
6,724
0
26 Sep 2016