arXiv: 2210.15110
Masked Vision-Language Transformer in Fashion

27 October 2022
Ge-Peng Ji
Mingchen Zhuge
D. Gao
Deng-Ping Fan
Christos Sakaridis
Luc Van Gool

Papers citing "Masked Vision-Language Transformer in Fashion"

20 / 20 papers shown
Instruction-Aligned Visual Attention for Mitigating Hallucinations in Large Vision-Language Models
Bin Li
Dehong Gao
Yeyuan Wang
Linbo Jin
Shanqing Yu
Xiaoyan Cai
Libin Yang
VLM
41
0
0
24 Mar 2025
Text-driven Human Motion Generation with Motion Masked Diffusion Model
Xingyu Chen
DiffM
VGen
26
1
0
29 Sep 2024
GeoMFormer: A General Architecture for Geometric Molecular Representation Learning
Tianlang Chen
Shengjie Luo
Di He
Shuxin Zheng
Tie-Yan Liu
Liwei Wang
AI4CE
31
5
0
24 Jun 2024
S-Agents: Self-organizing Agents in Open-ended Environments
Jia-Qing Chen
Yu-Gang Jiang
Jiachen Lu
Li Zhang
AIFin
LLMAG
LM&Ro
45
15
0
07 Feb 2024
PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding
Zhen Li
Mingdeng Cao
Xintao Wang
Zhongang Qi
Ming-Ming Cheng
Ying Shan
DiffM
34
187
0
07 Dec 2023
Point Cloud Pre-training with Diffusion Models
Xiao Zheng
Xiaoshui Huang
Guofeng Mei
Yuenan Hou
Zhaoyang Lyu
Bo Dai
Wanli Ouyang
Yongshun Gong
15
18
0
25 Nov 2023
TALL: Thumbnail Layout for Deepfake Video Detection
Yuting Xu
Jian Liang
Gengyun Jia
Ziming Yang
Yanhao Zhang
R. He
ViT
33
51
0
14 Jul 2023
Instruct-ReID: A Multi-purpose Person Re-identification Task with Instructions
Weizhen He
Yihe Deng
Shixiang Tang
Qihao Chen
Qingsong Xie
...
Feng Zhu
Rui Zhao
Wanli Ouyang
Donglian Qi
Yunfeng Yan
65
19
0
13 Jun 2023
Advances in Deep Concealed Scene Understanding
Deng-Ping Fan
Ge-Peng Ji
Peng-Tao Xu
Ming-Ming Cheng
Christos Sakaridis
Luc Van Gool
25
67
0
21 Apr 2023
MDTv2: Masked Diffusion Transformer is a Strong Image Synthesizer
Shanghua Gao
Pan Zhou
Ming-Ming Cheng
Shuicheng Yan
DiffM
135
155
0
25 Mar 2023
FAME-ViL: Multi-Tasking Vision-Language Model for Heterogeneous Fashion Tasks
Xiaoping Han
Xiatian Zhu
Licheng Yu
Li Zhang
Yi-Zhe Song
Tao Xiang
VLM
11
38
0
04 Mar 2023
QR-CLIP: Introducing Explicit Open-World Knowledge for Location and Time Reasoning
Weimin Shi
Mingchen Zhuge
D. Gao
Zhong Zhou
Ming-Ming Cheng
Deng-Ping Fan
LRM
VLM
23
0
0
02 Feb 2023
BEVBert: Multimodal Map Pre-training for Language-guided Navigation
Dongyan An
Yuankai Qi
Yangguang Li
Yan Huang
Liangsheng Wang
T. Tan
Jing Shao
28
55
0
08 Dec 2022
Skating-Mixer: Long-Term Sport Audio-Visual Modeling with MLPs
Jingfei Xia
Mingchen Zhuge
Tiantian Geng
Shun Fan
Yuantai Wei
Zhenyu He
Feng Zheng
13
13
0
08 Mar 2022
Masked Autoencoders Are Scalable Vision Learners
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
258
7,337
0
11 Nov 2021
Paradigm Shift in Natural Language Processing
Tianxiang Sun
Xiangyang Liu
Xipeng Qiu
Xuanjing Huang
114
82
0
26 Sep 2021
VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text
Hassan Akbari
Liangzhe Yuan
Rui Qian
Wei-Hong Chuang
Shih-Fu Chang
Yin Cui
Boqing Gong
ViT
231
573
0
22 Apr 2021
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions
Wenhai Wang
Enze Xie
Xiang Li
Deng-Ping Fan
Kaitao Song
Ding Liang
Tong Lu
Ping Luo
Ling Shao
ViT
263
3,538
0
24 Feb 2021
Zero-Shot Text-to-Image Generation
Aditya A. Ramesh
Mikhail Pavlov
Gabriel Goh
Scott Gray
Chelsea Voss
Alec Radford
Mark Chen
Ilya Sutskever
VLM
253
4,735
0
24 Feb 2021
Salient Object Detection via Integrity Learning
Mingchen Zhuge
Deng-Ping Fan
Nian Liu
Dingwen Zhang
Dong Xu
Ling Shao
AAML
53
289
0
19 Jan 2021