2210.15110
Masked Vision-Language Transformer in Fashion
27 October 2022
Ge-Peng Ji
Mingchen Zhuge
D. Gao
Deng-Ping Fan
Christos Sakaridis
Luc Van Gool
Papers citing "Masked Vision-Language Transformer in Fashion"
20 / 20 papers shown
Title
Instruction-Aligned Visual Attention for Mitigating Hallucinations in Large Vision-Language Models
Bin Li
Dehong Gao
Yeyuan Wang
Linbo Jin
Shanqing Yu
Xiaoyan Cai
Libin Yang
VLM
41
0
0
24 Mar 2025
Text-driven Human Motion Generation with Motion Masked Diffusion Model
Xingyu Chen
DiffM
VGen
26
1
0
29 Sep 2024
GeoMFormer: A General Architecture for Geometric Molecular Representation Learning
Tianlang Chen
Shengjie Luo
Di He
Shuxin Zheng
Tie-Yan Liu
Liwei Wang
AI4CE
31
5
0
24 Jun 2024
S-Agents: Self-organizing Agents in Open-ended Environments
Jia-Qing Chen
Yu-Gang Jiang
Jiachen Lu
Li Zhang
AIFin
LLMAG
LM&Ro
45
15
0
07 Feb 2024
PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding
Zhen Li
Mingdeng Cao
Xintao Wang
Zhongang Qi
Ming-Ming Cheng
Ying Shan
DiffM
34
187
0
07 Dec 2023
Point Cloud Pre-training with Diffusion Models
Xiao Zheng
Xiaoshui Huang
Guofeng Mei
Yuenan Hou
Zhaoyang Lyu
Bo Dai
Wanli Ouyang
Yongshun Gong
15
18
0
25 Nov 2023
TALL: Thumbnail Layout for Deepfake Video Detection
Yuting Xu
Jian Liang
Gengyun Jia
Ziming Yang
Yanhao Zhang
R. He
ViT
33
51
0
14 Jul 2023
Instruct-ReID: A Multi-purpose Person Re-identification Task with Instructions
Weizhen He
Yihe Deng
Shixiang Tang
Qihao Chen
Qingsong Xie
...
Feng Zhu
Rui Zhao
Wanli Ouyang
Donglian Qi
Yunfeng Yan
65
19
0
13 Jun 2023
Advances in Deep Concealed Scene Understanding
Deng-Ping Fan
Ge-Peng Ji
Peng-Tao Xu
Ming-Ming Cheng
Christos Sakaridis
Luc Van Gool
25
67
0
21 Apr 2023
MDTv2: Masked Diffusion Transformer is a Strong Image Synthesizer
Shanghua Gao
Pan Zhou
Ming-Ming Cheng
Shuicheng Yan
DiffM
135
155
0
25 Mar 2023
FAME-ViL: Multi-Tasking Vision-Language Model for Heterogeneous Fashion Tasks
Xiaoping Han
Xiatian Zhu
Licheng Yu
Li Zhang
Yi-Zhe Song
Tao Xiang
VLM
11
38
0
04 Mar 2023
QR-CLIP: Introducing Explicit Open-World Knowledge for Location and Time Reasoning
Weimin Shi
Mingchen Zhuge
D. Gao
Zhong Zhou
Ming-Ming Cheng
Deng-Ping Fan
LRM
VLM
23
0
0
02 Feb 2023
BEVBert: Multimodal Map Pre-training for Language-guided Navigation
Dongyan An
Yuankai Qi
Yangguang Li
Yan Huang
Liangsheng Wang
T. Tan
Jing Shao
28
55
0
08 Dec 2022
Skating-Mixer: Long-Term Sport Audio-Visual Modeling with MLPs
Jingfei Xia
Mingchen Zhuge
Tiantian Geng
Shun Fan
Yuantai Wei
Zhenyu He
Feng Zheng
13
13
0
08 Mar 2022
Masked Autoencoders Are Scalable Vision Learners
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
258
7,337
0
11 Nov 2021
Paradigm Shift in Natural Language Processing
Tianxiang Sun
Xiangyang Liu
Xipeng Qiu
Xuanjing Huang
114
82
0
26 Sep 2021
VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text
Hassan Akbari
Liangzhe Yuan
Rui Qian
Wei-Hong Chuang
Shih-Fu Chang
Yin Cui
Boqing Gong
ViT
231
573
0
22 Apr 2021
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions
Wenhai Wang
Enze Xie
Xiang Li
Deng-Ping Fan
Kaitao Song
Ding Liang
Tong Lu
Ping Luo
Ling Shao
ViT
263
3,538
0
24 Feb 2021
Zero-Shot Text-to-Image Generation
Aditya A. Ramesh
Mikhail Pavlov
Gabriel Goh
Scott Gray
Chelsea Voss
Alec Radford
Mark Chen
Ilya Sutskever
VLM
253
4,735
0
24 Feb 2021
Salient Object Detection via Integrity Learning
Mingchen Zhuge
Deng-Ping Fan
Nian Liu
Dingwen Zhang
Dong Xu
Ling Shao
AAML
53
289
0
19 Jan 2021