2403.00522
VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks
1 March 2024
Xiangxiang Chu
Jianlin Su
Bo-Wen Zhang
Chunhua Shen
MLLM
Papers citing "VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks"
7 / 7 papers shown
Title
FLUX-Text: A Simple and Advanced Diffusion Transformer Baseline for Scene Text Editing
Rui Lan
Y. Bai
Xu Duan
M. Li
Lei Sun
X. Chu
DiffM
47
0
0
06 May 2025
USP: Unified Self-Supervised Pretraining for Image Generation and Understanding
Xiangxiang Chu
Renda Li
Yong Wang
60
0
0
08 Mar 2025
FiT: Flexible Vision Transformer for Diffusion Model
Zeyu Lu
Zidong Wang
Di Huang
Chengyue Wu
Xihui Liu
Wanli Ouyang
Lei Bai
155
45
0
19 Feb 2024
Stochastic Interpolants: A Unifying Framework for Flows and Diffusions
M. S. Albergo
Nicholas M. Boffi
Eric Vanden-Eijnden
DiffM
244
260
0
15 Mar 2023
Masked Autoencoders Are Scalable Vision Learners
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
258
7,337
0
11 Nov 2021
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions
Wenhai Wang
Enze Xie
Xiang Li
Deng-Ping Fan
Kaitao Song
Ding Liang
Tong Lu
Ping Luo
Ling Shao
ViT
263
3,538
0
24 Feb 2021
U-Net: Convolutional Networks for Biomedical Image Segmentation
Olaf Ronneberger
Philipp Fischer
Thomas Brox
SSeg
3DV
229
74,467
0
18 May 2015