ResearchTrend.AI
  • Communities
  • Connect sessions
  • AI calendar
  • Organizations
  • Join Slack
  • Contact Sales
Papers
Communities
Social Events
Terms and Conditions
Pricing
Contact Sales
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2026 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2505.13219
  4. Cited By
PiT: Progressive Diffusion Transformer
v1v2v3v4v5 (latest)

PiT: Progressive Diffusion Transformer

19 May 2025
Jiafu Wu
Yabiao Wang
Jian Li
Jinlong Peng
Yun Cao
Chengjie Wang
Jiangning Zhang
ArXiv (abs)PDFHTML

Papers citing "PiT: Progressive Diffusion Transformer"

27 / 27 papers shown
EMOv2: Pushing 5M Vision Model Frontier
EMOv2: Pushing 5M Vision Model Frontier
Jing Zhang
T. Hu
Haoyang He
Zhucun Xue
Yun Wang
Chengjie Wang
Wenshu Fan
Hefei Ling
Dacheng Tao
VLM
255
4
0
09 Dec 2024
Scaling Rectified Flow Transformers for High-Resolution Image Synthesis
Scaling Rectified Flow Transformers for High-Resolution Image Synthesis
Patrick Esser
Sumith Kulal
A. Blattmann
Rahim Entezari
Jonas Muller
...
Zion English
Kyle Lacey
Alex Goodwin
Yannik Marek
Robin Rombach
DiffM
2.5K
2,797
0
05 Mar 2024
DiffiT: Diffusion Vision Transformers for Image Generation
DiffiT: Diffusion Vision Transformers for Image GenerationEuropean Conference on Computer Vision (ECCV), 2023
Ali Hatamizadeh
Jiaming Song
Guilin Liu
Jan Kautz
Arash Vahdat
360
118
0
04 Dec 2023
PVG: Progressive Vision Graph for Vision Recognition
PVG: Progressive Vision Graph for Vision RecognitionACM Multimedia (ACM MM), 2023
Jiafu Wu
Jian Li
Jiangning Zhang
Boshen Zhang
M. Chi
Yabiao Wang
Chengjie Wang
ViT
323
20
0
01 Aug 2023
MDTv2: Masked Diffusion Transformer is a Strong Image Synthesizer
MDTv2: Masked Diffusion Transformer is a Strong Image SynthesizerIEEE International Conference on Computer Vision (ICCV), 2023
Shanghua Gao
Pan Zhou
Mingg-Ming Cheng
Shuicheng Yan
DiffM
1.1K
254
0
25 Mar 2023
One Transformer Fits All Distributions in Multi-Modal Diffusion at Scale
One Transformer Fits All Distributions in Multi-Modal Diffusion at ScaleInternational Conference on Machine Learning (ICML), 2023
Fan Bao
Shen Nie
Kaiwen Xue
Chongxuan Li
Shiliang Pu
Yaole Wang
Gang Yue
Yue Cao
Hang Su
Jun Zhu
DiffM
535
177
0
12 Mar 2023
Rethinking Mobile Block for Efficient Attention-based Models
Rethinking Mobile Block for Efficient Attention-based ModelsIEEE International Conference on Computer Vision (ICCV), 2023
Jiangning Zhang
Xiangtai Li
Jian Li
Liang Liu
Zhucun Xue
Boshen Zhang
Zhe Jiang
Tianxin Huang
Yabiao Wang
Chengjie Wang
MQ
347
204
0
03 Jan 2023
All are Worth Words: A ViT Backbone for Diffusion Models
All are Worth Words: A ViT Backbone for Diffusion ModelsComputer Vision and Pattern Recognition (CVPR), 2022
Fan Bao
Shen Nie
Kaiwen Xue
Yue Cao
Chongxuan Li
Hang Su
Jun Zhu
VLM
553
507
0
25 Sep 2022
Masked-attention Mask Transformer for Universal Image Segmentation
Masked-attention Mask Transformer for Universal Image Segmentation
Bowen Cheng
Ishan Misra
Alex Schwing
Alexander Kirillov
Rohit Girdhar
ISeg
1.7K
3,334
0
02 Dec 2021
CoAtNet: Marrying Convolution and Attention for All Data Sizes
CoAtNet: Marrying Convolution and Attention for All Data SizesNeural Information Processing Systems (NeurIPS), 2021
Zihang Dai
Hanxiao Liu
Quoc V. Le
Mingxing Tan
ViT
584
1,478
0
09 Jun 2021
ViTAE: Vision Transformer Advanced by Exploring Intrinsic Inductive Bias
ViTAE: Vision Transformer Advanced by Exploring Intrinsic Inductive BiasNeural Information Processing Systems (NeurIPS), 2021
Yufei Xu
Qiming Zhang
Jing Zhang
Dacheng Tao
ViT
451
396
0
07 Jun 2021
CvT: Introducing Convolutions to Vision Transformers
CvT: Introducing Convolutions to Vision TransformersIEEE International Conference on Computer Vision (ICCV), 2021
Haiping Wu
Bin Xiao
Noel Codella
Xiyang Dai
Xiyang Dai
Lu Yuan
Lei Zhang
ViT
583
2,291
0
29 Mar 2021
CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image
  Classification
CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image ClassificationIEEE International Conference on Computer Vision (ICCV), 2021
Chun-Fu Chen
Quanfu Fan
Yikang Shen
ViT
362
1,943
0
27 Mar 2021
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
Swin Transformer: Hierarchical Vision Transformer using Shifted WindowsIEEE International Conference on Computer Vision (ICCV), 2021
Ze Liu
Yutong Lin
Yue Cao
Han Hu
Yixuan Wei
Zheng Zhang
Stephen Lin
B. Guo
ViT
3.2K
28,729
0
25 Mar 2021
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction
  without Convolutions
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without ConvolutionsIEEE International Conference on Computer Vision (ICCV), 2021
Wenhai Wang
Enze Xie
Xiang Li
Deng-Ping Fan
Kaitao Song
Ding Liang
Tong Lu
Ping Luo
Ling Shao
ViT
977
4,374
0
24 Feb 2021
Tokens-to-Token ViT: Training Vision Transformers from Scratch on
  ImageNet
Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNetIEEE International Conference on Computer Vision (ICCV), 2021
Li-xin Yuan
Yunpeng Chen
Tao Wang
Weihao Yu
Yujun Shi
Zihang Jiang
Francis E. H. Tay
Jiashi Feng
Shuicheng Yan
ViT
628
2,360
0
28 Jan 2021
Training data-efficient image transformers & distillation through
  attention
Training data-efficient image transformers & distillation through attentionInternational Conference on Machine Learning (ICML), 2020
Hugo Touvron
Matthieu Cord
Matthijs Douze
Francisco Massa
Alexandre Sablayrolles
Edouard Grave
ViT
672
8,392
0
23 Dec 2020
Pre-Trained Image Processing Transformer
Pre-Trained Image Processing TransformerComputer Vision and Pattern Recognition (CVPR), 2020
Hanting Chen
Yunhe Wang
Tianyu Guo
Chang Xu
Yiping Deng
Zhenhua Liu
Siwei Ma
Chunjing Xu
Chao Xu
Wen Gao
VLMViT
940
2,064
0
01 Dec 2020
An Image is Worth 16x16 Words: Transformers for Image Recognition at
  Scale
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy
Lucas Beyer
Alexander Kolesnikov
Dirk Weissenborn
Xiaohua Zhai
...
Matthias Minderer
G. Heigold
Sylvain Gelly
Jakob Uszkoreit
N. Houlsby
ViT
1.4K
55,775
0
22 Oct 2020
Denoising Diffusion Probabilistic Models
Denoising Diffusion Probabilistic Models
Jonathan Ho
Ajay Jain
Pieter Abbeel
DiffM
5.1K
26,105
0
19 Jun 2020
End-to-End Object Detection with Transformers
End-to-End Object Detection with TransformersEuropean Conference on Computer Vision (ECCV), 2020
Nicolas Carion
Francisco Massa
Gabriel Synnaeve
Nicolas Usunier
Alexander Kirillov
Sergey Zagoruyko
ViT3DVPINN
2.7K
16,595
0
26 May 2020
GLU Variants Improve Transformer
GLU Variants Improve Transformer
Noam M. Shazeer
614
1,495
0
12 Feb 2020
Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy
  Lifting, the Rest Can Be Pruned
Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be PrunedAnnual Meeting of the Association for Computational Linguistics (ACL), 2019
Elena Voita
David Talbot
F. Moiseev
Rico Sennrich
Ivan Titov
782
1,345
0
23 May 2019
Attention Is All You Need
Attention Is All You NeedNeural Information Processing Systems (NeurIPS), 2017
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
4.4K
163,656
0
12 Jun 2017
Deep Residual Learning for Image Recognition
Deep Residual Learning for Image Recognition
Kaiming He
Xinming Zhang
Shaoqing Ren
Jian Sun
MedIm
3.7K
218,756
0
10 Dec 2015
U-Net: Convolutional Networks for Biomedical Image Segmentation
U-Net: Convolutional Networks for Biomedical Image Segmentation
Olaf Ronneberger
Philipp Fischer
Thomas Brox
SSeg3DV
3.5K
89,836
0
18 May 2015
LINE: Large-scale Information Network Embedding
LINE: Large-scale Information Network Embedding
Jian Tang
Meng Qu
Mingzhe Wang
Ming Zhang
Jun Yan
Qiaozhu Mei
GNN
387
5,530
0
12 Mar 2015
1
Page 1 of 1