LEDiT: Your Length-Extrapolatable Diffusion Transformer without Positional Encoding

6 March 2025
Shen Zhang, Yaning Tan, Siyuan Liang, Zhaowei Chen, Linze Li, Ge Wu, Yuhao Chen, Shuheng Li, Zhenyu Zhao, Caihua Chen, Jiajun Liang, Yao Tang
Abstract

Diffusion transformers (DiTs) struggle to generate images at resolutions higher than their training resolutions. The primary obstacle is that explicit positional encodings (PEs), such as RoPE, require extrapolation, which degrades performance when the inference resolution differs from the training resolution. In this paper, we propose the Length-Extrapolatable Diffusion Transformer (LEDiT), a simple yet powerful architecture that overcomes this limitation. LEDiT needs no explicit PEs, thereby avoiding extrapolation. Its key innovations are introducing causal attention to implicitly impart global positional information to tokens, and enhancing locality to precisely distinguish adjacent tokens. Experiments on 256x256 and 512x512 ImageNet show that LEDiT can scale the inference resolution to 512x512 and 1024x1024, respectively, while achieving better image quality than current state-of-the-art length-extrapolation methods (NTK-aware, YaRN). Moreover, LEDiT achieves strong extrapolation performance with just 100K steps of fine-tuning on a pretrained DiT, demonstrating its potential for integration into existing text-to-image DiTs. Project page: this https URL
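To make the abstract's core idea concrete, here is a minimal PyTorch sketch (not the authors' code) of a PE-free transformer block in which causal attention supplies implicit global position and a depthwise convolution over the token grid adds locality. The class name, kernel size, and the exact locality branch are assumptions for illustration; conditioning on timestep/class, as in a full DiT block, is omitted.

# Minimal sketch of a PE-free block with causal attention + a locality branch.
# All names and hyperparameters here are illustrative assumptions, not the
# published LEDiT implementation.
import torch
import torch.nn as nn

class LEDiTBlockSketch(nn.Module):
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Hypothetical locality branch: depthwise conv on the 2D token grid,
        # so adjacent tokens stay distinguishable without explicit PEs.
        self.local = nn.Conv2d(dim, dim, kernel_size=3, padding=1, groups=dim)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x: torch.Tensor, hw: tuple) -> torch.Tensor:
        # x: (B, N, C) patchified tokens, N = H*W; no positional encoding is added.
        B, N, C = x.shape
        h = self.norm1(x)
        # Causal mask: token i attends only to tokens j <= i, so the attention
        # output depends on a token's absolute position even without PEs.
        mask = torch.triu(torch.ones(N, N, dtype=torch.bool, device=x.device), diagonal=1)
        attn_out, _ = self.attn(h, h, h, attn_mask=mask, need_weights=False)
        # Locality enhancement on the reshaped (B, C, H, W) token grid.
        H, W = hw
        local_out = self.local(h.transpose(1, 2).reshape(B, C, H, W)).flatten(2).transpose(1, 2)
        x = x + attn_out + local_out
        return x + self.mlp(self.norm2(x))

# Usage: a 16x16 grid of 384-dim tokens; the same block also accepts a 32x32
# grid, since nothing in it is tied to a fixed sequence length.
block = LEDiTBlockSketch(dim=384)
out = block(torch.randn(2, 16 * 16, 384), hw=(16, 16))

Because no component above is parameterized by sequence length, running the block at a higher resolution requires no PE extrapolation, which is the property the abstract attributes to LEDiT.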

@article{zhang2025_2503.04344,
  title={LEDiT: Your Length-Extrapolatable Diffusion Transformer without Positional Encoding},
  author={Shen Zhang and Yaning Tan and Siyuan Liang and Zhaowei Chen and Linze Li and Ge Wu and Yuhao Chen and Shuheng Li and Zhenyu Zhao and Caihua Chen and Jiajun Liang and Yao Tang},
  journal={arXiv preprint arXiv:2503.04344},
  year={2025}
}