ResearchTrend.AI

Lay-Your-Scene: Natural Scene Layout Generation with Diffusion Transformers

7 May 2025
Divyansh Srivastava
Xiang Zhang
He Wen
Chenru Wen
Zhuowen Tu
    DiffM
Abstract

We present Lay-Your-Scene (shorthand LayouSyn), a novel text-to-layout generation pipeline for natural scenes. Prior scene layout generation methods are either closed-vocabulary or use proprietary large language models for open-vocabulary generation, limiting their modeling capabilities and broader applicability in controllable image generation. In this work, we propose to use lightweight open-source language models to obtain scene elements from text prompts and a novel aspect-aware diffusion Transformer architecture trained in an open-vocabulary manner for conditional layout generation. Extensive experiments demonstrate that LayouSyn outperforms existing methods and achieves state-of-the-art performance on challenging spatial and numerical reasoning benchmarks. Additionally, we present two applications of LayouSyn. First, we show that coarse initialization from large language models can be seamlessly combined with our method to achieve better results. Second, we present a pipeline for adding objects to images, demonstrating the potential of LayouSyn in image editing applications.
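To make the notion of a scene layout concrete, the following is a minimal sketch of one common way to represent a layout and map it onto canvases with different aspect ratios. It is not the authors' implementation: the names `Box` and `to_pixels` are hypothetical, and the normalized center/size box format is an assumption chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """One scene element: a free-text label plus a normalized box.

    Coordinates are fractions of the canvas in [0, 1] (an assumed
    convention, not taken from the paper).
    """
    label: str
    cx: float  # box center, x
    cy: float  # box center, y
    w: float   # box width
    h: float   # box height


def to_pixels(box: Box, width: int, height: int) -> tuple[int, int, int, int]:
    """Convert a normalized box to (x0, y0, x1, y1) pixel corners,
    clamped to the canvas bounds."""
    x0 = (box.cx - box.w / 2) * width
    y0 = (box.cy - box.h / 2) * height
    x1 = (box.cx + box.w / 2) * width
    y1 = (box.cy + box.h / 2) * height
    clamp = lambda v, hi: max(0, min(hi, round(v)))
    return (clamp(x0, width), clamp(y0, height),
            clamp(x1, width), clamp(y1, height))


# A toy layout for a prompt such as "a dog next to a bench".
layout = [
    Box("dog", 0.25, 0.75, 0.25, 0.25),
    Box("bench", 0.5, 0.5, 0.5, 0.5),
]

# The same normalized layout rendered onto two aspect ratios.
landscape = [to_pixels(b, 640, 480) for b in layout]
wide = [to_pixels(b, 1024, 512) for b in layout]
```

Because the boxes are stored as fractions of the canvas, the same layout can be projected onto any target resolution; an aspect-aware generator would additionally condition the box positions themselves on the target aspect ratio, which this sketch does not attempt.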

@article{srivastava2025_2505.04718,
  title={Lay-Your-Scene: Natural Scene Layout Generation with Diffusion Transformers},
  author={Divyansh Srivastava and Xiang Zhang and He Wen and Chenru Wen and Zhuowen Tu},
  journal={arXiv preprint arXiv:2505.04718},
  year={2025}
}