301
v1v2 (latest)

PBR-Inspired Controllable Diffusion for Image Generation

Main:10 Pages
12 Figures
Bibliography:3 Pages
3 Tables
Abstract

Despite recent advances in text-to-image generation, controlling geometric layout and PBR material properties in synthesized scenes remains challenging. We present a pipeline that first produces a G-buffer (albedo, normals, depth, roughness, shading, and metallic) from a text prompt and then renders a final image through a PBR-inspired branch network. This intermediate representation enables fine-grained control: users can copy and paste within specific G-buffer channels to insert or reposition objects, or apply masks to the irradiance channel to adjust lighting locally. As a result, real objects can be seamlessly integrated into virtual scenes. By separating user-friendly scene description from image rendering, our method offers a practical balance between detailed post-generation control and efficient text-driven synthesis. We demonstrate its effectiveness through quantitative evaluations and a user study with 156 participants, showing consistent human preference over strong baselines and confirming that G-buffer control extends the flexibility of text-guided image generation.

View on arXiv
Comments on this paper