Latent Diffusion Approaches for Conditional Generation of Aerial Imagery: A Study

Roger Marí; Rafael Redondo

doi:10.5201/ipol.2025.580

published: 2025-03-11
reference: Roger Marí and Rafael Redondo, Latent Diffusion Approaches for Conditional Generation of Aerial Imagery: A Study, Image Processing On Line, 15 (2025), pp. 20–31. https://doi.org/10.5201/ipol.2025.580

Communicated by Pablo Musé
Demo edited by Roger Marí

Abstract

Generative artificial intelligence is increasingly being applied in diverse areas such as architectural design, music composition, and character animation. Among generative methods, diffusion models are currently the state of the art in the synthesis of high-quality images with inherent diversity and realism. This paper evaluates the fidelity and realism of the synthesis achieved by different architectural variations of a latent diffusion model used to generate aerial images conditioned on semantic maps. As shown in the results, the diffusion model tends to capture the overall semantic structure correctly and generates realistic textures, though often with a lack of fine-grained detail. Among the conditioning variations, cross-attention layers were crucial for outlining the semantic segments more accurately and exploiting the conditional data more effectively.

This is an MLBriefs article; the source code has not been reviewed.

Download

full text manuscript: PDF low-res. (612.5kB), PDF (7.8MB)
source code: ZIP
