Latent Diffusion Approaches for Conditional Generation of Aerial Imagery: A Study
Roger Marí, Rafael Redondo
published
2025-03-11
reference
Roger Marí and Rafael Redondo, Latent Diffusion Approaches for Conditional Generation of Aerial Imagery: A Study, Image Processing On Line, 15 (2025), pp. 20–31. https://doi.org/10.5201/ipol.2025.580

Communicated by Pablo Musé
Demo edited by Roger Marí

Abstract

Generative artificial intelligence is increasingly being applied in diverse areas such as architectural design, music composition, and character animation. Among generative methods, diffusion models are currently the state of the art for synthesizing high-quality images with inherent diversity and realism. This paper evaluates the fidelity and realism of the images synthesized by different architectural variations of a latent diffusion model used to generate aerial images conditioned on semantic maps. The results show that the diffusion model tends to capture the overall semantic structure correctly and generates realistic textures, though often with a lack of fine-grained detail. Among the conditioning variations, cross-attention layers were crucial for outlining the semantic segments more accurately and for exploiting the conditional data more effectively.
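
To make the role of cross-attention conditioning more concrete, the following is a minimal NumPy sketch of a single cross-attention step in which image latent tokens attend to tokens derived from a semantic map. It is an illustration under stated assumptions, not the architecture evaluated in the paper: the function name `cross_attention`, the token counts, the feature dimensions, and the random projection matrices are all hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(latent_tokens, cond_tokens, w_q, w_k, w_v):
    """Single-head cross-attention (hypothetical helper, for illustration only).

    Queries come from the image latents; keys and values come from the
    conditioning signal (here, tokens standing in for a semantic map).
    """
    q = latent_tokens @ w_q  # (n_latent, d_attn)
    k = cond_tokens @ w_k    # (n_cond, d_attn)
    v = cond_tokens @ w_v    # (n_cond, d_attn)
    scores = q @ k.T / np.sqrt(q.shape[-1])  # scaled dot-product scores
    weights = softmax(scores, axis=-1)       # each latent token's weights sum to 1
    return weights @ v, weights


# Assumed toy sizes: 16 latent tokens and 16 semantic-map tokens.
rng = np.random.default_rng(0)
n_latent, n_cond, d_latent, d_cond, d_attn = 16, 16, 8, 4, 8
latent_tokens = rng.normal(size=(n_latent, d_latent))
cond_tokens = rng.normal(size=(n_cond, d_cond))
w_q = rng.normal(size=(d_latent, d_attn))
w_k = rng.normal(size=(d_cond, d_attn))
w_v = rng.normal(size=(d_cond, d_attn))

out, weights = cross_attention(latent_tokens, cond_tokens, w_q, w_k, w_v)
print(out.shape, weights.shape)
```

In this sketch every latent token receives a weighted mixture of the conditioning values, which is one way a denoising network can draw on the semantic map at each spatial location.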

This is an MLBriefs article; the source code has not been reviewed.
