Acta Geodaetica et Cartographica Sinica, 2026, Vol. 55, Issue (5): 894-908. doi: 10.11947/j.AGCS.2026.20250529

• Photogrammetry and Remote Sensing •

Controllable generation of high-resolution optical remote sensing image explicitly guided by spatio-temporal information

Tiandong SHI1, Ling ZHAO1, Wenhao ZHAO2, Ji QI3, Hao CUI4, Chengli PENG1, Xinchang ZHANG3

  1. School of Geosciences and Info-Physics, Central South University, Changsha 410083, China
    2. National Geomatics Center of China, Beijing 100830, China
    3. School of Geography and Remote Sensing, Guangzhou University, Guangzhou 510006, China
    4. State Key Laboratory of Information Engineering in Surveying, Mapping, and Remote Sensing, Wuhan University, Wuhan 430079, China
  • Received: 2025-12-19 Revised: 2026-04-19 Online: 2026-06-23 Published: 2026-06-23
  • Contact: Ji QI E-mail: csushitd@csu.edu.cn; jameschi95@foxmail.com
  • About author: SHI Tiandong (1996-), male, PhD candidate; his research focuses on intelligent interpretation of remote sensing imagery and controllable generation. E-mail: csushitd@csu.edu.cn
  • Supported by:
    The National Natural Science Foundation of China (42571533; 42371406); State Key Laboratory of Spatial Datum (SKLSD2026-KF-27)

Abstract:

The visual appearance of land cover objects in high-resolution optical remote sensing images varies markedly with season and region. Improving the spatio-temporal controllability of generation models, so that they accurately reproduce object features under a specific spatio-temporal context, remains a critical challenge. Existing methods are limited both in how they encode multi-source spatio-temporal information and in how the encoded spatio-temporal features interact with and guide visual features, which makes it difficult to model the mapping between spatio-temporal conditions and the visual appearance of land cover objects. To address this problem, this paper proposes a framework for spatio-temporally controllable generation of high-resolution optical remote sensing images. First, a multi-source spatio-temporal information encoding strategy that accounts for attribute differences is designed: heterogeneous frequency encoding and independent projections transform each type of spatio-temporal information into an accurate, decoupled feature representation that captures its distinct properties. Second, an interaction guidance mechanism between spatio-temporal features and visual features is designed on the basis of decoupled attention: an independent parallel attention branch enables deep interaction between the two kinds of features, so that spatio-temporal information constrains generation without interfering with text guidance. In addition, low-rank adaptation is adopted to transfer domain knowledge efficiently by optimizing only the low-rank decomposition matrices, thereby preserving the pre-trained generative priors of the base model.
Experiments on a large-scale dataset covering seven typical regions in China show that the proposed method outperforms state-of-the-art methods by 46.69% in spatio-temporal distribution consistency and by 14.67% in structural-textural consistency. These results support the controllability and generalization potential of the proposed framework across diverse spatio-temporal scenarios.
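
The two mechanisms named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and every specific in it is an assumption made for illustration: the choice of latitude, longitude and day of year as the spatio-temporal attributes, the frequency bands, the random matrices standing in for learned projections, and the way the two attention branches share one query and are summed (a common form of decoupled cross-attention). The names `fourier_features`, `encode_spatiotemporal`, `cross_attention`, `decoupled_attention` and `st_scale` are hypothetical.

```python
import numpy as np

def fourier_features(x, freqs):
    """Map scalars x (shape [n]) to sin/cos features of shape [n, 2 * len(freqs)]."""
    x = np.asarray(x, dtype=np.float64)[:, None]
    angles = 2.0 * np.pi * x * np.asarray(freqs, dtype=np.float64)[None, :]
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)

def encode_spatiotemporal(lat_deg, lon_deg, day_of_year, rng, dim=16):
    """Encode each attribute with its own frequency band and its own projection."""
    # Assumed normalisation: latitude uses half a period (it does not wrap),
    # longitude and day of year use a full period (both are cyclic).
    lat = (np.asarray(lat_deg, dtype=np.float64) + 90.0) / 360.0
    lon = (np.asarray(lon_deg, dtype=np.float64) + 180.0) / 360.0
    doy = (np.asarray(day_of_year, dtype=np.float64) - 1.0) / 365.0
    # Assumed heterogeneous bands: geometric for location, integer harmonics for time.
    loc_freqs = 2.0 ** np.arange(4)   # 1, 2, 4, 8
    time_freqs = np.arange(1, 4)      # 1, 2, 3
    feats = [fourier_features(lat, loc_freqs),
             fourier_features(lon, loc_freqs),
             fourier_features(doy, time_freqs)]
    # Independent projections: one random matrix per attribute stands in
    # for a learned linear layer.
    tokens = [f @ rng.standard_normal((f.shape[1], dim)) / np.sqrt(f.shape[1])
              for f in feats]
    return np.stack(tokens, axis=1)   # [n, 3, dim], one token per attribute

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(q, ctx, w_k, w_v):
    """Single-head scaled dot-product attention from queries q to context ctx."""
    k, v = ctx @ w_k, ctx @ w_v
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def decoupled_attention(visual, text_ctx, st_ctx, params, st_scale=1.0):
    """Text branch plus a parallel spatio-temporal branch with separate K/V weights."""
    q = visual @ params["w_q"]
    text_out = cross_attention(q, text_ctx, params["w_k_text"], params["w_v_text"])
    st_out = cross_attention(q, st_ctx, params["w_k_st"], params["w_v_st"])
    # Assumed fusion: the branches are summed, so st_scale = 0 recovers
    # the text-only output unchanged.
    return text_out + st_scale * st_out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim = 16
    st = encode_spatiotemporal([28.2, 39.9], [112.9, 116.4], [15, 200], rng, dim)
    names = ["w_q", "w_k_text", "w_v_text", "w_k_st", "w_v_st"]
    params = {n: rng.standard_normal((dim, dim)) / np.sqrt(dim) for n in names}
    visual = rng.standard_normal((2, 4, dim))   # 2 images, 4 visual tokens each
    text = rng.standard_normal((2, 5, dim))     # 5 text tokens each
    print(decoupled_attention(visual, text, st, params).shape)
```

In this sketch, keeping separate key and value weights for the spatio-temporal tokens is what lets that branch be scaled or switched off without touching the text branch, which mirrors the abstract's point about not interfering with text-guided generation.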

Key words: spatio-temporal intelligence, remote sensing image, generation model, diffusion model, deep learning, spatio-temporal information
