Acta Geodaetica et Cartographica Sinica ›› 2026, Vol. 55 ›› Issue (5): 894-908. doi: 10.11947/j.AGCS.2026.20250529

• Photogrammetry and Remote Sensing •

时空信息显式引导的高分光学遥感影像可控生成方法

时天东1, 赵玲1, 赵文豪2, 齐霁3, 崔浩4, 彭程里1, 张新长3

  1. 中南大学地球科学与信息物理学院,湖南 长沙 410083
  2. 国家基础地理信息中心,北京 100830
  3. 广州大学地理科学与遥感学院,广东 广州 510006
  4. 武汉大学测绘遥感信息工程全国重点实验室,湖北 武汉 430079
  • 收稿日期:2025-12-19 修回日期:2026-04-19 出版日期:2026-06-23 发布日期:2026-06-23
  • 通讯作者: 齐霁 E-mail:csushitd@csu.edu.cn;jameschi95@foxmail.com
  • 作者简介:时天东(1996—),男,博士生,研究方向为遥感影像智能解译与可控生成。 E-mail:csushitd@csu.edu.cn
  • 基金资助:
    国家自然科学基金(42571533; 42371406);空间基准全国重点实验室开放基金(SKLSD2026-KF-27)

Controllable generation of high-resolution optical remote sensing image explicitly guided by spatio-temporal information

Tiandong SHI1, Ling ZHAO1, Wenhao ZHAO2, Ji QI3, Hao CUI4, Chengli PENG1, Xinchang ZHANG3

  1. School of Geosciences and Info-Physics, Central South University, Changsha 410083, China
  2. National Geomatics Center of China, Beijing 100830, China
  3. School of Geography and Remote Sensing, Guangzhou University, Guangzhou 510006, China
  4. State Key Laboratory of Information Engineering in Surveying, Mapping, and Remote Sensing, Wuhan University, Wuhan 430079, China
  • Received: 2025-12-19  Revised: 2026-04-19  Online: 2026-06-23  Published: 2026-06-23
  • Contact: Ji QI, E-mail: csushitd@csu.edu.cn; jameschi95@foxmail.com
  • About author: SHI Tiandong (born 1996), male, PhD candidate; research interests include intelligent interpretation and controllable generation of remote sensing imagery. E-mail: csushitd@csu.edu.cn
  • Supported by:
    National Natural Science Foundation of China (42571533; 42371406); Open Fund of the State Key Laboratory of Spatial Datum (SKLSD2026-KF-27)

摘要:

遥感地物的视觉表现受季节演变与地域差异影响显著,如何增强生成模型的时空可控性,精准复现特定时空背景下的地物特征,是当前高分光学遥感影像生成领域面临的关键挑战。现有研究在多源时空信息的编码方式,以及时空特征与视觉特征交互引导方式这两个方面存在局限性,难以准确建模时空条件与地物视觉形态的精确映射关系。对此,本文提出了一种面向时空可控的高分光学遥感影像生成方法。首先,设计了顾及属性差异的多源时空信息编码方式,利用异构频率编码与独立投影将不同属性的时空信息转化为解耦的高维特征表示,以建模多源时空信息的独特属性。其次,设计基于解耦注意力的时空-文本联合交互引导机制,采用独立并行注意力分支促进时空特征与视觉特征的交互引导,在不干扰文本引导生成的同时,充分发挥时空信息在影像生成过程中的约束作用。此外,利用低秩适配训练策略,仅微调少量参数即实现了领域知识的高效迁移与基础生成能力的完整保留。在覆盖我国7个典型区域的大规模数据集上的试验表明,本文方法在时空分布一致性和结构纹理一致性上较现有先进方法分别提升了46.69%和14.67%,证实了该框架在多样时空场景下的生成可控性与泛化潜力。

关键词: 时空智能, 遥感影像, 生成模型, 扩散模型, 深度学习, 时空信息

Abstract:

The visual appearance of ground objects in high-resolution optical remote sensing images varies markedly with season and region. Strengthening the spatio-temporal controllability of generative models, so that they accurately reproduce object characteristics under a specified spatio-temporal context, remains a key challenge in high-resolution optical remote sensing image generation. Existing methods are limited in two respects: how multi-source spatio-temporal information is encoded, and how the encoded spatio-temporal features interact with and guide visual features. As a result, they struggle to model the mapping between spatio-temporal conditions and the visual appearance of ground objects. To address this, this paper proposes a spatio-temporally controllable method for generating high-resolution optical remote sensing images. First, a multi-source spatio-temporal encoding scheme that accounts for attribute differences is designed: heterogeneous frequency encoding and independent projections transform spatio-temporal information with different attributes into decoupled high-dimensional feature representations, capturing the distinct properties of each source. Second, a joint spatio-temporal and text interaction guidance mechanism based on decoupled attention is designed: an independent parallel attention branch promotes interaction between spatio-temporal features and visual features, so that spatio-temporal information constrains generation without interfering with text guidance. In addition, a low-rank adaptation training strategy fine-tunes only a small number of parameters (the low-rank decomposition matrices), transferring domain knowledge efficiently while preserving the pre-trained generative priors of the base model. Experiments on a large-scale dataset covering seven typical regions of China show that the proposed method improves on existing state-of-the-art methods by 46.69% in spatio-temporal distribution consistency and by 14.67% in structural-textural consistency, confirming the framework's controllability and generalization potential across diverse spatio-temporal scenarios.
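
The two architectural ideas named in the abstract can be illustrated with a toy sketch. The abstract does not give the encoding formula, the list of attributes, or the rule for combining the attention branches, so everything below is an assumption made for illustration and not the authors' implementation: the attribute names, periods, frequency counts, `EMBED_DIM`, the random matrices standing in for learned projections, the reuse of tokens as both keys and values, and the additive combination weighted by a hypothetical `st_scale`.

```python
import math
import random

EMBED_DIM = 8  # hypothetical token width

# Hypothetical per-attribute settings: each attribute gets its own period
# and its own number of frequencies ("heterogeneous" frequency encoding).
ATTRIBUTES = {
    "longitude": {"period": 360.0, "num_freqs": 4},
    "latitude": {"period": 180.0, "num_freqs": 4},
    "day_of_year": {"period": 365.0, "num_freqs": 3},
}


def fourier_features(value, period, num_freqs):
    """Return sin/cos pairs at harmonics 1..num_freqs of the given period."""
    feats = []
    for k in range(1, num_freqs + 1):
        angle = 2.0 * math.pi * k * value / period
        feats.extend([math.sin(angle), math.cos(angle)])
    return feats


def make_projection(in_dim, out_dim, seed):
    """Random stand-in for a learned linear layer (out_dim x in_dim weights)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(in_dim)
    return [[rng.gauss(0.0, scale) for _ in range(in_dim)] for _ in range(out_dim)]


# One independent projection per attribute, so attributes share no weights.
PROJECTIONS = {
    name: make_projection(2 * cfg["num_freqs"], EMBED_DIM, seed=i)
    for i, (name, cfg) in enumerate(ATTRIBUTES.items())
}


def encode_spatiotemporal(sample):
    """Encode each attribute separately and return one token per attribute."""
    tokens = []
    for name, cfg in ATTRIBUTES.items():
        feats = fourier_features(sample[name], cfg["period"], cfg["num_freqs"])
        token = [sum(w * x for w, x in zip(row, feats)) for row in PROJECTIONS[name]]
        tokens.append(token)
    return tokens


def attention(queries, keys, values):
    """Plain scaled dot-product attention over lists of vectors."""
    dim = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        peak = max(scores)  # subtract the maximum for a numerically stable softmax
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(len(values[0]))
        ])
    return outputs


def decoupled_cross_attention(queries, text_tokens, st_tokens, st_scale=1.0):
    """Add a parallel spatio-temporal branch to the text branch (assumed form)."""
    text_out = attention(queries, text_tokens, text_tokens)
    st_out = attention(queries, st_tokens, st_tokens)
    return [
        [t + st_scale * s for t, s in zip(t_row, s_row)]
        for t_row, s_row in zip(text_out, st_out)
    ]
```

In this sketch, calling `encode_spatiotemporal({"longitude": 112.94, "latitude": 28.23, "day_of_year": 172})` yields three separate tokens, one per attribute. Setting `st_scale=0.0` reduces `decoupled_cross_attention` to the text-only branch, which is one simple way to see how a parallel branch can add spatio-temporal constraints without altering the text pathway.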

Key words: spatio-temporal intelligence, remote sensing image, generation model, diffusion model, deep learning, spatio-temporal information

CLC number: