Controllable generation of high-resolution optical remote sensing image explicitly guided by spatio-temporal information

Acta Geodaetica et Cartographica Sinica, 2026, Vol. 55, Issue (5): 894-908. doi: 10.11947/j.AGCS.2026.20250529

Tiandong SHI¹, Ling ZHAO¹, Wenhao ZHAO², Ji QI³, Hao CUI⁴, Chengli PENG¹, Xinchang ZHANG³

¹ School of Geosciences and Info-Physics, Central South University, Changsha 410083, China
² National Geomatics Center of China, Beijing 100830, China
³ School of Geography and Remote Sensing, Guangzhou University, Guangzhou 510006, China
⁴ State Key Laboratory of Information Engineering in Surveying, Mapping, and Remote Sensing, Wuhan University, Wuhan 430079, China

Received: 2025-12-19  Revised: 2026-04-19  Online: 2026-06-23  Published: 2026-06-23
Contact: Ji QI  E-mail: csushitd@csu.edu.cn; jameschi95@foxmail.com
About the author: SHI Tiandong (born 1996), male, PhD candidate; research interests: intelligent interpretation and controllable generation of remote sensing imagery. E-mail: csushitd@csu.edu.cn
Supported by: National Natural Science Foundation of China (42571533; 42371406); Open Fund of the State Key Laboratory of Spatial Datum (SKLSD2026-KF-27)

Abstract

The visual appearance of ground objects in high-resolution optical remote sensing images varies markedly with season and region. Strengthening the spatio-temporal controllability of generative models, so that they accurately reproduce object characteristics under a specified spatio-temporal context, remains a key challenge in high-resolution optical remote sensing image generation. Existing methods are limited in two respects: how multi-source spatio-temporal information is encoded, and how the encoded spatio-temporal features interact with and guide visual features. As a result, they struggle to model the mapping between spatio-temporal conditions and the visual appearance of ground objects. To address this, this paper proposes a spatio-temporally controllable method for generating high-resolution optical remote sensing images. First, a multi-source spatio-temporal encoding scheme that accounts for attribute differences is designed; it uses heterogeneous frequency encoding and independent projections to convert spatio-temporal information with different attributes into decoupled high-dimensional feature representations, capturing the distinct properties of each source. Second, a joint spatio-temporal and text interaction guidance mechanism based on decoupled attention is designed; it uses independent parallel attention branches to let spatio-temporal features interact with and guide visual features, so that spatio-temporal information constrains generation without interfering with text-guided generation. In addition, a low-rank adaptation training strategy fine-tunes only a small number of parameters (the low-rank decomposition matrices), transferring domain knowledge efficiently while preserving the pre-trained generative priors of the base model. Experiments on a large-scale dataset covering seven typical regions of China show that the proposed method improves on state-of-the-art methods by 46.69% in spatio-temporal distribution consistency and 14.67% in structural-textural consistency. These results confirm the controllability and generalization potential of the framework across diverse spatio-temporal scenarios.

Key words: spatio-temporal intelligence, remote sensing image, generation model, diffusion model, deep learning, spatio-temporal information

CLC number: P237

Tiandong SHI, Ling ZHAO, Wenhao ZHAO, Ji QI, Hao CUI, Chengli PENG, Xinchang ZHANG. Controllable generation of high-resolution optical remote sensing image explicitly guided by spatio-temporal information[J]. Acta Geodaetica et Cartographica Sinica, 2026, 55(5): 894-908.

Link to this article: http://xb.chinasmp.com/CN/10.11947/j.AGCS.2026.20250529

http://xb.chinasmp.com/CN/Y2026/V55/I5/894

Method	FID	MS-SSIM	CLIP Score
Text encoding method	6.8580	0.1504	23.9486
CRS-Diff	7.0270	0.1969	25.5957
DiffusionSat	6.4850	0.1670	25.5877
Text2Earth	13.5234	0.1899	24.7872
Proposed method	3.4572	0.2258	25.7705
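
As a quick consistency check, the relative gains quoted in the abstract can be recomputed from this table. The sketch below assumes (the page does not state this) that "spatio-temporal distribution consistency" corresponds to FID, where lower is better, that "structural-textural consistency" corresponds to MS-SSIM, where higher is better, and that each gain is measured against the best competing baseline. The helper name `relative_gain` and the English method labels are illustrative.

```python
# Metric values copied from the comparison table above.
results = {
    "Text encoding method": {"FID": 6.8580, "MS-SSIM": 0.1504, "CLIP Score": 23.9486},
    "CRS-Diff": {"FID": 7.0270, "MS-SSIM": 0.1969, "CLIP Score": 25.5957},
    "DiffusionSat": {"FID": 6.4850, "MS-SSIM": 0.1670, "CLIP Score": 25.5877},
    "Text2Earth": {"FID": 13.5234, "MS-SSIM": 0.1899, "CLIP Score": 24.7872},
    "Proposed method": {"FID": 3.4572, "MS-SSIM": 0.2258, "CLIP Score": 25.7705},
}
PROPOSED = "Proposed method"


def relative_gain(metric, lower_is_better):
    """Percentage gain of the proposed method over the best baseline (assumed definition)."""
    ours = results[PROPOSED][metric]
    baselines = [row[metric] for name, row in results.items() if name != PROPOSED]
    best = min(baselines) if lower_is_better else max(baselines)
    delta = (best - ours) if lower_is_better else (ours - best)
    return 100.0 * delta / best


fid_gain = relative_gain("FID", lower_is_better=True)
ssim_gain = relative_gain("MS-SSIM", lower_is_better=False)
print(f"FID gain: {fid_gain:.2f}%")
print(f"MS-SSIM gain: {ssim_gain:.2f}%")
```

Under these assumptions the FID gain comes out at about 46.69%, matching the abstract. The MS-SSIM gain computed from the rounded table entries is about 14.68%, marginally above the reported 14.67%; the difference is presumably due to rounding of the tabulated values.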

Module	FID	MS-SSIM	CLIP Score
Without independent projection module	4.8604	0.1716	25.7702
With independent projection module	3.4572	0.2258	25.7705

Encoding strategy	FID	MS-SSIM	CLIP Score
Standard MLP encoding	4.5306	0.2115	25.7365
Heterogeneous frequency encoding	3.4572	0.2258	25.7705
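
The two ablations above isolate the independent projection module and the heterogeneous frequency encoding. The page does not give their formulation, so the following is only a minimal sketch of one plausible reading: each spatio-temporal attribute gets sinusoidal features with its own frequency bands, followed by its own projection matrix, so that no weights are shared across attributes. The attribute names (`longitude`, `latitude`, `month`), band counts, periods, embedding width and random weights are all assumptions for illustration, not the authors' settings.

```python
import math
import random

# Hypothetical per-attribute settings: every attribute gets its own
# number of frequency bands and its own base period (all assumed).
ATTRIBUTES = {
    "longitude": {"num_bands": 6, "period": 360.0},  # degrees, periodic
    "latitude": {"num_bands": 6, "period": 180.0},   # degrees; period only sets the scale
    "month": {"num_bands": 3, "period": 12.0},       # seasonal cycle
}
EMBED_DIM = 8  # assumed embedding width


def frequency_encode(value, num_bands, period):
    """Return sin/cos features at frequencies 2^k / period, k = 0..num_bands-1."""
    feats = []
    for k in range(num_bands):
        omega = 2.0 * math.pi * (2 ** k) / period
        feats.append(math.sin(omega * value))
        feats.append(math.cos(omega * value))
    return feats


def make_projection(in_dim, out_dim, rng):
    """Random stand-in for a learned (out_dim x in_dim) projection matrix."""
    std = 1.0 / math.sqrt(in_dim)
    return [[rng.gauss(0.0, std) for _ in range(in_dim)] for _ in range(out_dim)]


def project(weight, feats):
    """Matrix-vector product weight @ feats."""
    return [sum(w * f for w, f in zip(row, feats)) for row in weight]


rng = random.Random(0)
# One independent projection per attribute: no weights are shared.
PROJECTIONS = {
    name: make_projection(2 * cfg["num_bands"], EMBED_DIM, rng)
    for name, cfg in ATTRIBUTES.items()
}


def encode_condition(condition):
    """Map raw attribute values to one embedding token per attribute."""
    tokens = {}
    for name, cfg in ATTRIBUTES.items():
        feats = frequency_encode(condition[name], cfg["num_bands"], cfg["period"])
        tokens[name] = project(PROJECTIONS[name], feats)
    return tokens


tokens = encode_condition({"longitude": 112.94, "latitude": 28.23, "month": 7})
for name, vec in tokens.items():
    print(name, [round(v, 3) for v in vec])
```

In this reading, the periodic encoding makes month 1 and month 13 map to the same features, while the separate projections keep the tokens for location and time decoupled until they reach the attention layers.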

Attention mechanism	FID	MS-SSIM	CLIP Score
Standard cross-attention	7.8485	0.2133	25.3455
Decoupled attention	3.4572	0.2258	25.7705
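
The abstract describes the decoupled attention as an independent parallel branch that lets spatio-temporal features guide visual features without disturbing text guidance. A minimal sketch of that idea is given below, assuming the two branches share the visual queries and their outputs are summed with a hypothetical weight `st_scale`. Learned query/key/value projections are omitted for brevity, and none of the names or toy values come from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Scaled dot-product attention over lists of vectors."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    dim = len(values[0])
    outputs = []
    for q in queries:
        weights = softmax([scale * sum(a * b for a, b in zip(q, k)) for k in keys])
        outputs.append([sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)])
    return outputs


def decoupled_attention(visual, text_tokens, st_tokens, st_scale=1.0):
    """Sum of two parallel cross-attention branches sharing the visual queries."""
    text_out = attend(visual, text_tokens, text_tokens)  # text branch
    st_out = attend(visual, st_tokens, st_tokens)        # spatio-temporal branch
    return [
        [t + st_scale * s for t, s in zip(t_row, s_row)]
        for t_row, s_row in zip(text_out, st_out)
    ]


# Toy inputs (assumed): 2 visual tokens, 3 text tokens, 2 spatio-temporal tokens.
visual = [[1.0, 0.0], [0.0, 1.0]]
text_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
st_tokens = [[0.5, -0.5], [2.0, 1.0]]

fused = decoupled_attention(visual, text_tokens, st_tokens, st_scale=1.0)
print(fused)
```

Because the text branch never sees the spatio-temporal tokens, setting `st_scale` to zero recovers plain text-conditioned cross-attention. With a single standard cross-attention over concatenated tokens, by contrast, both sources would compete inside one softmax.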

Adaptation rank	FID	MS-SSIM	CLIP Score
128	8.1614	0.1861	25.5640
256	6.5672	0.2038	25.7460
512	4.5748	0.2248	25.7670
1024	3.4572	0.2258	25.7705
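
Low-rank adaptation (LoRA, Hu et al.) keeps a pre-trained weight matrix W frozen and learns an update (alpha / r) * B A, where r is the adaptation rank varied in the table above. The sketch below illustrates this with plain Python lists. The class name `LoRALinear`, the toy weights and the 4096-wide layer used for parameter counting are assumptions for illustration and say nothing about the actual base model.

```python
import random


def matmul(a, b):
    """Plain-Python matrix product of a (m x k) and b (k x n)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_params(d_in, d_out, rank):
    """Trainable parameters of one LoRA pair: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / rank) * B @ A."""

    def __init__(self, weight, rank, alpha, seed=0):
        rng = random.Random(seed)
        self.weight = weight  # frozen, shape (d_out x d_in)
        d_out, d_in = len(weight), len(weight[0])
        self.scale = alpha / rank
        # A starts random and B starts at zero, so the update is zero at initialization.
        self.A = [[rng.gauss(0.0, 0.02) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def effective_weight(self):
        delta = matmul(self.B, self.A)
        return [
            [w + self.scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(self.weight, delta)
        ]

    def forward(self, x):
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.effective_weight()]


layer = LoRALinear([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], rank=2, alpha=2.0)
print(layer.forward([1.0, 0.0, -1.0]))  # identical to the frozen layer at initialization

# Trainable parameters per adapted layer for the ranks in the table,
# using a hypothetical 4096 x 4096 weight matrix.
for rank in (128, 256, 512, 1024):
    print(rank, lora_params(4096, 4096, rank), "of", 4096 * 4096)
```

The zero-initialized B is what lets fine-tuning start exactly from the base model's behavior. Only A and B receive gradients, so the pre-trained weights themselves are never overwritten.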
