Acta Geodaetica et Cartographica Sinica ›› 2024, Vol. 53 ›› Issue (4): 589-598. DOI: 10.11947/j.AGCS.2024.20230594
• Real-time Remote Sensing Surveying and Mapping Column •
Weiying XIE, Zixuan WANG, Yunsong LI
Received:
2024-01-02
Revised:
2024-03-22
Published:
2024-05-13
Contact:
Zixuan WANG
E-mail: wyxie@xidian.edu.cn; zxwang1002@stu.xidian.edu.cn
About author:
XIE Weiying (1988—), female, PhD, professor, PhD supervisor; her research focuses on on-orbit processing and analysis of remote sensing images. E-mail: wyxie@xidian.edu.cn
Supported by:
Abstract:
With the growing number of on-orbit remote sensing satellites and advances in hyperspectral imaging technology, the volume of acquirable hyperspectral data has increased dramatically, ushering in an era of big-data applications and data-driven scientific discovery. However, such large-volume, wide-swath hyperspectral data make it difficult for deep learning algorithms to train and infer on a single node, posing a great challenge to real-time, efficient, and intelligent information interpretation. It is therefore necessary to integrate multi-satellite resources for distributed collaborative analysis, which also resolves the block artifacts introduced by tile-wise processing. Collaborative processing, however, inevitably involves information exchange and transmission; to further reduce the amount of transmitted information, gradients must be compressed so as to alleviate the communication bottleneck in distributed learning. This paper comprehensively discusses several mainstream communication-efficient gradient compression algorithms, with particular attention to their strengths and weaknesses in the communication-constrained on-orbit environment, and outlines future trends of gradient compression. Through extensive experimental comparisons, this paper thoroughly evaluates the performance of various gradient compression methods in hyperspectral image processing; the experiments demonstrate the applicability and performance differences of the different methods, providing a solid reference for selecting the most suitable gradient compression method in future practical applications.
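To make the communication-saving idea concrete, the sketch below is a minimal Python illustration, not the authors' algorithm or code; the names `topk_sparsify` and `ErrorFeedbackWorker` are hypothetical. It shows one widely used family of gradient compression methods, top-k sparsification with error feedback, in which each worker transmits only the largest-magnitude gradient entries and carries the dropped residual into the next iteration.

```python
import numpy as np


def topk_sparsify(grad, ratio=0.01):
    """Keep only the largest-magnitude entries of a gradient tensor.

    Returns the transmitted payload (values, flat indices) and the
    residual that stays on the worker for error feedback.
    """
    flat = grad.ravel()
    k = max(1, int(flat.size * ratio))
    idx = np.argpartition(np.abs(flat), -k)[-k:]  # indices of the k largest |g_i|
    values = flat[idx]
    residual = flat.copy()
    residual[idx] = 0.0  # the dropped part is remembered locally, not lost
    return values, idx, residual.reshape(grad.shape)


class ErrorFeedbackWorker:
    """One worker node: folds the previous compression error into the next gradient."""

    def __init__(self, shape, ratio=0.01):
        self.memory = np.zeros(shape)
        self.ratio = ratio

    def compress(self, grad):
        corrected = grad + self.memory  # error feedback: re-add what was dropped before
        values, idx, self.memory = topk_sparsify(corrected, self.ratio)
        return values, idx  # only about `ratio` of the entries are transmitted


# Toy usage: a single worker compressing one random "gradient".
rng = np.random.default_rng(0)
worker = ErrorFeedbackWorker(shape=(256, 256), ratio=0.01)
values, indices = worker.compress(rng.standard_normal((256, 256)))
print(f"transmitted {values.size} of {256 * 256} gradient entries "
      f"({values.size / (256 * 256):.2%} of the dense payload)")
```

Quantization-based schemes discussed in the paper (e.g., sign or ternary gradients) follow the same send-less pattern, replacing the top-k selection with a low-bit encoding of every entry.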
CLC number:
Weiying XIE, Zixuan WANG, Yunsong LI. Efficient-communication on-orbit distributed hyperspectral image processing[J]. Acta Geodaetica et Cartographica Sinica, 2024, 53(4): 589-598.