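School of Geography and Planning, Sun Yat-sen University, Guangzhou 510006, Guangdong, China
HUANG Weijun (黄伟钧, born 1998), male; research interests: remote sensing technology and applications, statistical simulation; E-mail: huangwj53@mail2.sysu.edu.cn
LI Wenkai (李文楷, born 1982), male; research interests: remote sensing technology and applications, statistical simulation; E-mail: liwenk3@mail.sysu.edu.cn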
Published in print: 2023-07-25
Published online: 2023-03-31
Received: 2022-09-02
Accepted: 2022-11-08
HUANG Weijun, LI Jiahao, LIU Ziyue, et al. Spatial susceptibility analysis of landslide based on PBLC algorithm[J]. Acta Scientiarum Naturalium Universitatis Sunyatseni, 2023, 62(04): 54-64. DOI: 10.13471/j.cnki.acta.snus.2022D065.
Statistical models of landslide spatial susceptibility require both positive samples (landslide locations) and negative samples (non-landslide locations), but historical records contain only positive samples. Negative samples drawn from areas without landslide records are easily contaminated by positives, because landslides may have occurred there without being observed or may occur in the future. This problem, known as case-control sampling with contaminated controls, degrades the predictive accuracy and robustness of statistical models. To address it, we apply the previously proposed semi-supervised learning algorithm PBLC (positive and background learning with constraints) to landslide susceptibility analysis and investigate its effectiveness in handling contaminated negative samples. Taking eastern Guangdong Province as the study area, we select 11 influencing factors as environmental variables: elevation, slope, aspect, profile curvature, shortest distance to roads, shortest distance to fault lines, shortest distance to rivers, mean annual precipitation, normalized difference vegetation index (NDVI), and geographic coordinates. The experiments show that a traditional artificial neural network underestimates the probability of landslide occurrence, and that the degree of underestimation depends on the number of negative samples. By contrast, the probabilities predicted by PBLC fall within a more reasonable range and are more stable, and their accuracy improves as the number of background samples increases. The susceptibility map predicted by PBLC shows that highly susceptible areas are concentrated in the northern and southwestern parts of eastern Guangdong Province, and that slope and elevation are the main factors affecting landslide susceptibility in the study area. We conclude that the semi-supervised learning algorithm PBLC effectively addresses the contamination of negative samples in statistical landslide modeling and improves predictive accuracy.
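The dependence of the underestimation on the number of negative samples can be illustrated with a small calculation. The Python sketch below is not the PBLC algorithm. It assumes only an idealized, perfectly calibrated classifier trained to separate recorded landslide points from background points drawn at random from the study area; under that assumption its expected output is r·p / (1 + r·p), where p is the true probability of a landslide at a location and r = n_pos / (prevalence × n_bg). The function names, sample sizes, and prevalence value are hypothetical and chosen purely for illustration.

```python
def naive_score(p, n_pos, n_bg, prevalence):
    """Expected output of an idealized presence-vs-background classifier.

    p          : true conditional probability P(landslide | x)
    n_pos      : number of recorded landslide (positive) samples
    n_bg       : number of background samples drawn at random from the area
    prevalence : overall fraction of the area that is truly positive (assumed known)
    """
    r = n_pos / (prevalence * n_bg)
    return r * p / (1.0 + r * p)


def recover_probability(score, n_pos, n_bg, prevalence):
    """Invert naive_score; only possible if the prevalence is known."""
    r = n_pos / (prevalence * n_bg)
    return score / (r * (1.0 - score))


# Hypothetical setting: 500 recorded landslides, 10% true prevalence,
# and a location whose true landslide probability is 0.8.
TRUE_P, N_POS, PREVALENCE = 0.8, 500, 0.1

for n_bg in (5_000, 10_000, 50_000):
    s = naive_score(TRUE_P, N_POS, n_bg, PREVALENCE)
    back = recover_probability(s, N_POS, n_bg, PREVALENCE)
    # The naive score shrinks as more background samples are added,
    # while the corrected value stays at the true probability.
    print(f"n_bg={n_bg:>6}  naive score={s:.3f}  corrected={back:.3f}")
```

In this idealized setting the naive score moves further below the true probability as background samples are added, and the correction requires the prevalence, which landslide inventories typically do not provide.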
landslide susceptibility; positive and background learning with constraints; artificial neural network; unlabeled data; Eastern Guangdong Province