Incorporating Spatial Autocorrelation in Machine Learning Models Using Spatial Lag and Eigenvector Spatial Filtering Features

Xiaojian Liu, Ourania Kounadi, Raul Zurita-Milla

Veröffentlichungen: Beitrag in FachzeitschriftArtikelPeer Reviewed


Applications of machine-learning-based approaches in the geosciences have witnessed a substantial increase over the past few years. Here we present an approach that accounts for spatial autocorrelation by introducing spatial features to the models. In particular, we explore two types of spatial features, namely spatial lag and eigenvector spatial filtering (ESF). These features are used
within the widely used random forest (RF) method, and their effect is illustrated on two public datasets of varying sizes (the Meuse and California housing datasets). The least absolute shrinkage and selection operator (LASSO) is used to determine the best subset of spatial features, and nested cross-validation is used for hyper-parameter tuning and performance evaluation. We utilize Moran’s I and local indicators of spatial association (LISA) to assess how spatial autocorrelation is captured at both global and local scales. Our results show that RF models combined with either spatial lag or ESF features yield lower errors (up to 33% different) and reduce the global spatial autocorrelation of the residuals (up to 95% decrease in Moran’s I) compared to the RF model with no spatial features. The local autocorrelation patterns of the residuals are weakened as well. Compared to benchmark geographically weighted regression (GWR) models, the RF models with spatial features yielded more accurate models with similar levels of global and local autocorrelation in the prediction residuals. This study reveals the effectiveness of spatial features in capturing spatial autocorrelation and provides a generic machine-learning modeling workflow for spatial prediction.
FachzeitschriftISPRS International Journal of Geo-Information
PublikationsstatusVeröffentlicht - Apr. 2022

ÖFOS 2012

  • 507003 Geoinformatik
  • 507001 Angewandte Geographie
  • 102019 Machine Learning