02 Nov 2022
Status: this preprint is currently under review for the journal ESurf.

Short Communication: Evaluating the accuracy of binary classifiers for geomorphic applications

Matthew William Rossi

Abstract. Airborne lidar has revolutionized our ability to map fine-scale (~1-m) topographic features at watershed to landscape scales. As our ‘vision’ of the land surface has improved, so has our need for more robust quantification of the accuracy of the geomorphic maps we derive from these data. One broad class of mapping challenges is binary classification, where remote sensing data are used to identify the presence or absence of a given feature. Fortunately, there is a large suite of metrics developed in the data sciences that are well suited to quantifying the pixel-level accuracy of binary classifiers. In this paper, I focus on the challenge of identifying bedrock from lidar topography, though the insights gleaned from this analysis apply to any task where there is a need to quantify how the number and extent of landforms are expected to vary as a function of environmental forcing. Using a suite of synthetic maps, I show how the most widely used pixel-level accuracy metric, the F1-score, is particularly poorly suited to quantifying accuracy for this kind of application. Well-known biases arising from imbalanced data are exacerbated by methodological strategies that attempt to calibrate and validate classifiers across a range of geomorphic settings where feature abundances vary. The Matthews Correlation Coefficient largely removes this bias, such that the sensitivity of accuracy scores to geomorphic setting instead embeds information about the error structure of the classification. To this end, I examine how the scale of features (e.g., the typical sizes of bedrock outcrops) and the type of error (e.g., random versus systematic) manifest in pixel-level scores. The normalized version of the Matthews Correlation Coefficient is relatively insensitive to feature scale if error is random and if large enough areas are mapped. In contrast, a strong sensitivity to feature size and shape emerges when classifier error is systematic.
My findings highlight the importance of choosing appropriate pixel-level metrics when evaluating topographic surfaces where feature abundances vary strongly. It is necessary to understand how pixel-level metrics are expected to perform as a function of scene-level properties before interpreting empirical observations.
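As a rough illustration of the two metrics named in the abstract, the following Python sketch computes the F1-score and the Matthews Correlation Coefficient (MCC) from the expected confusion matrix of a classifier whose per-class hit rates are held fixed while feature abundance changes. It does not reproduce the paper's synthetic maps or code. The function names, the fixed rates of 0.9, and the scene size are all hypothetical choices made here, and the normalization (MCC + 1) / 2 is assumed to be the common convention rather than taken from the paper.

```python
import math


def confusion_from_rates(prevalence, tpr, tnr, n_pixels=1_000_000):
    """Expected confusion-matrix counts (tp, fp, fn, tn) for one scene.

    prevalence: fraction of pixels that truly contain the feature.
    tpr, tnr:   true positive and true negative rates of the classifier.
    All names and defaults are illustrative assumptions.
    """
    positives = prevalence * n_pixels
    negatives = n_pixels - positives
    tp = tpr * positives
    fn = positives - tp
    tn = tnr * negatives
    fp = negatives - tn
    return tp, fp, fn, tn


def f1_score(tp, fp, fn, tn):
    """F1 = 2TP / (2TP + FP + FN); ignores true negatives entirely."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, fp, fn, tn):
    """Matthews Correlation Coefficient, in the range [-1, 1]."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def normalized_mcc(tp, fp, fn, tn):
    """MCC rescaled to [0, 1], assuming the (MCC + 1) / 2 convention."""
    return (mcc(tp, fp, fn, tn) + 1) / 2


if __name__ == "__main__":
    # Same classifier quality, three different feature abundances.
    for prevalence in (0.1, 0.5, 0.9):
        counts = confusion_from_rates(prevalence, tpr=0.9, tnr=0.9)
        print(
            f"abundance={prevalence:.1f}  "
            f"F1={f1_score(*counts):.3f}  "
            f"MCC={mcc(*counts):.3f}  "
            f"nMCC={normalized_mcc(*counts):.3f}"
        )
```

Under these assumed rates, F1 rises from roughly 0.64 at 10% abundance to roughly 0.94 at 90% abundance even though the classifier itself has not changed, because F1 never counts true negatives. MCC still varies with abundance (roughly 0.62 at both extremes versus 0.80 at 50%), but it is symmetric with respect to which class is rare.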

Status: final response (author comments only)

Comment types: AC – author | RC – referee | CC – community | EC – editor | CEC – chief editor
  • RC1: 'Comment on esurf-2022-51', Stuart Grieve, 05 Jan 2023
  • RC2: 'Comment on esurf-2022-51', Anonymous Referee #2, 21 Feb 2023
  • AC1: 'Comment on esurf-2022-51', Matthew Rossi, 21 Feb 2023

Data sets

Rossi et al. (2020) Bedrock Maps. Rossi, Matthew W.

Model code and software

Synthetic Bedrock Mapping Code. Rossi, Matthew W.

Latest update: 25 Mar 2023
Short summary
High resolution topographic maps have revolutionized our ability to map small-scale features on Earth's surface. Using accuracy metrics from the data sciences, I show that caution is warranted in interpreting pixel-level accuracy scores without also considering scene-level properties. Choosing the right metric and understanding its larger-scale context is needed to properly interpret how good our maps actually are.