Research “Hylight”: Evaluating model performance: towards a non-parametric variant of the Kling-Gupta efficiency by Pool, Vis & Seibert (2018)

Last year, Sandra Pool and colleagues published a technical paper in Hydrological Sciences Journal proposing a modification of the Kling-Gupta efficiency towards a non-parametric metric. We thought it was an interesting choice of topic and went to ask her a few questions about the paper.

Where are you from, where are you based, and what are you working on now?
I’m from Switzerland and work as a post-doc at EAWAG, the Swiss Federal Institute of Aquatic Science and Technology. I completed my PhD studies at the University of Zurich, in the Hydrology and Climate group with Jan Seibert as my main supervisor. The main focus during my PhD was on the value of data for hydrological modelling, which includes model evaluation criteria. I’m currently researching the effect of irrigation modernization on groundwater recharge: plot-scale studies have shown that drip irrigation is more water-efficient than flood irrigation, but at the catchment scale this effect is less clear. I’m trying to understand why we get different results at different spatial scales.

Your post-doc work seems very different from the things you did during your PhD.
Maybe, but I’d say that it is still modelling different components of the water balance, only now I’m paying more attention to groundwater instead of streamflow. I do use a different type of model though, because in this work vegetation and irrigation are essential and these were less important in my earlier work. This topic also gives me the opportunity to gain some insights into socio-hydrology, because irrigation efficiency and human behaviour are strongly linked to each other.

In a nutshell, how did you modify the Kling-Gupta efficiency?
The Kling-Gupta efficiency (RKGE) is a multi-objective function based on the decomposition of the mean squared error into three error terms, namely the error in the mean flow, the flow variability, and the flow dynamics (see Gupta et al. (2009) and Murphy (1988) for details on the decomposition). The three error terms are represented by the mean, the standard deviation, and the Pearson correlation. In our study, we propose a modification towards a non-parametric calibration criterion (RNP) by using the Spearman rank correlation to evaluate the error in streamflow dynamics, and the scaled flow duration curve to evaluate streamflow variability.
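To make the decomposition concrete, here is a minimal sketch of the standard Kling-Gupta efficiency following Gupta et al. (2009): RKGE = 1 - sqrt((r - 1)² + (α - 1)² + (β - 1)²), where r is the Pearson correlation (flow dynamics), α the ratio of standard deviations (flow variability), and β the ratio of means (mean flow). The function and variable names below are ours, not from the paper:

```python
from math import sqrt
from statistics import mean, pstdev

def pearson(x, y):
    # Pearson (linear) correlation coefficient
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (pstdev(x) * pstdev(y))

def kge(sim, obs):
    """Kling-Gupta efficiency (Gupta et al., 2009):
    KGE = 1 - sqrt((r - 1)^2 + (alpha - 1)^2 + (beta - 1)^2)."""
    r = pearson(sim, obs)                # flow dynamics
    alpha = pstdev(sim) / pstdev(obs)    # flow variability ratio
    beta = mean(sim) / mean(obs)         # mean flow (bias) ratio
    return 1 - sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

obs = [1.0, 2.0, 3.0, 4.0, 5.0]
print(kge(obs, obs))  # close to 1 for a perfect simulation
```

A simulation with the right dynamics and variability but a constant offset is penalized only through the β term, which is one reason the decomposition is informative.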

What is the take home message of your paper?
In a general sense, efficiency metrics are at the heart of every modelling study. Every modeller needs to select one or more efficiency metrics to quantify how well their model is doing. Our paper shows why it is important to think deeply about the choice of efficiency metrics. In a more specific sense, the paper shows that using non-parametric efficiency metrics is a valid alternative to ‘traditional’ metrics. We might assume that with non-parametric metrics we lose information about our modelled time series, but this seems not to be the case. In fact, it seems that by using non-parametric efficiency metrics we can get more hydrologically meaningful simulations as a result. Additionally, our proposed modification (RNP) is less sensitive to high flows and outliers due to the non-parametric assumptions. This avoids the need for data transformations, such as log-transformations, which can have unexpected negative effects on evaluation metrics (Santos et al., 2018).

To clarify, what does non-parametric mean and why is this a desirable property in objective functions?
This has actually been the subject of some extended discussions between us. A colleague who is a statistician provided us with some valuable support. Apparently, within the statistical community no formal definition of non-parametric exists. Informally, we use it to mean that the metric does not rely on certain assumptions about the distribution of our variable of interest. RKGE is parametric, because it assumes that the variable of interest (i.e. streamflow) and its errors are normally distributed. However, in most cases streamflow and its simulation/measurement errors are not normally distributed. The assumption of normally distributed data (and also data linearity and absence of outliers) underlies many objective functions but is not often explicitly mentioned or addressed in hydrological studies. Our modification of the Kling-Gupta efficiency addresses this problem in part.

What motivated you to delve so deeply into the topic of objective functions?
It was the result of a practical problem really. We wanted to calibrate a model for several hundreds of catchments and that takes time. We know that the choice of efficiency metric will influence the results (especially when comparing these values across different hydro-climates) and we had to make a proper choice. The general tendency within the literature is that multi-objective criteria are a good idea, so RKGE seemed a good choice. However, we weren’t fully happy with it because it makes these strong assumptions about the properties of the distribution of streamflow values. Besides, a separate branch of the efficiency criteria literature nowadays focuses on metrics that move away from purely statistical measures and instead focus on the properties of the hydrograph that we’re interested in.

Our motivation for the adapted efficiency metric is a consequence of these two aspects. We replaced the Pearson (linear) correlation component of RKGE with the Spearman (rank) correlation. The goal of this correlation part of RKGE is to tell us how well the simulations mimic the dynamics of the observations. Both correlation metrics do this, but the Pearson correlation also measures the linearity of the relationship between the two series. The Spearman correlation is less affected by outliers and thus less biased towards higher errors.

Next, the standard deviation component in RKGE is supposed to tell us something about the variability of flows. We replaced this with a normalized flow duration curve, because this contains information about the entire distribution of flows while the standard deviation captures only a small part of that distribution.

Finally, the third part of the RKGE is the mean component. The non-parametric alternative to the mean is the median, but we decided to not change this part of the RKGE. The mean flow tells us something about the annual water balance magnitude, but the median doesn’t contain this information. Our goal was to use a hydrologically meaningful metric with more appropriate statistical assumptions, and with the mean component we had to make a compromise. We decided that keeping the information contained within the mean is more important than using a non-parametric alternative.
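Putting the three parts together, a minimal sketch of the non-parametric variant looks like this. It follows our reading of the paper's formulation (the exact normalization should be checked against Pool et al., 2018): the variability term compares the two flow duration curves, each sorted and scaled by n times the mean flow, while the mean flow ratio is kept as in RKGE. All function names are ours:

```python
from math import sqrt
from statistics import mean, pstdev

def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (pstdev(x) * pstdev(y))

def ranks(x):
    # simple ranking, assuming no ties
    order = sorted(range(len(x)), key=lambda i: x[i])
    r = [0.0] * len(x)
    for rank, i in enumerate(order, start=1):
        r[i] = float(rank)
    return r

def spearman(x, y):
    return pearson(ranks(x), ranks(y))

def alpha_np(sim, obs):
    """Non-parametric variability term: overlap of the normalized
    flow duration curves (flows sorted, scaled by n * mean flow)."""
    n = len(sim)
    fdc_sim = sorted(q / (n * mean(sim)) for q in sim)
    fdc_obs = sorted(q / (n * mean(obs)) for q in obs)
    return 1.0 - 0.5 * sum(abs(s - o) for s, o in zip(fdc_sim, fdc_obs))

def kge_np(sim, obs):
    r = spearman(sim, obs)      # flow dynamics (rank-based)
    a = alpha_np(sim, obs)      # flow variability (FDC-based)
    b = mean(sim) / mean(obs)   # mean flow ratio (kept parametric)
    return 1.0 - sqrt((r - 1.0) ** 2 + (a - 1.0) ** 2 + (b - 1.0) ** 2)

obs = [0.8, 1.2, 3.5, 2.0, 0.9]
print(kge_np(obs, obs))  # close to 1 for a perfect simulation
```

Note that because both flow duration curves are scaled by their own mean, a simulation that is biased by a constant factor still gets a perfect variability term; the bias is then picked up only by the mean flow ratio, consistent with keeping the water-balance information in that component.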

You’ve tested 11 different objective functions across 100 catchments. What are your recommendations for choosing an objective function for a given purpose?
That would probably require a full opinion paper, because I’d say that this is also very much a personal choice. It should primarily depend on your goal: are you focusing on low flows, high flows, general hydrograph behaviour? Your efficiency metric should make sure that your model behaves properly for this purpose. As a general recommendation, I’d say that it’s important to evaluate multiple metrics and to do that independently (i.e. calibrate on say RKGE or RNP but evaluate on the three components or any other independent metrics).

In your opinion, how can your objective function be further improved?
There is always more work to be done. Research questions change over time (for example, we used to want to reproduce hydrographs, now we need simulations in ungauged basins and want to do change impact assessments) and our metrics should change to reflect that. Our goal is always to apply models to periods/places where we don’t have data so metrics should be robust under changing conditions. A good way forward would be to include different types of data. Traditionally we focus on streamflow but (depending on data availability) much more can be done with groundwater/soil moisture. This would let us more meaningfully constrain model behaviour and increase our confidence in how well our models represent the catchments they are simulating (e.g. Kirchner, 2006).

[End] – Interviewed by Wouter Knoben

This interview is part of the YHS Research “Hylights” series to showcase interesting and outstanding work by early career scientists. Selection criteria are not set in stone, but reasons to select work can include e.g. novelty and relevance of findings, fun of reading, unique collaborations, media coverage and generated controversy. Tips and comments can be sent to younghydrologicsociety(at)gmail(dot)com.


Gupta, H. V., Kling, H., Yilmaz, K. K., and Martinez, G. F. (2009). Decomposition of the mean squared error and NSE performance criteria: implications for improving hydrological modelling. Journal of Hydrology, 377(1–2), 80–91. doi:10.1016/j.jhydrol.2009.08.003

Kirchner, J. W. (2006), Getting the right answers for the right reasons: Linking measurements, analyses, and models to advance the science of hydrology, Water Resour. Res., 42, W03S04, doi:10.1029/2005WR004362

Murphy, A. H. (1988). Skill scores based on the mean square error and their relationships to the correlation coefficient. Monthly Weather Review, 116. doi:10.1175/1520-0493(1988)116<2417:SSBOTM>2.0.CO;2

Pool, S., Vis, M., and Seibert, J. (2018). Evaluating model performance: towards a non-parametric variant of the Kling-Gupta efficiency. Hydrological Sciences Journal, 63(13-14), 1941-1953.

Santos, L., Thirel, G., and Perrin, C. (2018). Technical note: Pitfalls in using log-transformed flows within the KGE criterion. Hydrology and Earth System Sciences, 22. doi:10.5194/hess-22-4583-2018
