BAYESIAN REGULARIZED NEURAL NETWORKS APPROACH AND UNCERTAINTY ANALYSIS FOR REFERENCE EVAPOTRANSPIRATION MODELING ON SEMIARID AGROECOSYSTEMS

The Penman–Monteith (PM) equation is recommended by the Food and Agriculture Organization (FAO) as the standard method for calculating reference evapotranspiration (ET0). However, the detailed climatological data required by the PM equation are often unavailable. The present study aimed to develop Bayesian regularized neural network (BRNN)-based ET0 models and to compare their results with the PM approach. Fourteen weather stations, located in the Juazeiro (BA) and Petrolina (PE) counties, Brazil, were selected for this study. BRNNs were trained with different parameter choices and obtained R² between 0.96 and 0.99 during training and between 0.95 and 0.98 with the validation dataset. A root mean squared error (RMSE) below 0.10 mm day⁻¹ relative to PM indicated good performance of the network using only air temperature, solar radiation and wind speed at the daily average scale as input variables. Epistemic and aleatoric (random) uncertainties were evaluated, and precipitation was identified as the variable with the greatest uncertainty and was therefore discarded from the modeling.


INTRODUCTION
According to KUMAR et al. (2002), evapotranspiration is a complex and nonlinear phenomenon because it depends on the interaction of several climatic elements, such as solar radiation, wind speed, air humidity and temperature, as well as on the type and growth stage of the crop. According to PEREIRA et al. (2002), the selection of a method for estimating evapotranspiration depends on several factors.
One of these factors is the availability of meteorological data, since complex methods that require many variables are applicable only when all the necessary data are available. When data are available, ALLEN et al. (1998) recommend the Penman-Monteith (PM) method as the sole standard for defining and computing the reference evapotranspiration (ETo). However, the meteorological variables required by the PM method are not always available, in particular those needed for the aerodynamic term: wind speed and the vapor pressure deficit of the air. Methods that estimate ETo from climatic elements that are easier to obtain, such as air temperature and extraterrestrial radiation, are therefore very important. One tool that can be used to estimate ETo is the artificial neural network (ANN).
According to HAYKIN (1999), an artificial neural network (ANN) is a popular statistical method that can explore the relationships between variables with high accuracy. Essentially, the structure of an ANN is computer-based and consists of several simple processing elements operating in parallel. An ANN typically consists of three layers (input, hidden and output), hence it is referred to as a three-layer network. The input layer contains the independent variables, which are connected to the hidden layer for processing. The hidden layer contains the activation functions and applies the connection weights to the variables in order to capture the effects of the predictors on the target (dependent) variables. In the output layer, the prediction or classification process is completed and the results are presented with a small estimation error.
In ANNs, regularization techniques are used together with the backpropagation training algorithm to obtain a small error. Regularization makes the network response smoother and less likely to overfit the training patterns (HAYKIN, 1999). The plain backpropagation algorithm, however, is slow to converge and may lead to overfitting. Faster-converging backpropagation algorithms have been developed to overcome the convergence issue, and regularization methods have been developed to address overfitting. Among these approaches, Levenberg-Marquardt (LM) and Bayesian Regularization (BR) are able to obtain lower mean squared errors than other algorithms for function approximation problems (HAGAN, MENHAJ, 1994). LM was developed specifically for faster convergence in backpropagation algorithms. BR has an objective function that includes both the residual sum of squares and the sum of squared weights, in order to minimize estimation errors and to achieve a model that generalizes well.
Evapotranspiration modeling using ANNs has received much attention in recent years. To estimate reference evapotranspiration in the state of Rio de Janeiro, ZANETTI et al. (2008) used a neural network that considered geographic coordinates and air temperature. ALVES SOBRINHO et al. (2011) developed an ANN to estimate ETo from daily air temperature data for the region of Mato Grosso do Sul, and the neural network obtained the best fit when compared with the conventional methods.
For example, ABEDI-KOUPAI et al. (2009) used two hidden layers with five neurons, each one with four input values, one output layer and a log-sigmoid function, and obtained a coefficient of determination of 0.95 for reference evapotranspiration in a protected environment.
In this paper, we applied Bayesian regularized neural networks (BRNN) to simulate PM-based reference evapotranspiration with fewer variables than the original PM formulation in a semiarid area of Brazil, and evaluated the epistemic and aleatoric uncertainties between predicted and original values.

MATERIAL AND METHODS
Figure 1 shows the location of the reference semiarid area (dashed red square on the right side) inside Petrolina County, Pernambuco state, Northeast Brazil, together with the network of forty agrometeorological stations (blue arrows) used for the weather data interpolation processes in a geographic information system (GIS) environment. The agrometeorological stations monitored solar radiation (RG), average air temperature (T_med), relative humidity (RH_med), wind speed (W) and reference evapotranspiration (ET0).
The reference evapotranspiration is the evapotranspiration of a hypothetical crop that completely covers the soil, is in active growth, has no water or nutritional restriction, and has specific characteristics such as an albedo of 0.23 and a height between 8 and 15 cm. Among the various methods of ETo estimation, the Penman-Monteith method, presented by the FAO, is recommended as the standard, according to Equation 1.

\[ ET_0 = \frac{0.408\,\Delta\,(R_n - G) + \gamma\,\dfrac{900}{T + 273}\,u_2\,(e_s - e_a)}{\Delta + \gamma\,(1 + 0.34\,u_2)} \qquad (1) \]

where ET_0 is the reference evapotranspiration (mm day⁻¹), Δ (kPa °C⁻¹) is the slope of the saturated vapor pressure curve, γ is the psychrometric constant (kPa °C⁻¹), T is the daily average air temperature (°C), u_2 is the wind speed at 2 m height (m s⁻¹), e_a is the actual water vapor pressure of the air (kPa), e_s is the saturated water vapor pressure (kPa), (e_s − e_a) (kPa) is the vapor pressure deficit in the air near the vegetated surfaces, R_n is the net radiation (MJ m⁻² day⁻¹) and G is the soil heat flux (MJ m⁻² day⁻¹).
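To make the role of each term concrete, the FAO Penman-Monteith calculation can be sketched in a few lines of Python (used here for illustration only; the study itself relied on R). This is a minimal sketch under stated assumptions: the vapor pressure and psychrometric formulas follow the usual FAO-56 expressions, the saturation vapor pressure is taken at the mean temperature rather than averaged over the daily maximum and minimum, the actual vapor pressure is derived from mean relative humidity, and the default altitude of 370 m is a hypothetical value, not one reported in this paper.

```python
import math


def saturation_vapor_pressure(t_celsius):
    """Saturation vapor pressure e_s (kPa) at air temperature T (deg C)."""
    return 0.6108 * math.exp(17.27 * t_celsius / (t_celsius + 237.3))


def penman_monteith_et0(t_mean, rh_mean, u2, rn, g=0.0, altitude_m=370.0):
    """Daily reference evapotranspiration (mm/day), FAO Penman-Monteith form.

    t_mean: daily mean air temperature (deg C)
    rh_mean: daily mean relative humidity (%)
    u2: wind speed at 2 m height (m/s)
    rn, g: net radiation and soil heat flux (MJ m-2 day-1)
    altitude_m: station altitude (m); the default is a hypothetical value
    """
    es = saturation_vapor_pressure(t_mean)        # simplification: e_s at T_mean
    ea = es * rh_mean / 100.0                     # assumption: e_a from mean RH
    delta = 4098.0 * es / (t_mean + 237.3) ** 2   # slope of the e_s curve
    pressure = 101.3 * ((293.0 - 0.0065 * altitude_m) / 293.0) ** 5.26
    gamma = 0.000665 * pressure                   # psychrometric constant
    radiation_term = 0.408 * delta * (rn - g)
    aerodynamic_term = gamma * 900.0 / (t_mean + 273.0) * u2 * (es - ea)
    return (radiation_term + aerodynamic_term) / (delta + gamma * (1.0 + 0.34 * u2))


if __name__ == "__main__":
    et0 = penman_monteith_et0(t_mean=25.0, rh_mean=60.0, u2=2.0, rn=15.0)
    print(f"ET0 = {et0:.2f} mm/day")
```

The aerodynamic term is the part that needs wind speed and humidity, which is why data gaps in those variables motivate reduced-input models.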
A neural network is formed by simple elements operating in parallel. Inspired by biological neural networks, it receives the independent variables at its input neurons. The variables are passed to subsequent layers of neurons, where each neuron computes the weighted sum of its input values and passes it through a transfer function to produce its output (WANG et al., 2017). Bayesian regularized neural networks (BRNN) are more robust than networks trained with standard error backpropagation, and they also avoid over-fitting of the model (TICKNOR, 2013). Regularization refers to limiting the scale of the weights and thresholds to improve the generalization ability of the neural network. In other words, a penalty term is added to the neural network error function (MSE), so that the network can still approximate the complex function while its generalization improves, as in Equation 2.

\[ F = \beta E_D + \alpha E_W \qquad (2) \]

where the square of the network weights is described by Equation 3.

\[ E_W = \frac{1}{n} \sum_{i=1}^{n} W_i^2 \qquad (3) \]

F is the regularized objective function; W_i is the weight of the neural network connection; n is the total number of samples; E_D is the sum of the squared residuals between the network output and the target values; and α and β are the regularization parameters that determine the training target of the neural network and control the degree of fit achieved.
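The trade-off controlled by α and β can be illustrated numerically. The sketch below assumes the common form of the regularized objective, F = βE_D + αE_W, with E_D the sum of squared residuals and E_W the sum of squared weights divided by n; the function name and argument names are hypothetical, and other references drop the 1/n factor.

```python
def regularized_objective(residuals, weights, alpha, beta, n_samples):
    """Return F = beta * E_D + alpha * E_W (assumed form of the objective).

    residuals: network output minus target, one value per sample
    weights: all connection weights of the network, flattened
    n_samples: the normalizing count n used in E_W (assumption)
    """
    e_d = sum(r * r for r in residuals)             # data misfit term
    e_w = sum(w * w for w in weights) / n_samples   # weight penalty term
    return beta * e_d + alpha * e_w


if __name__ == "__main__":
    residuals = [1.0, -1.0]
    weights = [1.0, 2.0]
    # A larger alpha/beta ratio penalizes large weights more strongly.
    for alpha in (0.0, 0.1, 1.0):
        print(alpha, regularized_objective(residuals, weights, alpha, 1.0, 2))
```

With α = 0 the objective reduces to the plain sum of squared errors; increasing α relative to β favors smaller weights and therefore a smoother network response.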
Bayesian regularization takes the objective function of the traditional neural network model as a likelihood function. The regularizer corresponds to the prior probability distribution over the network weights, which are regarded as random variables. A Bayesian regularized neural network is a feedforward neural network trained with Bayesian regularization. Using an assumed probability distribution for the parameters, the network learns over the whole weight space and evaluates the relevant parameters.
The regularization parameters are then adjusted adaptively by Bayesian inference based on the posterior distribution. The optimal weights are determined from the probability density of the weights and, while keeping the squared network error as small as possible, the weights are minimized to control network complexity effectively and to improve the generalization ability of the network. Bayesian regularization thus optimizes the fit of the neural network to the training samples and minimizes model complexity by modifying the training performance function of the network.
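The correspondence between the regularized objective and Bayesian inference can be written out explicitly. The following is a sketch under the common assumptions of Gaussian noise on the targets and a Gaussian prior on the weights, with the objective written as F = βE_D + αE_W; the weight vector w, the data set D and the normalizing constants Z_D, Z_W and Z_F are notation introduced here for illustration.

```latex
\begin{align}
P(\mathbf{w} \mid D, \alpha, \beta)
  &= \frac{P(D \mid \mathbf{w}, \beta)\, P(\mathbf{w} \mid \alpha)}
          {P(D \mid \alpha, \beta)}, \\
P(D \mid \mathbf{w}, \beta)
  &= \frac{1}{Z_D(\beta)} \exp(-\beta E_D), \qquad
P(\mathbf{w} \mid \alpha)
   = \frac{1}{Z_W(\alpha)} \exp(-\alpha E_W), \\
P(\mathbf{w} \mid D, \alpha, \beta)
  &= \frac{1}{Z_F(\alpha, \beta)} \exp\bigl(-(\beta E_D + \alpha E_W)\bigr)
   = \frac{1}{Z_F(\alpha, \beta)} \exp(-F).
\end{align}
```

Under these assumptions, the most probable weights are those that minimize F, and the regularization parameters α and β can in turn be chosen by maximizing the evidence P(D | α, β).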
The performance was also evaluated in terms of uncertainty. Two types of uncertainty were estimated. Aleatoric uncertainty arises from random processes and refers to the inherent uncertainty due to probabilistic variability. Epistemic uncertainty arises from lack of knowledge. High epistemic uncertainty can be caused, for example, by simple models that try to fit complex functions with little or missing data. Theoretically, if the model were perfect, epistemic uncertainty would not exist (KENDALL, GAL, 2017). GAL and GHAHRAMANI (2016) showed that an ANN trained with dropout can be interpreted as an approximation to a Gaussian process; for this reason, uncertainty estimates can be obtained by training a network with dropout and then keeping dropout active at test time as well. At test time, dropout provides Monte Carlo samples that are used to approximate the true posterior distribution.
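The mechanics of Monte Carlo dropout at test time can be sketched with a very small network. The example below is written in plain Python for illustration and is not the "keras" workflow used in this study: the weights are random placeholders rather than trained values, and the function names, the dropout rate of 0.2 and the 200 stochastic passes are all assumptions. The spread of the stochastic predictions is what is commonly read as an approximation of epistemic uncertainty.

```python
import math
import random
import statistics


def make_network(n_inputs, n_hidden, rng):
    """Random placeholder weights for a one-hidden-layer network (not trained)."""
    w1 = [[rng.gauss(0.0, 1.0) for _ in range(n_inputs)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rng.gauss(0.0, 1.0) for _ in range(n_hidden)]
    return w1, b1, w2


def forward_with_dropout(x, network, drop_rate, rng):
    """One stochastic forward pass with dropout applied to the hidden units."""
    w1, b1, w2 = network
    keep = 1.0 - drop_rate
    out = 0.0
    for weights, bias, w_out in zip(w1, b1, w2):
        h = math.tanh(sum(w * xi for w, xi in zip(weights, x)) + bias)
        # Inverted dropout: keep the unit with probability `keep`, rescale it.
        if rng.random() < keep:
            out += w_out * h / keep
    return out


def mc_dropout_predict(x, network, drop_rate=0.2, n_samples=200, seed=0):
    """Mean and standard deviation over repeated dropout passes."""
    rng = random.Random(seed)
    draws = [forward_with_dropout(x, network, drop_rate, rng) for _ in range(n_samples)]
    return statistics.mean(draws), statistics.stdev(draws)


if __name__ == "__main__":
    net = make_network(n_inputs=3, n_hidden=16, rng=random.Random(1))
    mean, spread = mc_dropout_predict([0.5, -0.2, 1.0], net)
    print(f"prediction = {mean:.3f}, spread = {spread:.3f}")
```

Setting the dropout rate to zero makes every pass identical, so the spread collapses to zero; the uncertainty estimate comes entirely from keeping dropout active at prediction time.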
The R packages "keras" and "tensorflow" were used for the uncertainty analysis, and the R package "PerformanceAnalytics" was used to produce the correlation plot. Finally, the R package "brnn" was used for the Bayesian regularized neural network modeling. This package does not display the iteration-by-iteration progress of the fit. The computer used in this research had an Intel® Core i5 processor and 8 GB of memory.
Figure 2 shows the correlation plot of solar radiation (RG), average air temperature (T_med), relative humidity (RH_med), wind speed (W) and reference evapotranspiration (ET0) values from 14 agrometeorological stations in Petrolina County, Pernambuco state, Northeast Brazil. The distribution of each variable is shown on the diagonal. Below the diagonal, the bivariate scatter plots are displayed with a fitted line. Above the diagonal, the correlation value is shown together with its significance level as stars. Each significance level is associated with a symbol as follows: p-values between 0 and 0.001 are denoted by ***, between 0.001 and 0.01 by **, between 0.01 and 0.05 by *, between 0.05 and 0.1 by ., and between 0.1 and 1 by no symbol. No variable showed a correlation higher than 0.70 with ET0, indicating that using only one variable as input gives poor estimates. A few reference evapotranspiration methods, such as MAKKINK (1957), TURC (1961), PRIESTLEY (1972), FAO-24 (DOORENBOS; PRUITT, 1984), HARGREAVES and SAMANI (1985), and BLANEY (1950), are based on multivariate regression equations that use these variables as input data.
The training data (as well as the validation data) were generated from a standard normal distribution; therefore, the model saw many more examples close to the mean than beyond two or even three standard deviations. That is why these peripheral regions have greater uncertainty. While epistemic uncertainty can expose the shortcomings of the model, aleatoric uncertainty is irreducible.
Precipitation, for example, shows higher epistemic uncertainty at its highest value, since that value was reached only once. This indicates that the model fails for extreme positive values and would have greater uncertainty for large precipitation amounts. Therefore, this variable was not considered in the simulation scenarios with BRNN. The aleatoric uncertainty is stable for most variables, except for precipitation, where it is greater for non-zero values.
To fit ANN models such as the BRNN, the meteorological data were divided into a training set (here, 60% of the data) and a test set (40%). The training set is used to fit the model weights (for a number of different network configurations and training cycles), and the test set is used to evaluate the model against unseen data. The neural networks were trained with 2 to 15 neurons and, after each training run, RMSE and R² were calculated using only the test dataset to find the optimal number of hidden nodes.
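The model selection procedure described above (60/40 split, then a search over 2 to 15 hidden neurons scored on the test set) can be sketched as follows. This is an illustrative Python outline, not the R workflow of the study: the `fit` callback stands in for whatever routine trains the network and is a hypothetical interface, the random seed is arbitrary, and R² is computed here as 1 − SS_res/SS_tot, which is one of several definitions in use.

```python
import math
import random


def train_test_split(rows, train_fraction=0.6, seed=42):
    """Shuffle the rows and split them into training and test sets."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def rmse(observed, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def select_hidden_nodes(fit, train, test, candidates=range(2, 16)):
    """Score each candidate size on the test set; return (rmse, r2, n_hidden).

    fit(train, n_hidden) is a hypothetical callback that trains a model and
    returns a predict(features) function. Rows are (features, target) pairs.
    """
    observed = [target for _, target in test]
    scores = []
    for n_hidden in candidates:
        predict = fit(train, n_hidden)
        predicted = [predict(features) for features, _ in test]
        scores.append((rmse(observed, predicted), r_squared(observed, predicted), n_hidden))
    return min(scores)  # the candidate with the lowest test RMSE
```

Scoring only on the held-out 40% keeps the choice of network size from rewarding configurations that merely memorize the training data.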

RESULTS AND DISCUSSION
The air temperature varied from 21.8 to 26.5 °C, whereas ETo ranged from 1.6 to 7.8 mm. BRNNs were trained with different parameter choices and obtained R² between 0.96 and 0.99 during training and between 0.95 and 0.98 with the test dataset, with a root mean squared error (RMSE) below 0.10 mm day⁻¹ relative to PM. Table 1 shows the results for four parameter combination scenarios. Similar results were reported by KUMAR et al. (2008), who compared an ANN model with the Hargreaves and Penman-Monteith (PM-56) methods for the estimation of reference evapotranspiration and obtained a coefficient of determination of 0.90. The good performance of the network using only air temperature, solar radiation and wind speed at the daily average scale as input variables was confirmed. Since precipitation is zero on most days, it does not provide enough information for the model at the daily scale.
The success of neural networks is directly related to their great versatility, which makes them a very promising tool for decision making. The selection of the user-defined parameters also contributed to the optimal performance of the ANN in the estimation of reference evapotranspiration. It is important to point out that other network architectures or other parameters can also be applied in similar situations, and that the proposed solution was selected to demonstrate the potential of the tool and its good performance.

CONCLUSIONS
Daily reference evapotranspiration calculated by Penman-Monteith can be simulated with fewer variables and with high accuracy by a Bayesian regularized neural network, using only air temperature, solar radiation and wind speed at the daily average scale as input variables. Epistemic and aleatoric (random) uncertainties in this modeling were evaluated, and precipitation was identified as the variable with the greatest uncertainty and was therefore discarded from the modeling.
The uncertainty analysis and the BRNN modeling are connected through the choice of input variables, since the uncertainty analysis gives an understanding of the role of each variable in improving or worsening the results. Tools for analyzing model uncertainty require statistical and programming knowledge, but this analysis provides a solid basis for understanding model weaknesses and potentials.