A climate index proposal for the wine sector: a descriptive statistical approach

Understanding the role of climate in wine production is one of the major concerns of the sector, since the environment largely determines the industry's output. Only a few previous studies have attempted to compile these environmental effects into an index, usually with temperature and precipitation as core variables. The present study proposes a new climate index based on descriptive statistics. Our index tries to mimic the characteristics of the target region and avoids the premise, common to past studies, of imposing preconceived restrictions such as a fixed optimal climate. We used yearly production and daily temperature data (1950-2016) from the Portuguese Minho wine region to test the proposed index and compare it with the Ribéreau-Gayon and Peynaud (RGP, Ribéreau-Gayon et al., 2003) and Growing Degree-Days (GDD, Winkler et al., 1974) indexes. Our results show that the proposed index may outperform the explanatory power of the other indexes and, in addition, may reveal interesting and previously unknown characteristics, such as the different ideal temperatures for the studied region.


Introduction
Climate plays an important role in wine production, since both quality and quantity depend strongly on the prevailing weather conditions, as stated by Malheiro et al. [1] and Salinger et al. [2]. This strong bond has made yearly grape quality a feature included in studies of the effects of climate change [3,4]. Even though many individual climate factors contribute to grape development and the resulting grape quality (e.g. solar radiation, wind and humidity, among others), temperature and water supply are among the most important [5]. Vines are capable of growing in a wide range of climatic conditions; nonetheless, the most remarkable wine-growing regions are located between the 35th and 50th parallels in the Northern Hemisphere and between the 30th and 45th parallels in the Southern Hemisphere [6]. During the growing season, extremely high temperatures can affect grape quality and overall vineyard production in many ways, inducing high grape mortality or total failure of flavour ripening [7]. Conversely, low and freezing temperatures can also harm grape flavour and colour due to slower grape fermentation. In the worst-case scenario, extreme cold can wipe out the vineyard's production entirely. In summary, Gladstones [8] suggests that grape growing usually requires mild temperatures within a narrow range (avoiding extreme heat or extreme cold). Indeed, according to Gladstones [9] and Jones [10], the most remarkable quality wines are associated with low frost damage and mild winters (January, February, March), early flowering and warm springs (April, May, June), and optimal maturation associated with low variability in summer temperature (July, August, September). Temperature has such a heavy impact on grape growing that the different vine stages can be predicted by simple models based on temperature alone [11].

The amount of precipitation that a vineyard receives is also an important feature that determines grape quality and output. During the grape maturation period a vineyard should only be exposed to a limited amount of precipitation, since an excess may cause major quality issues, including disease susceptibility and rotten grapes. A lack of precipitation on non-irrigated farms may also cause production problems such as shrinking berries and, of course, vine mortality and low production levels [12]. Nonetheless, a moderate water deficit may reduce grape yield but improve grape quality [13,14].

This work gathers information about the impact of climate on wine production and presents an overview of the current literature on existing wine-oriented climate indexes. Afterwards, a new climate index based on descriptive statistical parameters is created and evaluated using a simple case study of wine production in the Portuguese Minho region. To test the proposed index, we compare its explanatory power with that of the Ribéreau-Gayon and Peynaud (RGP) [15] and Growing Degree-Days (GDD) [16] indexes. The article is divided into four sections. Besides the overview of environmental effects on vineyard production, the next section introduces the notion of a climate index, with a few examples from the available literature, and presents two indexes (GDD and RGP) in more detail. The new index is described in section three. Finally, section four summarizes the main conclusions and suggestions for further research.

E3S Web of Conferences 50, 01028 (2018), https://doi.org/10.1051/e3sconf/20185001028, XII Congreso Internacional Terroir

Climate Index
Weather conditions are roughly random, and vineyard farms can do little to avoid extreme conditions, apart from irrigated farms, which can overcome dry periods with low precipitation. The effects of weather conditions, particularly temperature and precipitation, have been the subject of many studies [8,17-19] due to their strong link with good grape quality and the resulting premium wines. Given the randomness that weather brings to the wine sector and to agricultural production overall, there is also substantial research on risk protection using financial assets such as weather derivatives, as stated by Zara [20] or Leggio [21]. Those studies also use weather indexes to determine the asset price and the possible hedging strategies.
The great majority of the bioclimatic indexes typically used to evaluate wine production calculate the heat accumulated over the growing season (often from October 1st to April 30th). This growing season index, originally stated by Winkler et al. [16], is usually called Growing Degree Days (GDD) or the Winkler Index (WI), and it is used to describe a general climate and determine whether it is suitable for wine production. The WI also defines five temperature classes (from 1, very cold, to 5, very hot). The Huglin Index (HI), in Huglin [22], follows the same premise as the WI but differs in some features, since it uses both the mean and the maximum temperatures to estimate the daytime temperature. It is also calculated over a shorter period of six months rather than seven. Furthermore, the HI incorporates a day-length coefficient into its calculation. Stock et al. [23] projected an overall increase of 100 to 600 HI units in the Northern Hemisphere by 2050. This may indicate a latitudinal shift of grapevine cultivation in Europe, with new areas in the northern territories becoming more suitable for wine production, as opposed to the southern ones, which may become inadequate due to excessive heat. Jones et al. [18] demonstrated that the simple average growing season temperatures underlying the WI and HI are associated with quality vintages. The biologically effective degree-day index (BEDD) created by Gladstones [9] is also a heat sum index. One difference between the HI and the BEDD is the way the daytime temperature is estimated. The BEDD index incorporates a new feature calculated from the diurnal temperature range: the index is adjusted upward when the diurnal temperature range is greater than 13 °C and downward when it is less than 10 °C.
Other available indexes compile more than cumulative temperature. Branas et al. [24] suggested an index that compiles the product of the daily average temperature and the daily precipitation in order to estimate the risk of downy mildew. Ribéreau-Gayon et al. [15] created the Ribéreau-Gayon and Peynaud bifactorial hydrothermal scale, also known as the RGP index. Even though it follows a cumulative temperature approach similar to its predecessors, the RGP also considers daily precipitation and shifts the index cycle, which runs from 1st April until 31st October. Fregoni [25] found a meaningful relationship between the RGP index and Pinot Noir grape production. Similarly, Zara [20] [...]

Even though many features can be considered when climate suitability is evaluated, temperature takes first place as the most relevant and unavoidable (at least in viticulture).

A new climate index
In this section we propose a new index (NI). To allow comparison with previous studies, we first describe the aforementioned GDD (WI) and RGP indexes. These two indexes were selected because they use temperature, which is the core variable of the model in this work.
The specific formulations of GDD and RGP are given by expressions (1) and (2), respectively.

(1)

(2)

Here n is equal to 214, corresponding to the number of days between April 1st and October 31st, and m is equal to 212 or 213, corresponding to the days between October 1st and April 30th, for each year. Each index i corresponds to a single day; the expressions use the average daily temperature recorded on the i-th day and the amount of rainfall on the i-th day. The RGP and GDD have quite similar expressions. In fact, the RGP only shifts the analysed timeline and adds the precipitation variable. One feature that immediately stands out is their rigid nature, since both use a fixed reference temperature (10 °C). According to Nemani et al. [27], the ideal temperature for grape growing changes across the whole production process, so fixing a single reference temperature (as RGP and GDD do) may over-generalize the calculation. The main goal of this work is to find a more suitable way to account for weather effects on wine production and to overcome some of the flaws identified in these two indexes. The NI is based on descriptive statistics of the data, such as kurtosis and skewness, which measure the tailedness and asymmetry of the distribution [29].
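As an illustration of the heat-sum indexes discussed above, the following Python sketch computes a degree-day sum with the fixed 10 °C reference temperature and checks the season lengths quoted in the text. It uses the common textbook form of a degree-day sum, in which days colder than the base contribute zero; this is an assumption and may differ in detail from expression (1). The function names and the toy temperature series are hypothetical.

```python
from datetime import date

BASE_TEMP_C = 10.0  # fixed reference temperature mentioned in the text


def growing_degree_days(daily_mean_temps, base=BASE_TEMP_C):
    """Sum the daily excess of mean temperature over the base temperature.

    Assumption: days colder than the base contribute zero (textbook convention).
    """
    return sum(max(t - base, 0.0) for t in daily_mean_temps)


def season_length(start, end):
    """Number of days from start to end, both inclusive."""
    return (end - start).days + 1


# Hypothetical toy series: four daily mean temperatures in degrees Celsius.
toy_temps = [8.0, 10.0, 12.5, 15.0]
toy_gdd = growing_degree_days(toy_temps)

# Season lengths for the two timelines described in the text.
n_days = season_length(date(2015, 4, 1), date(2015, 10, 31))
m_days_common = season_length(date(2014, 10, 1), date(2015, 4, 30))
m_days_leap = season_length(date(2015, 10, 1), date(2016, 4, 30))
```

The day counts confirm the values of n (214) and m (212, or 213 when the season includes 29 February) stated above.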
For univariate data X, the kurtosis and skewness values are given by equations (3) and (4), respectively:

$$K = \frac{E\left[(X - \mu)^4\right]}{\sigma^4} \tag{3}$$

$$S = \frac{E\left[(X - \mu)^3\right]}{\sigma^3} \tag{4}$$

where $\sigma$ represents the standard deviation of the data and $\mu$ represents its mean value. Apart from the newly introduced parameters, we also adopt a different timeline, suggested by Gladstones [9] and Jones [10], in order to better define the vineyard lifecycle. Therefore, we consider three different periods of three months each (equation 5).

(5)

Here j = 1 corresponds to the first time interval, the period between the 1st of January and the 31st of March; j = 2 corresponds to the second time interval, the period between the 1st of April and the 30th of June; and, finally, j = 3 corresponds to the period between the 1st of July and the 30th of September. Each interval enters the equation with its own number of days. Equation (5) uses the average temperature of each period j together with the optimal values of kurtosis, skewness and mean temperature for the corresponding period, which are explained below.
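The descriptive statistics described above can be computed per period as in the following sketch. It uses population (uncorrected) moments and the non-excess definition of kurtosis, under which a normal distribution scores three; both choices are assumptions, as are the function names and the dictionary layout keyed by period j.

```python
import math


def mean(values):
    return sum(values) / len(values)


def central_moment(values, order):
    """Population central moment of the given order."""
    mu = mean(values)
    return sum((v - mu) ** order for v in values) / len(values)


def skewness(values):
    """Third standardized moment; zero for a symmetric sample."""
    sigma = math.sqrt(central_moment(values, 2))
    return central_moment(values, 3) / sigma ** 3


def kurtosis(values):
    """Fourth standardized moment (non-excess); requires non-constant data."""
    variance = central_moment(values, 2)
    return central_moment(values, 4) / variance ** 2


def period_statistics(temps_by_period):
    """Map each period j to its mean temperature, skewness and kurtosis."""
    return {
        j: {
            "mean": mean(temps),
            "skewness": skewness(temps),
            "kurtosis": kurtosis(temps),
        }
        for j, temps in temps_by_period.items()
    }


# Hypothetical toy data: a few daily mean temperatures per period j = 1, 2, 3.
toy = {1: [8.0, 9.0, 10.0, 11.0, 12.0],
       2: [14.0, 15.0, 15.0, 16.0, 22.0],
       3: [20.0, 21.0, 22.0, 23.0, 24.0]}
stats = period_statistics(toy)
```

In practice each period would hold roughly ninety daily values per year rather than five.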
The suggested index tries to mimic the descriptive statistical values of a target distribution. Therefore, to calculate these "optimal" values we fitted a single-variable polynomial regression (equation 6).

$$Z = \beta_0 + \beta_1 X + \beta_2 X^2 + \varepsilon \tag{6}$$

The dependent variable Z corresponds to the production values (in litres), and the single explanatory variable X is the kurtosis, the skewness or the mean temperature for each period j (j = 1,2,3). Using 67 values of Z, Minho's green wine production from 1950 until 2016, collected from CVRVV [30,31], nine separate single-variable polynomial regressions were performed, one for each variable in each period, to obtain the optimal parameters. We gathered the daily averaged temperature data and precipitation values from the E-OBS observational interpolated/gridded dataset, version 15 [32]. Despite a few limitations discussed by Hofstra et al. [33], this dataset provides uninterrupted and homogeneous gridded fields of daily average temperatures and daily cumulative precipitation over [...] We average those 16 values, which gives a good approximation of the overall weather conditions in Minho. First, we apply the available data to the previously mentioned GDD (WI) and RGP indexes.
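The single-variable polynomial regression described above can be sketched as follows, assuming it is a second-degree (quadratic) fit of production on one statistic. The sketch solves the least-squares normal equations directly in plain Python; the function name, the coefficient names b0, b1, b2 and the toy data are hypothetical, and a statistics package would normally be used instead.

```python
def fit_quadratic(xs, zs):
    """Least-squares fit of z = b0 + b1*x + b2*x**2; returns (b0, b1, b2)."""
    # Power sums that make up the 3x3 normal equations.
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum(z * x ** k for x, z in zip(xs, zs)) for k in range(3)]
    a = [
        [s[0], s[1], s[2], t[0]],
        [s[1], s[2], s[3], t[1]],
        [s[2], s[3], s[4], t[2]],
    ]
    # Gaussian elimination with partial pivoting on the augmented matrix.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, 3):
            factor = a[r][col] / a[col][col]
            for c in range(col, 4):
                a[r][c] -= factor * a[col][c]
    # Back substitution.
    coef = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        known = sum(a[r][c] * coef[c] for c in range(r + 1, 3))
        coef[r] = (a[r][3] - known) / a[r][r]
    return tuple(coef)


# Hypothetical toy data lying exactly on z = 5 + 2x - 0.5x^2.
toy_x = [0.0, 1.0, 2.0, 3.0, 4.0]
toy_z = [5 + 2 * x - 0.5 * x ** 2 for x in toy_x]
b0, b1, b2 = fit_quadratic(toy_x, toy_z)
```

With nine variable-period combinations, this fit would be run nine times, each time with the same production series and a different explanatory statistic.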
Polynomial regressions pose some interpretation difficulties. According to Stimson [34], these can be overcome through algebraic manipulation, and equation (6) can be reparametrized as equation (7):

$$Z = M + \beta_2 (F - X)^2 + \varepsilon \tag{7}$$

where M is the minimum/maximum value of Z (equation 8), while F is the value of X that minimizes (for convex, $\beta_2 > 0$) or maximizes (for concave, $\beta_2 < 0$) Z (equation 9):

$$M = \beta_0 - \frac{\beta_1^2}{4\beta_2} \tag{8}$$

$$F = -\frac{\beta_1}{2\beta_2} \tag{9}$$

Here $\beta_0$, $\beta_1$ and $\beta_2$ denote the intercept, linear and quadratic coefficients of the polynomial regression in equation (6), and $\varepsilon$ its error term.

Table 1 displays the nine polynomial regression results and the target F, calculated following Stimson [34]. Each regression was performed individually, with the yearly production as the dependent variable and the kurtosis, skewness or mean temperature of each period j = 1,2,3 (the previously mentioned trimesters) as the explanatory variable. The kurtosis values that maximize the output tend to be higher than three (heavier tails than the normal distribution), while skewness values should stay around zero (similar to the normal distribution). There is not much to say about the target mean temperature values themselves, although their increase across periods reflects the typical differences between winter, spring and summer, respectively. A further point concerns the sign of the quadratic coefficient: a negative coefficient ($\beta_2 < 0$) means that F is the value at which production is maximized. This does not hold for the skewness regressions where $\beta_2 > 0$, for which F yields a minimum. Given that, we need to adapt equation (5) to each situation and also ensure that the three variables contribute equally to the final result. To that end, we normalized each variable to [0,1], with 1 as the largest penalty to the overall result (Table 3) and 0 as the optimal target. Since the parameter F takes on different meanings, we need a set of rules for each interpretation of F (where x denotes a single observation), presented in Table 2.
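The reparametrization described above amounts to completing the square on a quadratic. As a minimal sketch, writing the fitted polynomial as Z = b0 + b1·X + b2·X² (these coefficient names and the function name are assumptions, not the paper's notation), the turning point F and the extreme value M follow directly from the coefficients, and the sign of b2 tells whether F marks a maximum or a minimum.

```python
def stimson_reparametrize(b0, b1, b2):
    """Return (M, F, kind) for the quadratic z = b0 + b1*x + b2*x**2.

    F is the x value at the turning point, M the value of z there, and
    kind says whether that turning point is a maximum or a minimum.
    """
    if b2 == 0:
        raise ValueError("b2 must be non-zero for a quadratic turning point")
    f = -b1 / (2.0 * b2)
    m = b0 - b1 ** 2 / (4.0 * b2)
    kind = "maximum" if b2 < 0 else "minimum"
    return m, f, kind


def quadratic(b0, b1, b2, x):
    return b0 + b1 * x + b2 * x ** 2


# Hypothetical coefficients: a concave curve, so F marks a production maximum.
M, F, kind = stimson_reparametrize(5.0, 2.0, -0.5)

# The reparametrized form M + b2 * (F - x)**2 reproduces the original curve.
x_check = 3.0
original = quadratic(5.0, 2.0, -0.5, x_check)
reparametrized = M + (-0.5) * (F - x_check) ** 2
```

Returning the kind alongside F makes explicit which normalization scenario (maximum or minimum) a given regression falls under.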

Table 2. Boundaries for each polynomial regression result
Where:

(10)

After establishing what each F represents, we set the normalization process accordingly (Table 3).

Table 3. Normalization scenarios for each F formulation
Normalization Formula    Scenario A    Scenario B

Equation (5) only gives the general sum and dispersion guideline when F represents a maximum. Technically, however, each F can be either a maximum (A) or a minimum (B) value to place in equation (5), so the input calculation varies accordingly. Substituting each optimal value and the temperature values into equation (5), we obtain a set of raw values that do not account for whether their F is a maximum or a minimum.

To test our index, we performed an Ordinary Least Squares (OLS) regression, as in Fregoni [25] and Cossu et al. [35], and compared it against the other indexes already presented in this work (RGP, GDD). Equation (11) relates each index value (D) to the production (Z).

(11)

Table 4 displays the OLS regression output coefficients for the three different indexes, with the Minho region production values as the dependent variable. The first thing we notice is the substantially low R-squared values, which may be explained by the omission of many other explanatory variables that also affect production output, such as the harvested land, economic policies and the market environment. Nonetheless, our suggested model achieves a higher R-squared than the other two indexes. All three indexes show a negative coefficient, indicating that an increase in each index value may result in a decrease in production (Z), as expected (since higher index values logically represent worse weather conditions). The straightforward interpretation of our suggested index (NI) is that the more the distributional weather values drift away from those set as optimal, the smaller the production values should be. Another advantage of the NI over the other common indexes is that it does not set arbitrary bounds, since it is constructed to evaluate a specific production environment. Therefore, the optimal values should vary across regions and adapt to different environments, making the index versatile and suitable. A well-calibrated index may play an important role in production hedging calculations similar to Zara [20], or in efficiency-related works where the climate is largely responsible for apparent outliers and general data noise. Nonetheless, our formulation is quite complex, and the necessary set of rules on the F values adds to that complexity.
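The comparison of explanatory power described above can be sketched with a one-regressor OLS fit and its R-squared. The sketch assumes a plain linear specification of production on the index value, which may differ from the authors' equation (11); the function name and the toy numbers are hypothetical and do not reproduce Table 4.

```python
def simple_ols(d, z):
    """Fit z = intercept + slope * d by ordinary least squares.

    Returns (intercept, slope, r_squared).
    """
    n = len(d)
    d_bar = sum(d) / n
    z_bar = sum(z) / n
    sxx = sum((x - d_bar) ** 2 for x in d)
    sxz = sum((x - d_bar) * (y - z_bar) for x, y in zip(d, z))
    slope = sxz / sxx
    intercept = z_bar - slope * d_bar
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(d, z))
    ss_tot = sum((y - z_bar) ** 2 for y in z)
    r_squared = 1.0 - ss_res / ss_tot
    return intercept, slope, r_squared


# Hypothetical toy data: index values and production falling as the index rises.
toy_index = [0.0, 1.0, 2.0, 3.0]
toy_production = [9.0, 8.0, 8.0, 5.0]
intercept, slope, r_squared = simple_ols(toy_index, toy_production)
```

Running the same fit once per index (NI, GDD, RGP) against the same production series gives directly comparable R-squared values, and a negative slope corresponds to the interpretation that higher index values go with lower production.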

Conclusions
This empirical work tries to understand the role of climate effects on wine industry production. After reviewing the available literature on measures of environmental effects (climate indexes), we noticed that, even though each existing index grounds and justifies its features in a solid theoretical background, all of them use pre-formulated values that generalize the index application regardless of the geographic position of the production. Therefore, our main goal was to provide an index that can be adjusted individually to each wine production under study. Our suggested index assembles descriptive statistics such as the temperature mean, kurtosis and skewness, with optimal values calculated according to Stimson's [34] polynomial regression approach.
For the Minho region and our model formulations, generally heavy-tailed and non-skewed (roughly symmetric) temperature distributions during the three periods are suggested in order to maximize the output of the target region. To evaluate the performance of our index, we ran an OLS regression, as in Fregoni [25], Zara [20] and Cossu et al. [35], to study whether the suggested index actually has better explanatory capabilities than the others. Using the wine production dataset of the Minho region, our proposed index proves statistically significant and also outperforms GDD and RGP in terms of explanatory power. For further development, we suggest adding more environmental parameters, such as precipitation, to the model in order to increase its accuracy. The NI also computes the dispersion of the observations linearly in the objective function (OB), so it might be interesting to test non-linear OB formulations. All inputs (temperature, kurtosis and skewness) were weighted equally; changing the weights in the formulation may also reveal important and relevant information, since we might be overvaluing or undervaluing some parameters. Such a study may focus on even finer geographic characteristics and find the core parameter for each specific farm. It would also be interesting to see further applications of the NI to vineyard hedging problems or efficiency-based studies.

Table 4. OLS regression output for three different indexes