RoCliB– bias‐corrected CORDEX RCMdataset over Romania

Four climate parameters (i.e. maximum, mean and minimum air temperature and precipitation amount) from 10 regional climate models, provided by the EURO‐CORDEX initiative, are adjusted using as reference the ROCADA gridded dataset. The adjustment was performed on a daily temporal resolution for the historical period (1971–2005), as well as for climate change scenarios based on two Representative Concentration Pathways (RCP4.5 and RCP8.5). The most accurately method for bias‐correction (BC) was selected following a 2‐fold cross‐validation approach, which was performed on historical data using two methods: Quantile Mapping (QMAP) and Multivariate Bias Correction with N‐dimensional probability (MBCn). The performances of the two methods are very similar when analysing the frequency distribution of each selected variable, whereas the comparison between (1) the intervariables correlation of the adjusted datasets, and (2) the reference dataset revealed much smaller differences for the dataset adjusted with the multivariate method, hence this was used for producing the BC scenario dataset. Based on the MBCn adjusted dataset, a climate change analysis over Romania was tested at the seasonal and annual scale. Overall, for the multimodel ensemble mean, at the country level, a substantial temperature increase is reported for both scenarios and no significant trend is revealed for precipitation amount. The adjusted RCMs are provided without any restrictions via an open‐access repository in netCDF CF‐1.4‐compliant file format. The BC climate models are archived at the 0.1° spatial resolution, in the WGS‐84 coordinate system, at a daily temporal resolution.


| INTRODUCTION
Regional Climate Models (RCMs) outputs are the primary sets of data used in regional and national climate change assessment (Giorgi, 2019). However, even if they are spatially and physically coherent with observations, the RCMs outputs still contain biases, inherited from the forcing of GCMs or due to systematic model error (Chen et al., 2013;Navarro-Racines et al., 2020;Tong et al., 2020;Turco et al., 2017;Pradeebane et al., 2021).
While biases are associated with both GCMs and RCMs, numerous studies have proved that the latter ones incline to perform better at the regional scale, such as RCM-simulated precipitation (Maraun et al., 2010;Teutschbein and Seibert, 2012). Most often, RCMs overestimates the precipitation amounts, but usually at weaker intensity, with significant biases over complex topography, even though they are carried out with a higher resolution than the GCMs (Haslinger et al., 2013;Maraun and Widmann, 2015;Warrach-Sagi et al., 2013). As for the air temperature, the climate simulations correlate better with observations, and they are able to reproduce to some extent the temperature gradients related to altitude and coastal area, although some biases were also reported Montesarchio et al., 2014;Torma et al., 2020).
In order to overcome the limitations identified in simulations provided by RCMs, a bias correction (BC) using as reference observed data (usually gridded datasets) is applied before the use of climate scenario data in impact studies (Stocker et al., 2015). Adjusted RCMs data were already made available either at the continental (Chakraborty et al., 2020;Dosio et al., 2012) or national/regional scale (Kotlarski et al., 2017;Mezghani et al., 2017;Wong et al., 2016). The quality of the biased corrected datasets is strongly related with the quality of the reference datasets; usually, the national gridded datasets are more accurate as they incorporate information from a higher number of stations and more homogenous networks when compared to the available datasets at the continental scale (Dumitrescu et al., 2016;Herrera et al., 2019).
The BC methods can be grouped in two main categories: univariate (UBC) and multivariate (MBC). The UBC methods perform the adjustments independently, based on a single element, whereas the MBCs correct two or more elements at the same time (Cannon, 2018, Cannon, 2016Cannon et al., 2015), which creates the premises of preserving the climate variables' interdependence (Hanggoro et al., 2020). An intercomparison study of MBCs applied on air temperature and precipitation data is performed by (François et al., 2020). They concluded that MBCs methods are designed to correct the univariate distribution properties as well as the spatial, temporal and intervariable dependence structures. The benefit of using MBCs was demonstrated in correcting the daily temperature and precipitation climate model outputs (Vrac and Friederichs, 2015;Adeyeri et al., 2020), but also by assessing the impact of using MBC scenario data in hydrological modelling Su et al., 2020).
The main purpose of this paper is to produce an adjusted dataset of RCM-derived air temperature and precipitation over Romania, which can become a national reference in climate change impact assessment studies. In this regard, two BC methods of the climate modes data were assessed: (1) the univariate quantile mapping method (QMAP) and (2) the multivariate bias correction by Ndimensional probability density (MBCn). Despite several studies having already investigated the performance of BC over the study area (Torma et al., 2020), to the best of our knowledge this is the first study, which examines the intercomparatively analyses of two BC methods that have been applied on RCMs data for the entire territory of Romania, using as reference the state-of-the art national gridded dataset (Dumitrescu and Birsan, 2015). Our study contributes to better understanding the benefits of using the BC methods for supporting reliable scenarios of climate change impact applications, and aimed to provide a reliable RCM dataset for the territory of Romania.
In the next two sections we present the datasets selected for the analysis and the statistical methods used to perform the adjustment. In the subsequent sections, two BC methods were tested by splitting the historical dataset into training and testing subsets, and some examples of using the MBCn adjusted dataset for performing the climate change analysis are shown. The key findings are reviewed in the final section of the study.

| Reference dataset
The calibration and testing of the two methods were performed using as reference the daily temperature, that is average (tas), minimum (tasmin) and maximum (tasmax), and precipitation (pr) for the period 1971-2005, extracted from the ROCADA gridded dataset (Dumitrescu and Birsan, 2015). ROCADA is based on temperature data from 150 stations and precipitation from 188 stations with at least 30% completeness of records over 1961-2013 used for building a 0.1° lonlat-resolution grid covering the entire national territory of Romania ( Figure 1). The gridded dataset was obtained in two steps. Firstly, a homogenization procedure was applied using the MASH (Multiple Analysis of Series for Homogenization) technique; F I G U R E 1 Spatial distribution of the weather stations issued in the gridded interpolation of ROCADA T A B L E 1 Regional climate models selected in the study

No.
Institution or working group RCM model GCM institute GCM driving Run

| CORDEXsimulations
The work is based on daily climate projections over the 21st century (IPCC, 2013) at 0.11° spatial resolution (about 12.5 km × 12.5 km), originating from the EURO-CORDEX initiative (Jacob et al., 2014). Two Representative Concentration Pathways (RCPs) climate projections were selected: a moderate (radiative forcing to stabilize at 4.5 W/m 2 before the year 2100) and a higher one (radiative forcing to stabilize at 8.5 W/m 2 before the year 2100). This work is based on data from 10 RCM simulations nested into different Global Circulation Models (GCMs) ( Table 1), used to extract the daily mean, minimum and maximum air temperature and precipitation for the historical and simulation periods (1971-2100), over the same spatial extension as the reference gridded dataset: from 20.1 to 29.8°E and 43.5 to 48.4°N. The RCMs were remapped from the rotated-pole coordinates to the same spatial projection (regular lat/lon grid) and resolution as the reference gridded data using the bilinear interpolation method, as implemented in the CDO application (Schulzweida, 2019).

| Bias correction
Quantile Mapping (QMAP) and Multivariate Bias Correction with N-dimensional probability (MBCn) were tested using historical climate models data and as reference the gridded data extracted from ROCADA. QMAP is an univariate bias correction based on the quantile mapping algorithm (Boé et al., 2007;Gudmundsson et al., 2012). QMAP adjusts the quantiles so that it matches the distribution of the reference dataset but preserves the model-projected changes. This method applies a univariate transfer function independently to each variable, which permits to link the cumulative distribution function of a variable of interest in the model simulations to that of the observed dataset. In the case of precipitation, the method takes into consideration the bias in the wet day frequency (i.e. days with more than 0.1 mm precipitation), by computing the empirical probability of nonzero observations that is subsequently used as a threshold to set the modelled values to zero. Hence the same proportion of rainy-day events is ensured between observations and climate models data.
Multivariate Bias Correction with N-dimensional probability transfers the statistical characteristics of a reference multivariate distribution to the multivariate distribution of climate model variables. First the univariate distributions of climate variables are adjusted using a 1-dimensional method. Secondly, based on an iterative process, the dependence structure is adjusted until convergence is reached between the multivariate distributions of observation and simulations during the calibration period (Cannon, 2018). In this work, we used 30 iterations until the multivariate distributions of the analysed datasets (observation and simulations) matched (Hanggoro et al., 2020). We have used the qmap R library for the QMAP method (Gudmundsson et al., 2012), and the R MBC library for the MBCn method (Cannon, 2016).

| Implementation and performance assessment
The two methods were applied to correct all RCMs outputs iteratively at each grid cell, either for one variable, in the case of QMAP, or for multiple variables (i.e. tas, tasmin, tasmax, pr) in the case of the MBCn method. In order to take into account, the seasonality, the BC procedures were performed for each season separately (Mezghani et al., 2017).
Before performing the bias-correction, we carefully checked the extreme values for all the variables in all the files, and for the precipitation very high values were found when comparing with neighbouring pixels, which is uncommon over the analysed geographical area. To remove the artefacts from the precipitation data, we used as a threshold the maximum 24 h precipitation value measured the meteorological stations in Romania (262.0 mm), which was increased with 40% for the historical period (i.e. the maximum adjustment due to underestimation of precipitation measurements in the cold season in Romania (Cheval et al, 2011), and we added 40% for the future periods, which is the maximum positive change in precipitation observed in the original EURO-CORDEX data. The values thus identified were further verified with Rosner's generalized extreme Studentized deviate outliers test (Rosner, 1983). In the end, only 88 pixels were confirmed as outliers by Rosner's test from the models 1, 2, 3, 4, 7, 8 and 10. Before performing the bias-correction, the confirmed outliers were removed from the original dataset, and new values were estimated using the cubic-spline interpolation method.

| Properties of the distribution of the variables
The statistical properties of the seasonal differences between observation data, on one side, and model data, either raw outputs or bias-corrected data, on the other side, are summarized in Figure 3 for each method and RCM, using the absolute differences for air temperature, and relative differences for precipitation. One can notice that both QMAP and MBCn methods have performed very well, as the differences between raw and BC data are very close to zero. When comparing the performances of the two methods by parameters, the QMAP provides slightly better improvements in temperature differences, whereas when looking at the precipitation differences, one can notice that the MBCn method performs better, with less evident outliers visible in the boxplots. The two BC methods F I G U R E 2 Schematic overview of the adjustment approach F I G U R E 3 Boxplots of mean seasonal differences between observations and modelled data for mean air temperature (tas) and precipitation amount (pr). Results are shown for raw model output data (hist), QMAP and MBCn. The models are ordered according to the column no. from the Table 1. The ensemble mean (ENS) is also provided performed similarly well in terms of relative difference in standard deviations of air temperature, as the shape of the boxes and position of outliers are almost identical, excepting the Models 2 and 3 ( Figure 4). The standard deviations of MBCn precipitations are more similar to the observations, having the median values computed for all seasons closer to zero, and less outliers than the data adjusted with QMAP. The similar graphical representations provided for tasmin and tasmax ( Figures S1, S2), show that both BC methods perform similarly like in the case of tas.

| Intervariable correlation
The intervariable correlation of the selected variables for all grid cells, before and after the correction performed with QMAP and MBCn methods was evaluated using the Spearman's correlation coefficient. This measure of statistical relationship allows the performing of the analysis without making any assumption about the frequency distribution of the physical variables (François et al., 2020). Overall high positive correlation can be noticed between air temperatures, for all the datasets and models, while intercorrelation between precipitation and air temperature is small ( Figure 5). The MBCn intervariable correlations are close to the observed data, in contrast with QMAP for which the computed intervariable correlations are more similar to historical ones (i.e. the raw model simulations). The MBCn corrections retain the intervariable dependencies as represented in the observed datasets, as can be clearly seen in the intercorrelation patterns of all parameters analysed from all the models, with situations when the air temperature correlations values are very close to those computed from the reference datasets.
Intervariable seasonal correlations were also computed for each pixel from the adjusted RCMs data using the two methods. The differences between each BC dataset and the reference of the temperature-precipitation Spearman correlation coefficient computed at each grid cell are shown at seasonal scale for model no 01 in Figure 6. One can clearly observe that the differences are much smaller for the dataset adjusted with MBCn than those resulted from the QMAP method. This remark is available for all the datasets analysed, as the figures from the supporting material ( Figures S3-S11) reveal. Also, no significant spatial pattern can be depicted in the MBCn differences, whereas in the case of QMAP the differences are notable in regions characterized by high topographic variability (mountain areas), especially in spring and fall. The above outcomes are confirmed for all the models, as it can be noticed from the maps shown as ESM.
In summary, both methods adjust very well the climate scenarios data, but considering the good performance in keeping the observed intervariable correlation provided by MBCn and taking into account these data would be used in subsequent analyses (climate change impact studies), it was decided that the MBCn method will be used for producing the final bias-corrected scenario datasets.

F I G U R E 4
Boxplots of relative seasonal differences of standard deviations between observations and modelled data for air temperature (tas) and precipitation amount (pr). Results are shown for model output data (hist), QMAP and MBCn. The models are ordered according to the column no. from the Table 1. The ensemble mean (ENS) is also provided

| Impact on the climate change signal
Several recent studies reported that even if the quality of the simulations was improved by a bias-correction for the historical period, the signal of climate change could be altered by modifying the physical links between variables (Sangelantoni et al., 2019;Willkofer et al., 2018). In order to evaluate if the BC datasets preserve the simulated characteristics for the future period, the changes in temperature and precipitation computed from the raw dataset were compared with those adjusted by the MBCn method. As it is expected that the climate change signal will be more substantially modified at the end of the analysed period (Mezghani et al., 2017), the changes (relative for precipitation and absolute for temperature) were computed for the period 2091-2100 using as reference the 1971-2000 interval. The changes for CCLM4-8-17 driven by the CNRM-CERFACS (Model 1 in Table 1), scenario RCP4.5 (Figures 7, 8) demonstrate that, in general, no significant change of the climate signal was produced by the biascorrection procedure. The patterns of changes are almost identical for air temperature, that is differences below 0.5°C. The differences between changes computed from original and BC data are more pronounced in the case of precipitation, but they exceed 30% only in a few spots. The same outcomes regarding the change signal are obtained for all individual simulations, both scenarios (RCP4.5 and RCP8.5), as one can notice in Figures S12-89.

| Future climate change
The bias-corrected EURO-CORDEX simulations were afterwards used for climate change assessment over Romania. The climate change analysis was performed for all the variables selected, using the control period 1971-2000 interval, at a seasonal and annual scale, for the two selected scenarios (RCP4.5 and RCP8.5), and two timescales (2021-2050 and 2071-2100), in terms of absolute changes for temperature (°C), and relative changes for precipitation (%). A multimodel ensemble approach based on median combination was used, which is more robust to the potential presence of outliers (Samouly et al., 2018;Sillmann et al., 2013). F I G U R E 5 Intervariable Spearman's rank matrix correlation of daily mean air temperature (tas), maximum air temperature (tasmax), minimum air temperature (tasmin) and precipitation (pr); for observation (observed), historical regional models' period (historical), univariate bias correction (QMAP) and multivariate bias correction (MBCn). The models are ordered according to the column no. from the Table 1 The changes in seasonal and annual air temperature are presented in Table 2 and in Figure 9 (only for the 2071-2100 interval). The maps of changes over the same interval for minimum and maximum air temperature are available as Figures ESM S90-91, while the maps of changes over the period 2021-2050 for all variables analysed are available as Figures ESM S92-95. For both scenarios an increase in temperature is depicted regardless of the timescale analysed. For the period 2021-2050, the changes computed for both RCPs are very similar, with a slightly higher increase in the autumn temperature computed for RCP8.5 scenarios for minimum and maximum temperature. As for the period 2071-2100, one can notice that the magnitude of changes for each scenario does not vary a lot between minimum, maximum and mean temperature (Table 2a-2c). As regards the spatial patterns for the 2071-2100 air temperature changes, the eastern and north-eastern part of the country are the areas with F I G U R E 6 Differences of temperature versus precipitation spearman correlation computed at each grid  cell between original simulations and adjusted data by the two methods: MBCn and QMAP. Model no. 1 from Table 1 F I G U R E 7 Seasonal air temperature change (°C) in the RCP4.5 (2091-2100) with regard to the reference period  for raw (upper row) and bias-corrected (lower row). Model no. 1 from Table 1 the largest changes in temperature in Winter (more than 4.5°C for the RCP8.5 scenario), as for the other seasons the southern and eastern parts are expected to be the most affected by the temperature increases, with more than 2°C according to the RCP4.5 scenario, and above 4°C for the RCP8.5 scenario. Regardless of the scenario and timescale, the changes of precipitation computed as the areal mean is positive on annual scale and for three seasons (DJF, MOM and SON) and negative for the Summer. The highest precipitation increases for the period 2071-2100 (more than 30%) is expected to occur in the cold season (DJF) in large areas of the central part of the country, when assuming F I G U R E 8 Seasonal precipitation change signal (%) in the RCP4.5 (2091-2100) with regard to the reference period  for raw (first row) and bias-corrected (second row). Model no.  (Figure 10). Positive signals were also depicted for most of the study area for MAM and SON seasons, as well as for the annual scale, while in the warm season (JJA) the signal of change is mostly negative, with decreases for RCP8.5, which can reach up to 30% in southeastern and south-western areas. An overview of the changes expected in short-(2021-2050) and long-term (2071-2100) versus the reference period , computed from all the models and variables, is shown as boxplots in the Figure 11. The general aspect of the changes as resulted from the ensemble median were also depicted by looking at each model and parameter. The changes in air temperature are positive for all the models, regardless of the period analysed. Like in the ensemble analysis, the positive changes for both RCPs for the period the periods 2021-2050 are around 1°C, while for the second period the changes for the RCP8.5 scenario are for majority of models between 4°C and 5°C, and for RCP4.5 they are between 1°C and 3°C. For precipitation, the signal of change is in general positive, excepting some models, which have returned values close to zero (2021-2050) or even below zero (2071-2100). The patterns of changes are very similar for 2021-2050, whereas for the 2071-2100 period some models tend to provide a slightly more negative signal for the RCP8.5 scenario.
Variability of anomalies in annual mean temperatures and precipitation over Romania are shown for each year from 1971 to 2100 in Figure 12. The values shown on the graph are computed for each year as areal mean, with country-scale coverage. The anomalies were computed as the difference between annual values and multiannual F I G U R E 9 Seasonal and annual mean temperature ensemble change (°C) in the future (2071-2100) with regard to the bias-corrected historical period  F I G U R E 1 0 Seasonal precipitation ensemble change (%) in the future (2071-2100) with regard to the historical period  mean for the interval 1971-2100. The reference datasets used in bias-correction procedures are also presented as barplots in the figures. The differences between the RCP4.5 and RCP8.5 scenarios are evident in the case of air temperature, emphasizing the same magnitude of temperature increase until 2050, and a more rapid increase by the end of the century in case of the RCP8.5 scenario, that is an anomaly greater than 4°C by 2100. No significant trend can be noticed in the case of the annual precipitation anomalies, regardless of the scenario and period F I G U R E 1 1 Changes in annual temperature and precipitation in Romania from all BC models data during the periods and 2021-2050 and 2071-2100 compared with the mean for the period 1971-2000, as returned by the 10 models (01-10) introduced in Table 1. The relative changes of precipitation were divided by 10 in order to be aligned with the air temperature values F I G U R E 1 2 Variability of anomalies in annual mean temperature (°C) and precipitation amounts (%) in Romania during 1971-2100 compared with normal (mean for 1971-2000). The black lines show the ensemble mean of ten BC climate scenarios for the RCP4.5 (top panels) and RCP8.5 (bottom panels) scenario. The grey field shows the range in variation between the highest and lowest value for the members of the ensemble. The bars show historic data from observations used as reference for adjustments (ROCADA dataset) analysed. This is in agreement with previous studies in which no consistent precipitation change signals were depicted for the study area based on a multimodel ensemble approach (Jacob et al., 2014;Strandberg et al., 2015).

CHARACTERISTICS
The adjusted RCMs are provided via the Zenodo 1 openaccess repository, which allows the uploading of files up to 50 GB. The data are provided in a netCDF CF-1.4compliant format using netCDF4 compression. For each variable and scenario distinct files were created: 4 climate variables × (10 Historical + 10 RCP8.5 + 10 RCP4.5) = 120 netCDF files. The terms of use for the bias-corrected data are the same as those from the original EURO-CORDEX simulations obtained from ESGF servers. 2 The main characteristics of the RoCliB dataset are listed below:

| CONCLUSIONS
In this study, daily mean, minimum and maximum air temperature and precipitation outputs from 10 General Circulation Models (GCMs) dynamically downscaled in the EURO-CORDEX initiative by several Regional Climate Models (RCMs) were adjusted over Romania for the period 1971-2100. Two climate change scenarios were selected, namely a moderate (RCP4.5) and a higher emission scenario (RCP8.5). The method applied for bias correction of the RCMs outputs was selected following a performance assessment procedure in which two categories of BC methods were tested, namely QMAP univariate and MBCn multivariate. By analysing the statistical properties of the seasonal differences between the averaged BC RCMs dataset and the observation dataset, it was shown that both BC methods performed very well, with slightly better results of the QMAP for air temperature, and with the MBCn outperforming the univariate method in the case of precipitation. When analysing the relative differences in standard deviations of BC and reference data, the MBCn outputs are closer to the observation dataset.
In terms of the intervariable correlations, based on the Spearman rank coefficient, the multivariate MBCn correlations are very similar with those obtained from the reference data, whereas the QMAP correlations are very well related to those computed from the original RCMs outputs. Taking into consideration that the main use of the adjusted dataset will be in impact studies, the method, which approximates them most accurately the univariate distribution properties, as well as the spatial, temporal and intervariable dependence structures of the reference data was employed to produce the concluding adjusted climate scenario datasets, that is the MBCn method.
The impact of the selected BC method on climate change signal was analysed for the final part of the climate scenarios period (2091-2100), using as reference 1971-2000 period, and it was concluded that the differences between raw and adjusted RCMs are negligible when looking at air temperature, and acceptable for precipitation.
The climate change and trends analysis were performed for all variables using a multimodel ensemble approach based on median combination. The evolutions of the trends are in agreement with previous studies, which reported a constant increase in air temperature by the end of the century for RCP8.5 scenario, while for RCP4.5 data the positive trends can be noticed until the mid of this century. For the precipitation data, no evident trends can be noticed at the country level, regardless the climate scenario analysed.
The availability of a daily BC climate scenario dataset for mean, minimum and maximum air temperature and precipitation (RoCliB) enhance the potential to derive indices of daily climate extremes (e.g. ClimPACT https:// github.com/ARCCS S-extre mes/climp act/) that can be used to assess the climate change impact for different sectors of local economy. The adjusted RCMs data are provided without any restrictions via the Zenodo open-access repository (https://doi.org/10.5281/zenodo.6336837).