regression.py

lib.regression.check_regression_model(paths, tech)

This function checks the regression model parameters for NaN values and returns the FLH and TS model dataframes. If values are missing in the input CSV files, the user is prompted to either continue or first modify the corresponding files.

Parameters
  • paths (dict) – Dictionary of dictionaries containing the paths to the FLH and TS model regression CSV files.

  • tech (str) – Technology under study.

Return (FLH, TS_reg)

Tuple of pandas dataframes for FLH and TS.

Return type

Tuple of pandas dataframes
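The NaN check and the user prompt can be pictured with the minimal sketch below. It assumes a hypothetical CSV layout with the region name in the first column and values in the others. It uses only the standard library rather than pandas, and the helper names regions_with_missing_values and confirm_continue are illustrative, not part of the module.

```python
import csv
import io
import math


def _is_missing(cell):
    """Treat empty cells and cells that parse to NaN as missing."""
    cell = cell.strip()
    if cell == "":
        return True
    try:
        return math.isnan(float(cell))
    except ValueError:
        return False  # non-numeric text is not considered missing here


def regions_with_missing_values(csv_text):
    """Return the first-column labels of rows that contain a missing value."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row
    missing = []
    for row in reader:
        if not row:
            continue  # ignore blank lines
        label, *values = row
        if any(_is_missing(v) for v in values):
            missing.append(label)
    return missing


def confirm_continue(missing, ask=input):
    """Ask whether to proceed despite missing values (ask is injectable for testing)."""
    if not missing:
        return True
    answer = ask(f"Missing values for {', '.join(missing)}. Continue? [y/n] ")
    return answer.strip().lower() in ("y", "yes")
```

Passing `ask` as a parameter keeps the interactive prompt out of the checking logic, so the check can run unattended.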

lib.regression.clean_FLH_regression(paths, param)

This function creates a CSV file containing the model FLH used for the regression. If a region is present in the IRENA database, its FLH are extracted directly from there. Otherwise, a placeholder for the region is written in the CSV file, and it is the user’s responsibility to fill in an appropriate value. The function warns the user and prints all regions that are left blank.

Parameters
  • paths (dict) – Dictionary of dictionaries containing the paths to IRENA_summary and IRENA_dict.

  • param (dict) – Dictionary of dictionaries containing the list of regions.

Return missing

List of strings naming the missing regions. The CSV file with the FLH needed for the regression is saved directly in the given path, along with the corresponding metadata in a JSON file.

Return type

list of str

Raises

Missing Regions – No FLH values exist for certain regions.
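The placeholder logic described for this function can be sketched as follows. Assumptions: the IRENA values are already available as a plain dictionary mapping region to FLH, the output has two hypothetical columns `region` and `FLH`, and the CSV is built in memory instead of being written to the configured path. The name build_flh_table is illustrative only.

```python
import csv
import io
import warnings


def build_flh_table(regions, irena_flh):
    """Return (csv_text, missing) where missing lists regions without an FLH value."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["region", "FLH"])
    missing = []
    for region in regions:
        if region in irena_flh:
            writer.writerow([region, irena_flh[region]])
        else:
            writer.writerow([region, ""])  # empty placeholder for the user to fill in
            missing.append(region)
    if missing:
        warnings.warn("No FLH values for: " + ", ".join(missing))
    return out.getvalue(), missing
```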

lib.regression.clean_TS_regression(paths, param, tech)

This function creates a CSV file containing the model time series used for the regression. If a region is present in the EMHIRES text files, its TS is extracted directly from them. Otherwise, the generated TS with the highest FLH is used instead, scaled to match the IRENA FLH if these are available.

Parameters
  • paths (dict) – Dictionary containing paths to EMHIRES text files.

  • param (dict) – Dictionary containing the FLH_regression dataframe, the list of subregions contained in the shapefile, and the year.

  • tech (str) – Technology under study.

Returns

The time series used for the regression are saved directly in the given path, along with the corresponding metadata in a JSON file.

Return type

None

Raises
  • Missing FLH – FLH values are missing for at least one region. No scaling is applied to the time series for those regions.

  • Missing EMHIRES – The EMHIRES database is missing; the generated time series are used as models for all regions.
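The per-region choice of model time series can be sketched like this. The inputs are assumed to be plain dictionaries of hourly capacity factors, and the FLH of a series is taken to be the sum of its hourly values. The function name and argument layout are hypothetical. Note that the sketch does not clip scaled values to 1.

```python
def model_time_series(region, emhires, generated, irena_flh):
    """Pick the model time series for one region.

    emhires:   dict region -> historical hourly capacity factors
    generated: dict setting -> generated hourly capacity factors for this region
    irena_flh: dict region -> reference full-load hours
    """
    if region in emhires:
        return list(emhires[region])
    # Fall back to the generated series with the highest FLH (sum of hourly values).
    best = max(generated.values(), key=sum)
    target = irena_flh.get(region)
    if target is None:
        return list(best)  # no IRENA FLH available: use the series unscaled
    factor = target / sum(best)
    return [value * factor for value in best]  # rescaled so its sum equals target
```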

lib.regression.combinations_for_regression(paths, param, tech)

This function reads the list of generated time series for different hub heights and orientations, compares it with the user-defined combinations, and returns a list of lists containing all the available combinations. The function raises a warning if the user input and the available time series are not consistent.

Parameters
  • paths (dict) – Dictionary of dictionaries containing the paths to the regional analysis output folder.

  • param (dict) – Dictionary of dictionaries containing the subregion names, the year, and the user-defined combinations.

  • tech (str) – Technology under study.

Return combinations

List of combinations for regression.

Return type

list

Raises
  • Missing Data – If no time series are available for this technology, a warning is raised.

  • Missing Combination – If a hub height or orientation required by the user-defined combinations is missing, a warning is raised.
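The comparison between user-defined combinations and available settings could look like the sketch below. It assumes one possible policy: settings without time series are dropped from each combination, and empty combinations are discarded. The module may handle this differently. The name available_combinations is hypothetical.

```python
import warnings


def available_combinations(user_combinations, available_settings):
    """Keep only the settings that have time series; warn about the rest."""
    if not available_settings:
        warnings.warn("No time series available for this technology.")
        return []
    result = []
    for combo in user_combinations:
        present = [s for s in combo if s in available_settings]
        absent = [s for s in combo if s not in available_settings]
        if absent:
            warnings.warn(f"Missing settings {absent} in combination {combo}")
        if present:
            result.append(present)
    return result
```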

lib.regression.get_regression_coefficients(paths, param, tech)

This function solves the following optimization problem: find a combination of quantiles and hub heights or orientations that minimizes the error relative to a given historical time series (e.g. from EMHIRES for European countries), while constraining the FLH to match a given value (e.g. from IRENA). The combinations of settings can be defined by the user.

The function starts by identifying the existing settings (hub heights, orientations) and quantiles. If the combinations of time series requested by the user cannot be found, a warning is raised.

It then runs the optimization and identifies the subregions for which a solution was found. If the optimization is infeasible (the FLH values are too high or too low compared with the reference to be matched), the time series whose FLH is closest to the reference value is used in the final output.

The output consists of coefficients between 0 and 1 that can later be multiplied with the individual time series in time_series.generate_stratified_timeseries. The coefficients of each combination sum to 1.
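One way to write the problem described above is shown below. It assumes that TS_{s,q,t} is the generated capacity factor for setting s, quantile q, and hour t, that TS^{ref}_t is the historical time series, and that FLH^{ref} is the reference full-load hours. The exact objective implemented in the code may differ, for example in normalization.

```latex
\begin{aligned}
\min_{c}\quad & \sum_{t} \Big( \sum_{s,q} c_{s,q}\, TS_{s,q,t} - TS^{\mathrm{ref}}_{t} \Big)^{2} \\
\text{s.t.}\quad & \sum_{s,q} c_{s,q} \sum_{t} TS_{s,q,t} = FLH^{\mathrm{ref}} \\
& \sum_{s,q} c_{s,q} = 1, \qquad 0 \le c_{s,q} \le 1
\end{aligned}
```

With nonnegative coefficients that sum to 1, the combined FLH is a weighted average of the individual FLHs. It can therefore only reach values between the smallest and the largest individual FLH, which is why a reference outside that range makes the problem infeasible.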

Parameters
  • paths (dict) – Dictionary including the paths to the time series for each subregion, technology setting, and quantile, as well as the output paths for the coefficients.

  • param (dict) – Dictionary including the dictionary of regression parameters, quantiles, and year.

  • tech (str) – Technology under study.

Returns

The regression parameters (e.g. IRENA FLH and EMHIRES TS) are copied to the regression_in folder, and the regression coefficients are saved in a CSV file in the regression_out folder, along with the metadata in a JSON file.

Return type

None

Raises
  • Missing Data – No time series present for technology tech.

  • Missing Data for Setting – Missing time series for desired settings (hub heights / orientations).
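The infeasibility fallback can be sketched as follows. It assumes that the generated time series are held in a dictionary keyed by (setting, quantile) and that FLH is the sum of hourly capacity factors. The feasibility test relies on the fact that coefficients in [0, 1] summing to 1 can only reproduce FLH values between the smallest and largest individual FLH. Both helper names are hypothetical.

```python
def is_feasible(generated, flh_target):
    """True if flh_target lies between the smallest and largest individual FLH."""
    flhs = [sum(ts) for ts in generated.values()]
    return min(flhs) <= flh_target <= max(flhs)


def closest_flh_series(generated, flh_target):
    """Return the key of the time series whose FLH is closest to flh_target."""
    return min(generated, key=lambda key: abs(sum(generated[key]) - flh_target))
```

When `is_feasible` returns False, the series selected by `closest_flh_series` would receive a coefficient of 1 and all others 0, which still satisfies the sum-to-one condition.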

lib.regression.pyomo_regression_model()

This function returns an abstract Pyomo model of a constrained least-squares problem: the model time series are combined so that their FLH match the model FLH while the squared difference to the reference time series is minimized.

Return model

Abstract pyomo model.

Return type

pyomo object
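To illustrate the kind of problem this model encodes without requiring Pyomo, the sketch below solves an equality-constrained least-squares problem through its KKT system using plain Python. This is a swapped-in technique, not the module's solver. It enforces the FLH constraint and the sum-to-one constraint, but it ignores the bounds 0 ≤ c ≤ 1. Enforcing those bounds requires a quadratic-programming solver, such as one called through Pyomo. The names solve_linear and fit_coefficients are hypothetical.

```python
def solve_linear(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Work on an augmented copy so the caller's data is left untouched.
    aug = [[float(v) for v in row] + [float(r)] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        acc = sum(aug[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (aug[r][n] - acc) / aug[r][r]
    return x


def fit_coefficients(series, reference, flh_target):
    """Minimise sum_t (sum_i c_i * series[i][t] - reference[t])**2
    subject to sum_i c_i * sum(series[i]) == flh_target and sum_i c_i == 1.
    Bounds on c_i are NOT enforced in this sketch."""
    n, hours = len(series), len(reference)
    # Normal-equation terms: gram = A^T A and rhs_ls = A^T b (series are columns of A).
    gram = [[sum(series[i][t] * series[j][t] for t in range(hours)) for j in range(n)]
            for i in range(n)]
    rhs_ls = [sum(series[i][t] * reference[t] for t in range(hours)) for i in range(n)]
    constraints = [[float(sum(s)) for s in series], [1.0] * n]
    targets = [flh_target, 1.0]
    m = len(constraints)
    # KKT system [[G, C^T], [C, 0]] [c; mu] = [A^T b; d].
    kkt = [gram[i] + [constraints[k][i] for k in range(m)] for i in range(n)]
    kkt += [constraints[k] + [0.0] * m for k in range(m)]
    return solve_linear(kkt, rhs_ls + targets)[:n]
```

For example, `fit_coefficients([[1, 0], [0, 1], [1, 1]], [1, 0.5], 1.5)` combines three two-hour series to match a two-hour reference with a target FLH of 1.5.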

lib.regression.read_generated_TS(paths, param, tech, settings, subregion)

This function returns a dictionary containing the available time series generated by the script based on the desired technology and settings.

Parameters
  • paths (dict) – Dictionary including output folder for regional analysis.

  • param (dict) – Dictionary including list of subregions and year.

  • tech (str) – Technology under study.

  • settings (list) – List of lists containing setting combinations.

  • subregion (str) – Name of the subregion.

Return GenTS

Dictionary of time series indexed by setting and quantile.

Return type

dict
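A dictionary indexed by setting and quantile can be assembled as in the sketch below. It assumes a hypothetical file layout: one CSV per setting, with one column per quantile and one row per hour. The CSV contents are passed in as text to keep the example self-contained. The name read_generated_ts is illustrative only.

```python
import csv
import io


def read_generated_ts(csv_by_setting):
    """Map each setting to a dict of quantile -> list of hourly values."""
    gen_ts = {}
    for setting, text in csv_by_setting.items():
        reader = csv.DictReader(io.StringIO(text))
        columns = {name: [] for name in reader.fieldnames}  # header row = quantiles
        for row in reader:
            for name in reader.fieldnames:
                columns[name].append(float(row[name]))
        gen_ts[str(setting)] = columns
    return gen_ts
```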

lib.regression.regmodel_load_data(paths, param, tech, settings, subregion)

This function returns a dictionary used to initialize a pyomo abstract model for the regression analysis of each region.

Parameters
  • paths (dict) – Dictionary of dictionaries containing the paths to the CSV time series files.

  • param (dict) – Dictionary of dictionaries containing IRENA’s region list, FLHs, and EMHIRES model time series.

  • tech (str) – Technology under study.

  • settings (list) – List of all the settings (hub heights/orientations) to be used in the regression.

  • subregion (str) – Name of subregion.

Return data

Dictionary containing regression parameters.

Return type

dict
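Pyomo abstract models can be instantiated from a nested dictionary of the form {None: {component: values}}. In that form, sets are given as {None: [...]}, scalar parameters as {None: value}, and indexed parameters as {index: value}. The sketch below builds such a dictionary. The component names `s`, `q`, `t`, `FLH`, `shape`, and `TS` are assumptions and may not match the module's model. The input is assumed to be a nested dict of setting -> quantile -> hourly values.

```python
def build_regression_data(gen_ts, reference_ts, flh_target):
    """Build a {None: {...}} data dict for AbstractModel.create_instance(data=...)."""
    settings = sorted(gen_ts)
    quantiles = sorted(next(iter(gen_ts.values())))
    hours = list(range(1, len(reference_ts) + 1))  # 1-based hour index
    ts = {
        (s, q, t): gen_ts[s][q][t - 1]
        for s in settings for q in quantiles for t in hours
    }
    return {None: {
        "s": {None: settings},                                  # set of settings
        "q": {None: quantiles},                                 # set of quantiles
        "t": {None: hours},                                     # set of hours
        "FLH": {None: flh_target},                              # scalar parameter
        "shape": {t: reference_ts[t - 1] for t in hours},       # reference series
        "TS": ts,                                               # generated series
    }}
```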