pygcam.diff

Functions for computing differences between CSV files and for generating CSV and XLSX from multiple CSV files.

API

See https://opensource.org/licenses/MIT for license details.

pygcam.diff.computeDifference(df1, df2, resetIndex=True, dropna=True, asPercentChange=False, splitLand=False)

Compute the difference between two DataFrames.

Parameters:
  • df1 – a pandas DataFrame instance
  • df2 – a pandas DataFrame instance
  • resetIndex – (bool) if True (the default), the index in the DataFrame holding the computed difference is reset so that data in non-year columns appear in individual columns. Otherwise, the index in the returned DataFrame is based on all non-year columns.
  • dropna – (bool) if True, drop rows with NaN values after computing the difference.
  • asPercentChange – (bool) if True, compute percent change rather than difference.
  • splitLand – (bool) whether to split the ‘Landleaf’ column (if present) to create two new columns, ‘land_use’ and ‘basin’. Ignored if resetIndex is False.
Returns:

a pandas DataFrame with the difference in all the year columns, computed as (df2 - df1) if asPercentChange is False, otherwise as (df2 - df1)/df1.
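The formula in the return value can be illustrated without pandas. The following is a minimal sketch of the arithmetic only, not pygcam's implementation: the helper name `year_differences`, the dict-based row layout, and the rule that all-digit column names are year columns are assumptions made for illustration.

```python
def year_differences(row1, row2, as_percent_change=False):
    """Return {year: value} for the year columns shared by both rows.

    Hypothetical helper that mirrors the documented formula:
    (row2 - row1), or (row2 - row1) / row1 when as_percent_change is True.
    """
    result = {}
    for key, v1 in row1.items():
        # Assumption: only all-digit column names (e.g. '2020') are year columns.
        if not (key.isdigit() and key in row2):
            continue
        delta = row2[key] - v1
        result[key] = delta / v1 if as_percent_change else delta
    return result


baseline = {'region': 'USA', '2020': 100.0, '2025': 120.0}
policy = {'region': 'USA', '2020': 110.0, '2025': 90.0}

diff = year_differences(baseline, policy)
pct = year_differences(baseline, policy, as_percent_change=True)
# diff == {'2020': 10.0, '2025': -30.0}
# pct  == {'2020': 0.1, '2025': -0.25}
```

Following the documented formula, the percent change here is a fraction of the baseline value rather than a value scaled by 100. This sketch raises ZeroDivisionError when a baseline value is zero.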

pygcam.diff.diffCsvPathname(query, baseline, policy, diffsDir=None, workingDir='.', createDir=False, asPercentChange=False)

Compute the path to the CSV file containing differences between policy and baseline scenarios for query.

Parameters:
  • query – (str) the base file name of the query result
  • baseline – (str) the baseline scenario
  • policy – (str) the policy scenario
  • workingDir – (str) the directory immediately above the baseline and policy sandboxes.
  • createDir – (bool) whether to create the diffs directory, if needed.
Returns:

(str) the pathname of the CSV file

pygcam.diff.queryCsvPathname(query, scenario, workingDir='.')

Compute the path to the CSV file containing results for the given query and scenario.

Parameters:
  • query – (str) the base file name of the query result
  • scenario – (str) the scenario name
  • workingDir – (str) the directory immediately above the baseline and policy sandboxes.
Returns:

(str) the pathname of the CSV file

pygcam.diff.writeDiffsToCSV(outFile, referenceFile, otherFiles, skiprows=1, interpolate=False, years=None, startYear=0, asPercentChange=False, splitLand=False)

Compute the differences between the data in a reference .CSV file and one or more other .CSV files as (other - reference), optionally interpolating annual values between timesteps, storing the results in a single .CSV file. See also writeDiffsToXLSX() and writeDiffsToFile().

Parameters:
  • outFile – (str) the name of the .CSV file to create
  • referenceFile – (str) the name of a .CSV file containing reference results
  • otherFiles – (list of str) the names of other .CSV files for which to compute differences.
  • skiprows – (int) should be 1 for GCAM files, to skip header info before column names
  • interpolate – (bool) if True, linearly interpolate annual values between timesteps in all data files and compute the differences for all resulting years.
  • years – (iterable of 2 values coercible to int) the range of years to include in results.
  • startYear – (int) the year at which to begin interpolation, if interpolate is True. Defaults to the first year in years.
  • asPercentChange – (bool) if True, compute percent change rather than difference.
  • splitLand – (bool) whether to split the ‘Landleaf’ column (if present) to create two new columns, ‘land_use’ and ‘basin’.
Returns:

none
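The roles of skiprows and years can be sketched with the standard library alone. This is an illustrative reader under stated assumptions, not how pygcam parses files: the helper name `read_year_table`, the ‘region’ key column, the sample data, and the treatment of years as an inclusive range are all hypothetical.

```python
import csv


def read_year_table(text, skiprows=1, years=None):
    """Parse CSV text into {region: {year: value}}.

    Hypothetical reader: skips `skiprows` lines of header info before the
    column names and keeps only year columns inside the `years` range.
    """
    lines = text.splitlines()[skiprows:]
    table = {}
    for row in csv.DictReader(lines):
        data = {}
        for col, val in row.items():
            if not col.isdigit():
                continue  # non-year column such as 'region'
            year = int(col)
            # Assumption: `years` is an inclusive (first, last) pair.
            if years is None or int(years[0]) <= year <= int(years[1]):
                data[year] = float(val)
        table[row['region']] = data
    return table


REFERENCE = """Sample query title line
region,2015,2020,2025
USA,1.0,2.0,3.0
China,4.0,5.0,6.0
"""

OTHER = """Sample query title line
region,2015,2020,2025
USA,1.5,2.0,2.0
China,4.0,7.0,6.5
"""

ref = read_year_table(REFERENCE, years=('2020', '2025'))
oth = read_year_table(OTHER, years=('2020', '2025'))

# Differences are computed as (other - reference), as described above.
diffs = {
    region: {year: oth[region][year] - ref[region][year] for year in ref[region]}
    for region in ref
}
```

With the sample data, the 2015 column is excluded by the years range, and `diffs['China'][2020]` is 2.0.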

pygcam.diff.writeDiffsToFile(outFile, referenceFile, otherFiles, ext='csv', skiprows=1, interpolate=False, years=None, startYear=0, asPercentChange=False, splitLand=False)

Compute the differences between the data in a reference .CSV file and one or more other .CSV files as (other - reference), optionally interpolating annual values between timesteps, storing the results in a single .CSV or .XLSX file. See writeDiffsToCSV() and writeDiffsToXLSX() for more details.

Parameters:
  • outFile – (str) the name of the file to create
  • referenceFile – (str) the name of a .CSV file containing reference results
  • otherFiles – (list of str) the names of other .CSV files for which to compute differences.
  • ext – (str) if ‘.csv’, results are written to a single .CSV file; otherwise, they are written to an .XLSX file.
  • skiprows – (int) should be 1 for GCAM files, to skip header info before column names
  • interpolate – (bool) if True, linearly interpolate annual values between timesteps in all data files and compute the differences for all resulting years.
  • years – (iterable of 2 values coercible to int) the range of years to include in results.
  • startYear – (int) the year at which to begin interpolation, if interpolate is True. Defaults to the first year in years.
  • asPercentChange – (bool) whether to write diffs as percent change from baseline
  • splitLand – (bool) whether to split the ‘Landleaf’ column (if present) to create two new columns, ‘land_use’ and ‘basin’.
Returns:

none
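The effect of the interpolate option described above can be sketched as follows. This is a minimal pure-Python illustration of linear interpolation between timesteps followed by differencing, not pygcam's implementation; the helper name `interpolate_annual` and the sample values are hypothetical, and startYear handling is omitted.

```python
def interpolate_annual(values):
    """Linearly interpolate annual values between timestep years.

    `values` maps int years to floats; the result has an entry for every
    year from the first timestep to the last. Hypothetical helper.
    """
    years = sorted(values)
    result = {}
    for y0, y1 in zip(years, years[1:]):
        result[y0] = values[y0]
        # Constant per-year slope between two adjacent timesteps.
        step = (values[y1] - values[y0]) / (y1 - y0)
        for year in range(y0 + 1, y1):
            result[year] = values[y0] + step * (year - y0)
    if years:
        result[years[-1]] = values[years[-1]]
    return result


reference = interpolate_annual({2020: 10.0, 2025: 20.0})
other = interpolate_annual({2020: 10.0, 2025: 30.0})

# Differences for all resulting years, computed as (other - reference).
diffs = {year: other[year] - reference[year] for year in reference}
# diffs == {2020: 0.0, 2021: 2.0, 2022: 4.0, 2023: 6.0, 2024: 8.0, 2025: 10.0}
```

Interpolating both inputs before differencing gives a difference for every year in the span, not only for the original timesteps.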

pygcam.diff.writeDiffsToXLSX(outFile, referenceFile, otherFiles, skiprows=1, interpolate=False, years=None, startYear=0, asPercentChange=False, splitLand=False)

Compute the differences between the data in a reference .CSV file and one or more other .CSV files as (other - reference), optionally interpolating annual values between timesteps, storing the results in a single .XLSX file with each difference matrix on a separate worksheet, and with an index worksheet with links to the other worksheets. See also writeDiffsToCSV() and writeDiffsToFile().

Parameters:
  • outFile – (str) the name of the .XLSX file to create
  • referenceFile – (str) the name of a .CSV file containing reference results
  • otherFiles – (list of str) the names of other .CSV files for which to compute differences.
  • skiprows – (int) should be 1 for GCAM files, to skip header info before column names
  • interpolate – (bool) if True, linearly interpolate annual values between timesteps in all data files and compute the differences for all resulting years.
  • years – (iterable of 2 values coercible to int) the range of years to include in results.
  • startYear – (int) the year at which to begin interpolation, if interpolate is True. Defaults to the first year in years.
  • asPercentChange – (bool) if True, compute percent change rather than difference.
  • splitLand – (bool) whether to split the ‘Landleaf’ column (if present) to create two new columns, ‘land_use’ and ‘basin’.
Returns:

none