pygcam.diff
Functions for computing differences between CSV files and for generating CSV and XLSX from multiple CSV files.
API
See the MIT License at https://opensource.org/licenses/MIT for license details.
- pygcam.diff.computeDifference(df1, df2, resetIndex=True, dropna=True, asPercentChange=False, splitLand=False)
Compute the difference between two DataFrames.
- Parameters
df1 – a pandas DataFrame instance
df2 – a pandas DataFrame instance
resetIndex – (bool) if True (the default), the index of the DataFrame holding the computed difference is reset so that the data in non-year columns appear as ordinary columns. Otherwise, the returned DataFrame is indexed on all non-year columns.
dropna – (bool) if True (the default), drop rows containing NaN values after computing the difference.
asPercentChange – (bool) if True, compute percent change rather than difference.
splitLand – (bool) whether to split the ‘Landleaf’ column (if present) to create two new columns, ‘land_use’ and ‘basin’. Ignored if resetIndex is False.
- Returns
a pandas DataFrame with the difference in all the year columns, computed as (df2 - df1) if asPercentChange is False, otherwise as (df2 - df1)/df1.
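The arithmetic described above can be illustrated with plain pandas. This is a minimal sketch, not pygcam's implementation. It assumes that every column whose name is a four-digit year holds data and that all other columns identify the row. The function name compute_difference_sketch is hypothetical, and the ‘Landleaf’ split is omitted.

```python
import pandas as pd


def compute_difference_sketch(df1, df2, as_percent_change=False, dropna=True):
    """Return (df2 - df1), or (df2 - df1) / df1, over the year columns.

    Hypothetical sketch: columns named with four digits are treated as
    years; all other columns identify the row.
    """
    def is_year(col):
        return str(col).isdigit() and len(str(col)) == 4

    # Index each frame on its non-year columns so rows align by label.
    a = df1.set_index([c for c in df1.columns if not is_year(c)])
    b = df2.set_index([c for c in df2.columns if not is_year(c)])

    diff = b - a
    if as_percent_change:
        diff = diff / a  # (df2 - df1) / df1, as in the Returns description

    if dropna:
        diff = diff.dropna()

    # Turn the non-year index levels back into ordinary columns.
    return diff.reset_index()


base = pd.DataFrame({"region": ["USA", "China"],
                     "2020": [10.0, 20.0], "2025": [12.0, 25.0]})
policy = pd.DataFrame({"region": ["USA", "China"],
                       "2020": [11.0, 18.0], "2025": [15.0, 20.0]})
diff = compute_difference_sketch(base, policy)
pct = compute_difference_sketch(base, policy, as_percent_change=True)
```

Both frames are indexed on their non-year columns before subtracting. Rows are therefore matched by label rather than by position, and the two inputs need not list regions in the same order.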
- pygcam.diff.diffCsvPathname(query, baseline, policy, diffsDir=None, workingDir='.', createDir=False, asPercentChange=False)
Compute the path to the CSV file containing the differences between the policy and baseline scenarios for the given query.
- Parameters
query – (str) the base file name of the query result
baseline – (str) the baseline scenario
policy – (str) the policy scenario
workingDir – (str) the directory immediately above the baseline and policy sandboxes.
createDir – (bool) whether to create the diffs directory, if needed.
- Returns
(str) the pathname of the CSV file
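The following sketch shows one way such a path might be assembled. The directory layout (a diffs subdirectory under the policy sandbox) and the file-name pattern, including a -pct suffix for percent-change files, are assumptions for illustration and are not taken from pygcam. The function name is hypothetical.

```python
import os


def diff_csv_pathname_sketch(query, baseline, policy, diffsDir=None,
                             workingDir=".", createDir=False,
                             asPercentChange=False):
    """Build a diff-file path under an ASSUMED layout (not pygcam's own).

    Default directory: <workingDir>/<policy>/diffs
    File name:         <query>-<policy>-<baseline>[-pct].csv
    """
    diffs_dir = diffsDir or os.path.join(workingDir, policy, "diffs")
    if createDir:
        # exist_ok=True: no error if the directory already exists.
        os.makedirs(diffs_dir, exist_ok=True)
    suffix = "-pct" if asPercentChange else ""
    return os.path.join(diffs_dir, f"{query}-{policy}-{baseline}{suffix}.csv")
```

Giving the percent-change variant its own suffix lets absolute and relative differences for the same scenario pair coexist in one directory.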
- pygcam.diff.queryCsvPathname(query, scenario, workingDir='.')
Compute the path to the CSV file containing results for the given query and scenario.
- Parameters
query – (str) the base file name of the query result
scenario – (str) the scenario name
workingDir – (str) the directory immediately above the baseline and policy sandboxes.
- Returns
(str) the pathname of the CSV file
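As an illustration, the sketch below assumes that each scenario's query results sit in a queryResults subdirectory of its sandbox. That layout and the function name are hypothetical.

```python
import os


def query_csv_pathname_sketch(query, scenario, workingDir="."):
    """Result-file path under an ASSUMED layout (not pygcam's own):
    <workingDir>/<scenario>/queryResults/<query>-<scenario>.csv
    """
    return os.path.join(workingDir, scenario, "queryResults",
                        f"{query}-{scenario}.csv")


# One path per scenario, e.g. as inputs to a baseline-vs-policy comparison.
base_csv = query_csv_pathname_sketch("LandAllocation", "base", workingDir="ws")
policy_csv = query_csv_pathname_sketch("LandAllocation", "tax10", workingDir="ws")
```

Calling it once for the baseline and once for the policy scenario produces the two inputs that a difference computation needs.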
- pygcam.diff.writeDiffsToCSV(outFile, referenceFile, otherFiles, skiprows=1, interpolate=False, years=None, startYear=0, asPercentChange=False, splitLand=False)
Compute the differences between the data in a reference .CSV file and one or more other .CSV files as (other - reference), optionally interpolating annual values between timesteps, storing the results in a single .CSV file. See also writeDiffsToXLSX() and writeDiffsToFile().
- Parameters
outFile – (str) the name of the .CSV file to create
referenceFile – (str) the name of a .CSV file containing reference results
otherFiles – (list of str) the names of the other .CSV files for which to compute differences.
skiprows – (int) the number of lines to skip before the column names; should be 1 for GCAM files, which have a header line above the column names.
interpolate – (bool) if True, linearly interpolate annual values between timesteps in all data files and compute the differences for all resulting years.
years – (iterable of 2 values coercible to int) the range of years to include in results.
startYear – (int) the year at which to begin interpolation, if interpolate is True. Defaults to the first year in years.
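asPercentChange – (bool) if True, compute percent change rather than difference.
splitLand – (bool) whether to split the ‘Landleaf’ column (if present) to create two new columns, ‘land_use’ and ‘basin’.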
- Returns
None
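The steps this function performs can be sketched with pandas: read each file past its header line, optionally interpolate to annual values, subtract the reference, and restrict the result to a year range. This is an illustrative sketch rather than pygcam's code. The helper names are hypothetical, and startYear, asPercentChange, and splitLand are omitted. The extra file column used to tell the stacked difference blocks apart is also an assumption.

```python
import os
import tempfile

import pandas as pd


def read_query_csv(path, skiprows=1):
    """Read a query-result CSV, skipping the line(s) above the column names."""
    df = pd.read_csv(path, skiprows=skiprows)
    year_cols = [c for c in df.columns if str(c).isdigit()]
    id_cols = [c for c in df.columns if c not in year_cols]
    df = df.set_index(id_cols)[year_cols].astype(float)
    df.columns = [int(c) for c in year_cols]
    return df


def interpolate_annual(df):
    """Linearly fill in every year between the first and last timestep."""
    all_years = list(range(min(df.columns), max(df.columns) + 1))
    # Once reindexed, columns are evenly spaced, so default linear
    # interpolation along each row gives straight lines between timesteps.
    return df.reindex(columns=all_years).interpolate(axis=1)


def write_diffs_to_csv_sketch(out_file, reference_file, other_files,
                              skiprows=1, interpolate=False, years=None):
    ref = read_query_csv(reference_file, skiprows)
    if interpolate:
        ref = interpolate_annual(ref)
    blocks = []
    for other_file in other_files:
        other = read_query_csv(other_file, skiprows)
        if interpolate:
            other = interpolate_annual(other)
        diff = other - ref  # (other - reference), matched on row labels
        if years is not None:
            first, last = (int(y) for y in years)
            diff = diff[[y for y in diff.columns if first <= y <= last]]
        diff = diff.reset_index()
        # Assumption: a "file" column tells the stacked blocks apart.
        diff.insert(0, "file", os.path.basename(other_file))
        blocks.append(diff)
    pd.concat(blocks, ignore_index=True).to_csv(out_file, index=False)


with tempfile.TemporaryDirectory() as tmp:
    paths = {}
    for name, values in [("ref", "10,20"), ("policy", "10,30")]:
        paths[name] = os.path.join(tmp, f"{name}.csv")
        with open(paths[name], "w") as f:
            # First line mimics the title line that skiprows=1 skips.
            f.write(f"Query title\nregion,2020,2025\nUSA,{values}\n")
    out_path = os.path.join(tmp, "diffs.csv")
    write_diffs_to_csv_sketch(out_path, paths["ref"], [paths["policy"]],
                              interpolate=True)
    result = pd.read_csv(out_path)
```

Both files are interpolated before subtracting, so the differences for the intermediate years are themselves linear between the original timesteps.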
- pygcam.diff.writeDiffsToFile(outFile, referenceFile, otherFiles, ext='csv', skiprows=1, interpolate=False, years=None, startYear=0, asPercentChange=False, splitLand=False)
Compute the differences between the data in a reference .CSV file and one or more other .CSV files as (other - reference), optionally interpolating annual values between timesteps, storing the results in a single .CSV or .XLSX file. See writeDiffsToCSV() and writeDiffsToXLSX() for more details.
- Parameters
outFile – (str) the name of the file to create
referenceFile – (str) the name of a .CSV file containing reference results
otherFiles – (list of str) the names of the other .CSV files for which to compute differences.
ext – (str) if ‘.csv’, results are written to a single .CSV file, otherwise, they are written to an .XLSX file.
skiprows – (int) the number of lines to skip before the column names; should be 1 for GCAM files, which have a header line above the column names.
interpolate – (bool) if True, linearly interpolate annual values between timesteps in all data files and compute the differences for all resulting years.
years – (iterable of 2 values coercible to int) the range of years to include in results.
startYear – (int) the year at which to begin interpolation, if interpolate is True. Defaults to the first year in years.
asPercentChange – (bool) whether to write diffs as percent change from baseline
splitLand – (bool) whether to split the ‘Landleaf’ column (if present) to create two new columns, ‘land_use’ and ‘basin’.
- Returns
None
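The ext default is 'csv', while the description of ext tests for '.csv'. The hypothetical caller-side helper below accepts either spelling and renames the output file to match. It does not reflect how pygcam itself parses ext.

```python
import os


def resolve_diff_output(out_file, ext="csv"):
    """Hypothetical helper: choose the output format and a matching file name.

    ext may be given with or without a leading dot, in any case; any value
    other than csv selects XLSX.
    """
    kind = "csv" if ext.lower().lstrip(".") == "csv" else "xlsx"
    root, _old_ext = os.path.splitext(out_file)
    return kind, f"{root}.{kind}"
```

Forcing the file suffix to match the chosen format is a design choice of this sketch. It prevents an .XLSX workbook from being saved under a .csv name.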
- pygcam.diff.writeDiffsToXLSX(outFile, referenceFile, otherFiles, skiprows=1, interpolate=False, years=None, startYear=0, asPercentChange=False, splitLand=False)
Compute the differences between the data in a reference .CSV file and one or more other .CSV files as (other - reference), optionally interpolating annual values between timesteps, storing the results in a single .XLSX file with each difference matrix on a separate worksheet, and with an index worksheet with links to the other worksheets. See also writeDiffsToCSV() and writeDiffsToFile().
- Parameters
outFile – (str) the name of the .XLSX file to create
referenceFile – (str) the name of a .CSV file containing reference results
otherFiles – (list of str) the names of the other .CSV files for which to compute differences.
skiprows – (int) the number of lines to skip before the column names; should be 1 for GCAM files, which have a header line above the column names.
interpolate – (bool) if True, linearly interpolate annual values between timesteps in all data files and compute the differences for all resulting years.
years – (iterable of 2 values coercible to int) the range of years to include in results.
startYear – (int) the year at which to begin interpolation, if interpolate is True. Defaults to the first year in years.
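asPercentChange – (bool) if True, compute percent change rather than difference.
splitLand – (bool) whether to split the ‘Landleaf’ column (if present) to create two new columns, ‘land_use’ and ‘basin’.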
- Returns
None
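Putting each difference matrix on its own worksheet, with an index sheet linking to the others, runs into two Excel constraints. Worksheet names are limited to 31 characters and may not contain \ / ? * [ ] :. Links to other sheets use a "#'Sheet'!A1" target. The helpers below are a hypothetical sketch of handling both. They only build names and formula strings and do not write an .XLSX file.

```python
import re

# Characters Excel does not allow in worksheet names.
_INVALID_SHEET_CHARS = re.compile(r"[\\/?*\[\]:]")
_MAX_SHEET_NAME = 31


def safe_sheet_names(labels):
    """Map each label to a unique, Excel-legal worksheet name."""
    used = set()
    names = []
    for label in labels:
        base = _INVALID_SHEET_CHARS.sub("_", label)[:_MAX_SHEET_NAME]
        # Names may not be blank or begin/end with an apostrophe.
        base = base.strip("'") or "Sheet"
        name, n = base, 1
        # Excel treats names that differ only in case as duplicates.
        while name.lower() in used:
            n += 1
            tag = f"_{n}"
            name = base[:_MAX_SHEET_NAME - len(tag)] + tag
        used.add(name.lower())
        names.append(name)
    return names


def index_link_formula(sheet_name, text):
    """HYPERLINK formula that jumps to cell A1 of another worksheet."""
    quoted_sheet = sheet_name.replace("'", "''")  # escape ' inside '...'
    quoted_text = text.replace('"', '""')         # escape " inside "..."
    return f"=HYPERLINK(\"#'{quoted_sheet}'!A1\", \"{quoted_text}\")"


sheets = safe_sheet_names(["policy/tax:10", "a" * 40, "A" * 40])
links = [index_link_formula(s, s) for s in sheets]
```

Scenario or file names used as sheet labels can easily exceed 31 characters. Truncating and then de-duplicating keeps every index link pointing at a distinct sheet.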