What is pygcam?¶

The pygcam package comprises a set of Python modules and a main driver script designed to facilitate a more efficient workflow using the Global Change Assessment Model (GCAM).

The tools are intended to meet the needs of different types of users, from basic users who just want to run the model, to “power” users interested in writing custom scripts, to software developers wanting to write new tools like graphical user interfaces for working with GCAM.

The main components include:

Software libraries that simplify development of higher-level software tools (graphical interfaces, scripts) that interface with GCAM. The library will provide an Application Programming Interface (API) to the GCAM input and output data, and to running GCAM, querying results, and performing common processing tasks such as computing differences between policy and baseline scenarios and plotting results.

Command-line tools built upon the library described above to package commonly required functionality into a convenient form for direct use and to support development of higher-level, custom scripts. (See GCAM tool (gt) for details.)

A Monte Carlo Simulation framework using GCAM on high-performance computers, allowing users to explore uncertainty in model outputs resulting from uncertainty in model inputs, and to characterize the contribution of individual parameters to variance in output metrics.

Graphical User Interfaces that simplify use of the libraries and tools as well as providing unique capabilities such as graphical exploration and comparison of sets of Monte Carlo simulation results. (See Graphical User Interface and MCS Explorer for more info.)

Cross-platform capability on Windows, Mac OS X, and Linux.

Installer scripts to simplify installation of tools on users’ computers.

User documentation for all of the above. (This website!)

Users can control many aspects of pygcam through a configuration file found in ${HOME}/.pygcam.cfg. When GCAM tool (gt) is run the first time, a configuration file is created, with all configuration options commented out and showing their default values.

The main script (GCAM tool (gt)) implements several “subcommands” that perform various steps in a typical GCAM analysis. The script implements a plug-in architecture allowing users to customize gt and avoid a proliferation of scripts. The available subcommands for general use include:

chart generates bar or line graphs, with a large number of command-line arguments to customize chart appearance.

config displays the values of configuration parameters and edits the user’s configuration file.

diff computes the differences between two sets of GCAM results stored in CSV files in the format generated by the query subcommand.

gcam runs the GCAM model in any indicated directory. Links to a reference GCAM directory are used when possible to avoid needless file copying.

gui runs a local web server that provides a browser-based graphical user interface (GUI) at the address http://127.0.0.1:8050.

init initializes key configuration variables in the ${HOME}/.pygcam.cfg file.

mcs enables or disables the Monte Carlo simulation sub-commands.

mi runs the ModelInterface program based on settings in the user’s config file.

new creates the structure and files required for a new pygcam project.

protect generates XML input files that define custom land-protection scenarios.

query executes named batch queries, and supports regional aggregation described in a region mapping file.

run reads an XML input file and runs one or more steps of an analysis, and these steps typically invoke other sub-commands as required.

sandbox shows, creates, and deletes run-time workspaces used by pygcam.

setup modifies GCAM XML data and configuration files according to user instructions in either XML format or as a Python script.

If the user has enabled Monte Carlo Simulation support (currently available only on Linux clusters) by running gt mcs on, additional MCS-related sub-commands become available.
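For scripted workflows, the sub-commands above can also be launched from Python with the standard subprocess module. The following is a minimal sketch, not part of pygcam itself: the helper names gt_command and run_gt are hypothetical, it assumes the gt script is on the PATH, and it uses only the invocations named in this document (gt run and gt mcs on).

```python
import shutil
import subprocess


def gt_command(subcommand, *args):
    """Build the argument list for a gt invocation (hypothetical helper)."""
    return ["gt", subcommand, *args]


def run_gt(subcommand, *args):
    """Run a gt sub-command; raises CalledProcessError on a non-zero exit."""
    if shutil.which("gt") is None:
        # Assumption: gt is installed as an executable script on the PATH.
        raise FileNotFoundError("gt was not found on PATH; is pygcam installed?")
    return subprocess.run(gt_command(subcommand, *args), check=True)


# Example calls (these require a working pygcam installation):
#   run_gt("run")        # run the steps defined for the default project
#   run_gt("mcs", "on")  # enable the MCS-related sub-commands
```

Building the command as a list rather than a single string avoids shell quoting problems when arguments contain spaces.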

Also see the GCAM XML-Setup documentation for information on programmatically modifying copies of GCAM XML files. This “setup” step can be one of the commands called by run.

Guiding Principles¶

The following general principles underlie the design of pygcam:

Common tasks should be easy to accomplish but flexible.

In general, this run-time simplicity requires a bit of setup-time complexity. That is, simplicity at run-time is achieved by relying on the project.xml file, which defines all key aspects of a project. Fortunately, the project file need only be created once.

For example, with a typical project.xml file, a user can set up and run all scenarios for the default project, compute differences between policy and baseline scenarios, run custom computations on results, and generate figures with the simple command gt run. The user can also identify which projects, scenario groups, scenarios, and/or steps to operate on, as needed.

The user should be able to customize virtually all aspects of the system.

Projects based on GCAM will have a variety of requirements and use patterns that are difficult to anticipate. The Configuration System defines “reasonable” defaults for all parameters, while allowing the user to modify virtually all file and directory locations, command arguments, and other key aspects of the system. There are very few hardcoded aspects to the system.

Projects should be able to be isolated from one another.

By default, pygcam uses symbolic links (symlinks) to avoid unnecessarily copying sets of large files such as the entire input directory. However, files that are constant across projects in one environment might be changed between projects in another environment. For example, your projects might involve different versions of the GCAM executable, which for most users (outside of JGCRI) is unchanged across projects. To avoid having changes in shared files inadvertently “pollute” another project, the user can choose which files from the reference workspace (more on this below) to copy and which to link, thereby optimizing the trade-off between complete isolation and avoiding unnecessary copying. (Note that Windows prevents users from creating symlinks by default; pygcam will copy all files on Windows when symlink creation fails.)
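The link-or-copy trade-off described above can be sketched in a few lines of standard-library Python. This is an illustration of the general technique, not pygcam's actual implementation; the function name link_or_copy and its force_copy parameter are hypothetical.

```python
import os
import shutil


def link_or_copy(src, dst, force_copy=False):
    """Make dst refer to src by symlink, falling back to a copy.

    Returns "link" or "copy" to report which strategy was used.
    """
    src = os.path.abspath(src)
    if not force_copy:
        try:
            os.symlink(src, dst, target_is_directory=os.path.isdir(src))
            return "link"
        except FileExistsError:
            raise  # do not silently overwrite an existing destination
        except (OSError, NotImplementedError):
            pass  # e.g., Windows without the privilege to create symlinks

    # Copy when isolation was requested or when linking is not possible.
    if os.path.isdir(src):
        shutil.copytree(src, dst)
    else:
        shutil.copy2(src, dst)
    return "copy"
```

Passing force_copy=True for files that a project intends to modify (for example, a project-specific GCAM executable) keeps those changes from leaking into other projects, while large read-only directories can remain links.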

Manual editing of XML files should be avoided whenever possible.

Manual modifications to XML files are difficult to document effectively and are error-prone. Generating required files using an XML file or a short Python script based on the pygcam library ensures consistency and serves as complete documentation of changes made to XML files.

Reference GCAM files should not be modified to generate project scenarios.

Reference GCAM files are never modified. Rather, they are copied, as needed, to the project’s run-time directory, and the copies are modified there. This allows a set of project files to be shared with others without having to provide a copy of an entire GCAM workspace. The only requirement is that both users start from the same reference system, which for most users will be the latest public release of GCAM.

An additional advantage of this approach is that instructions to generate scenarios should be portable across GCAM versions, provided that the pygcam library is updated to be aware of any relevant changes in the XML format.

Managing Scenarios¶

In GCAM, a scenario is just a name assigned within a configuration file to distinguish runs of GCAM. The scenario name is set in GCAM’s configuration.xml and appears in the upper-left panel of the ModelInterface application.

In pygcam, the scenario concept is made more helpful by implementing a few simple conventions regarding directory structure and filenames. Using a consistent structure simplifies use of the library and tools since more information can be conveyed through the scenario name. The “setup tools” (to be documented) follow these conventions when generating modified XML, allowing the other workflow scripts to find the resulting files.

Scenario conventions¶

We extend the definition of scenario to identify a set of XML files that are used together. In this approach, “scenario” refers to both the name assigned in a configuration.xml file and a corresponding directory that holds the customized XML files and a configuration file called config.xml.

GCAM Workspaces¶

The tools are most convenient to use if you follow the file layout created by the “setup tools”. It is not required to use these tools or this file structure, but everything is designed to simplify coordination between the programs. Many of these (absolute and relative) directory locations can be modified to suit your preferences via the pygcam configuration file.

The default file layout is structured to support multiple projects, where each project involves one or more baseline and policy scenarios. These project files can all be stored within a central GCAM work area, or anywhere you prefer.

Project structure¶

One of the goals of the pygcam project system is to distill a minimal set of instructions for creating and running a GCAM analysis. Automating this complex process required developing a consistent structure with computable directory locations. There are three main directories of interest:

Reference workspace

The source of original GCAM files, including XML files, the GCAM program itself, and other ancillary files. The configuration variable GCAM.RefWorkspace identifies this location, which is typically a public GCAM distribution, or a customized version that is the basis for a set of analyses.

Project directory

Where project source files are located. This is identified by the configuration variable GCAM.ProjectDir. By default, the pygcam framework expects certain directories to be located at known relative locations within the project directory, but in most cases, these locations can be adjusted by modifying configuration file parameters.

Sandbox directory

This is a separate, generated workspace, structured like a standard GCAM “Main_User_Workspace” (i.e., with subdirectories “exe”, “input”, “output”, and other required files) in which GCAM is actually run. This location is identified by the configuration variable GCAM.SandboxDir. The sandbox directory is created by copying or linking files from the reference workspace based on the configuration parameters GCAM.WorkspaceFilesToLink and GCAM.WorkspaceFilesToCopy. Modified or generated XML files are also placed in the run directory by the GCAM XML-Setup system.

Project directory¶

The GCAM XML-Setup system provides programmatic methods (i.e., Python functions) that automate common edits to GCAM XML input and configuration files. The output of the setup system is thus a set of modified XML input and configuration files. These files should not be edited manually, as the changes will be overwritten the next time the setup system is run.

The files defining a project are stored in the directory identified by the configuration parameter GCAM.XmlSrc, which defaults to %(GCAM.ProjectDir)s/xmlsrc, i.e., the directory xmlsrc within your project directory. Included under xmlsrc are

Custom XML files

A Python file (by default, scenarios.py) that modifies or creates XML files to generate baseline and policy scenarios. This module is invoked by the setup sub-command in pygcam.

The gcamtool setup sub-command loads the Python file and calls the setup functions corresponding to the required baseline and policy scenarios. This modifies reference XML files and copies custom XML files to a directory identified by the config parameter GCAM.LocalXml, which defaults to %(GCAM.ProjectDir)s/local-xml. Dynamically generated constraints (i.e., those that depend on the output of the baseline scenario) are written to the directory indicated by GCAM.DynXml, which defaults to %(GCAM.ProjectDir)s/dyn-xml. See the GCAM XML-Setup page for further details.

N.B. a system for defining projects without writing any Python code is currently in development.

Run-time structure¶

In pygcam, each GCAM scenario is run in a separate copy of the standard GCAM workspace. On Unix-like systems (and on Windows if the user has adequate administrative privileges), the read-only files are symbolically linked to the scenario workspace, avoiding copying of many megabytes of data.

To avoid ambiguity between the reference GCAM workspace (what was previously known as Main_User_Workspace) and the per-scenario, generated workspaces, we refer to the latter as sandboxes, a computing term for isolated areas in which programs are run to avoid interactions with other programs.

The default pygcam structure assumes there is a directory under which you want all sandboxes to be created. This is defined by the config parameter GCAM.SandboxRoot, which defaults to %(Home)s/sandbox. You can change GCAM.SandboxRoot to any desired directory. The sandbox for an individual project is defined by GCAM.SandboxDir, which defaults to %(GCAM.SandboxRoot)s/%(GCAM.ProjectName)s. Note that GCAM.ProjectName is set at run-time to the name of the project being operated on.
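The %(name)s notation used in these defaults is the same interpolation syntax supported by Python's standard configparser module, so the way the defaults resolve can be illustrated with a small sketch. The values below (the home directory, the project name "tutorial", and the /scratch path) are made up for illustration, and the use of a per-project section to override a default is an assumption about the file layout rather than something stated above.

```python
from configparser import ConfigParser

parser = ConfigParser()
parser.optionxform = str  # keep option names case-sensitive (GCAM.SandboxRoot, etc.)

# Illustrative values only; real defaults come from pygcam's configuration system.
parser.read_string("""
[DEFAULT]
Home = /home/user
GCAM.SandboxRoot = %(Home)s/sandbox
GCAM.SandboxDir = %(GCAM.SandboxRoot)s/%(GCAM.ProjectName)s

[tutorial]
GCAM.SandboxRoot = /scratch/sandboxes
""")

# GCAM.ProjectName is set at run-time to the project being operated on.
parser.set("DEFAULT", "GCAM.ProjectName", "tutorial")

default_dir = parser.get("DEFAULT", "GCAM.SandboxDir")
project_dir = parser.get("tutorial", "GCAM.SandboxDir")

print(default_dir)  # /home/user/sandbox/tutorial
print(project_dir)  # /scratch/sandboxes/tutorial
```

Because GCAM.SandboxDir is defined in terms of GCAM.SandboxRoot, changing only the root is enough to relocate every project's sandbox.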

Within the project’s sandbox directory are the standard GCAM workspace folders, i.e., input, libs, and exe (which are symbolic links when possible), plus output, which is always created locally in the sandbox to hold the GCAM output files.

Create a figure showing file structure