What is pygcam?¶
The pygcam package comprises a set of Python modules and a main driver script designed to facilitate a more efficient workflow using the Global Change Assessment Model (GCAM).
The tools are intended to meet the needs of different types of users, from basic users who just want to run the model, to “power” users interested in writing custom scripts, to software developers wanting to write new tools like graphical user interfaces for working with GCAM.
The main components include:
- Software libraries that simplify development of higher-level software tools (graphical interfaces, scripts) that interface with GCAM. The library will provide an Application Programming Interface (API) to the GCAM input and output data, and to running GCAM, querying results, and performing common processing tasks such as computing differences between policy and baseline scenarios and plotting results.
- Command-line tools built upon the library described above to package commonly required functionality into a convenient form for direct use and to support development of higher-level, custom scripts. (See GCAM tool (gt) for details.)
- A Monte Carlo Simulation framework using GCAM on high-performance computers, allowing users to explore uncertainty in model outputs resulting from uncertainty in model inputs, and to characterize the contribution of individual parameters to variance in output metrics.
- Graphical User Interfaces that simplify use of the libraries and tools as well as providing unique capabilities such as graphical exploration and comparison of sets of Monte Carlo simulation results. (See Graphical User Interface and MCS Explorer for more info.)
- Cross-platform capability on Windows, Mac OS X, and Linux.
- Installer scripts to simplify installation of tools on users’ computers.
- User documentation for all of the above. (This website!)
Users can control many aspects of pygcam through a configuration file found in ${HOME}/.pygcam.cfg. When GCAM tool (gt) is run the first time, a configuration file is created, with all configuration options commented out and showing their default values.
The main script (GCAM tool (gt)) implements several “subcommands” that perform various steps in a typical GCAM analysis. The script implements a plug-in architecture allowing users to customize gt and avoid a proliferation of scripts. The available subcommands for general use include:
- chart: generates bar or line graphs, with a large number of command-line arguments to customize chart appearance.
- config: displays the values of configuration parameters and edits the user’s configuration file.
- diff: computes the differences between two sets of GCAM results stored in .CSV files in the format generated by the query subcommand.
- gcam: runs the GCAM model in any indicated directory. Links to a reference GCAM directory are used when possible to avoid needless file copying.
- gui: runs a local web server that provides a browser-based graphical user interface (GUI) at the address http://127.0.0.1:8050.
- init: initializes key configuration variables in the ${HOME}/.pygcam.cfg file.
- mcs: enables or disables Monte Carlo simulation sub-commands.
- mi: runs the ModelInterface program based on settings in the user’s config file.
- new: creates the structure and files required for a new pygcam project.
- protect: generates XML input files that define custom land-protection scenarios.
- query: executes named batch queries, and supports regional aggregation described in a region mapping file.
- run: reads an XML input file and runs one or more steps of an analysis; these steps typically invoke other sub-commands as required.
- sandbox: shows, creates, and deletes run-time workspaces used by pygcam.
- setup: modifies GCAM XML data and configuration files according to user instructions in either XML format or as a Python script.
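To illustrate the general idea of a driver script with pluggable sub-commands, here is a minimal sketch built on Python's standard argparse module. It is not pygcam's implementation: the Subcommand base class, the Hello plug-in, and the program name "tool" are all hypothetical names chosen for this example.

```python
import argparse


class Subcommand:
    """Base class for a sub-command plug-in (hypothetical interface)."""
    name = ""
    help = ""

    def add_arguments(self, parser):
        """Add this sub-command's own command-line arguments (none by default)."""

    def run(self, args):
        raise NotImplementedError


class Hello(Subcommand):
    """Stand-in plug-in; a real one would run a model step, a query, etc."""
    name = "hello"
    help = "return a greeting"

    def add_arguments(self, parser):
        parser.add_argument("--who", default="world")

    def run(self, args):
        return f"hello, {args.who}"


def build_parser(plugins):
    """Create one argparse sub-parser per registered plug-in."""
    parser = argparse.ArgumentParser(prog="tool")
    subparsers = parser.add_subparsers(dest="command", required=True)
    for plugin in plugins:
        sub = subparsers.add_parser(plugin.name, help=plugin.help)
        plugin.add_arguments(sub)
        sub.set_defaults(plugin=plugin)  # remember which plug-in handles this command
    return parser


def main(argv, plugins=(Hello(),)):
    """Parse argv and dispatch to the selected plug-in."""
    args = build_parser(plugins).parse_args(argv)
    return args.plugin.run(args)
```

With this pattern, adding a sub-command means adding one class to the plug-in list rather than writing a new stand-alone script.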
If the user has enabled Monte Carlo Simulation support (currently available only on Linux clusters) by running gt mcs on, additional MCS-related sub-commands become available.
Also see the GCAM XML-Setup documentation for information on programmatically modifying copies of GCAM XML files. This “setup” step can be one of the commands called by run.
Guiding Principles¶
The following general principles underlie the design of pygcam:
Common tasks should be easy to accomplish but flexible.
In general, this run-time simplicity requires a bit of setup-time complexity. That is, simplicity at run-time is achieved by relying on the project.xml file, which defines all key aspects of a project. Fortunately, the project file need only be created once.
For example, with a typical project.xml file, a user can set up and run all scenarios for the default project, compute differences between policy and baseline scenarios, run custom computations on results, and generate figures with the simple command gt run. The user can also identify which projects, scenario groups, scenarios, and/or steps to operate on, as needed.
The user should be able to customize virtually all aspects of the system.
Projects based on GCAM will have a variety of requirements and use patterns that are difficult to anticipate. The Configuration System defines “reasonable” defaults for all parameters, while allowing the user to modify virtually all file and directory locations, command arguments, and other key aspects of the system. There are very few hardcoded aspects to the system.
Projects should be able to be isolated from one another.
By default, pygcam uses symbolic links (symlinks) to avoid unnecessarily copying sets of large files such as the entire input directory. However, files that are constant across projects in one environment might be changed between projects in another environment. For example, your projects might involve different versions of the GCAM executable, which in most projects (outside of JGCRI) is unchanged across projects. To avoid having changes in shared files inadvertently “pollute” another project, the user can choose which files from the reference workspace (more on this below) to copy and which to link, thereby optimizing the trade-off between complete isolation and avoiding unnecessary copying. (Note that Windows prevents users from creating symlinks by default; pygcam will copy all files on Windows when symlink creation fails.)
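The link-with-fallback behavior described above can be sketched with the standard library alone. This is an illustration of the technique under stated assumptions, not pygcam's code: the function name link_or_copy is hypothetical, and the sketch assumes the destination does not exist yet.

```python
import os
import shutil


def link_or_copy(src, dst):
    """Make dst a symlink to src; copy instead if the link cannot be created.

    Returns "link" or "copy" so the caller can see which path was taken.
    """
    if os.path.lexists(dst):
        raise FileExistsError(dst)  # this sketch never overwrites anything
    src = os.path.abspath(src)
    try:
        # target_is_directory only matters on Windows; it is ignored elsewhere
        os.symlink(src, dst, target_is_directory=os.path.isdir(src))
        return "link"
    except (OSError, NotImplementedError):
        # e.g. Windows without the privilege required to create symlinks
        if os.path.isdir(src):
            shutil.copytree(src, dst)
        else:
            shutil.copy2(src, dst)
        return "copy"
```

Returning the action taken makes it easy to log when a project falls back to full copies, which matters for disk usage with large input directories.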
Manual editing of XML files should be avoided whenever possible.
Manual modifications to XML files are difficult to document effectively and are error-prone. Generating required files using an XML file or a short Python script based on the pygcam library ensures consistency and serves as complete documentation of changes made to XML files.
Reference GCAM files should not be modified to generate project scenarios.
Reference GCAM files are never modified. Rather, they are copied, as needed, to the project’s run-time directory, and the copies are modified there. This allows a set of project files to be shared with others without having to provide a copy of an entire GCAM workspace. The only requirement is that both users start from the same reference system, which for most users will be the latest public release of GCAM.
An additional advantage of this approach is that instructions to generate scenarios should be portable across GCAM versions, provided that the pygcam library is updated to be aware of any relevant changes in the XML format.
Managing Scenarios¶
In GCAM, a scenario is just a name assigned within a configuration file to distinguish runs of GCAM. The scenario name is set in GCAM’s configuration.xml and appears in the upper-left panel of the ModelInterface application.
In pygcam, the scenario concept is made more helpful by implementing
a few simple conventions regarding directory structure and filenames. Using
a consistent structure simplifies use of the library and tools since more
information can be conveyed through the scenario name. The “setup tools” (to
be documented) follow these conventions when generating modified XML, allowing
the other workflow scripts to find the resulting files.
Scenario conventions¶
We extend the definition of scenario to identify a set of XML files that
are used together. In this approach, “scenario” refers to both the name
assigned in a configuration.xml file and a corresponding directory holding
customized XML files and a configuration file called config.xml.
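As a rough illustration of this convention, combined with the principle that reference files are copied rather than edited, the sketch below copies a reference configuration file into a per-scenario directory as config.xml and sets the scenario name in the copy only. Several things here are assumptions made for the example: the function name, naming the directory after the scenario, and the simplified XML layout (a Strings element containing a Value element whose name attribute is scenarioName).

```python
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path


def make_scenario_config(ref_config, scenarios_root, scenario_name):
    """Create <scenarios_root>/<scenario_name>/config.xml from a reference file."""
    scenario_dir = Path(scenarios_root) / scenario_name
    scenario_dir.mkdir(parents=True, exist_ok=True)
    copy_path = scenario_dir / "config.xml"
    shutil.copy2(str(ref_config), str(copy_path))  # the reference file is only read

    tree = ET.parse(str(copy_path))
    # Assumed layout: <Strings><Value name="scenarioName">...</Value></Strings>
    node = tree.getroot().find("./Strings/Value[@name='scenarioName']")
    if node is None:
        raise ValueError("no scenarioName entry found in " + str(copy_path))
    node.text = scenario_name
    tree.write(str(copy_path), encoding="utf-8", xml_declaration=True)
    return copy_path
```

Because the edit is expressed as code, rerunning the function reproduces the scenario file exactly, and the function itself documents what was changed.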
GCAM Workspaces¶
The tools are most convenient to use if you follow the file layout created by
the “setup tools”. It is not required to use these tools or this file structure,
but everything is designed to simplify coordination between the programs.
Many of these (absolute and relative) directory locations can be modified to
suit your preferences via the pygcam configuration file.
The default file layout is structured to support multiple projects, where each project involves one or more baseline and policy scenarios. These project files can all be stored within a central GCAM work area, or anywhere you prefer.
Project structure¶
One of the goals of the pygcam project system is to distill a minimal set
of instructions for creating and running a GCAM analysis. Automating this
complex process required developing a consistent structure with computable
directory locations. There are three main directories of interest:
- Reference workspace: The source of original GCAM files, including XML files, the GCAM program itself, and other ancillary files. The configuration variable GCAM.RefWorkspace identifies this location, which is typically a public GCAM distribution, or a customized version that is the basis for a set of analyses.
- Project directory: Where project source files are located. This is identified by the configuration variable GCAM.ProjectDir. By default, the pygcam framework expects certain directories to be located at known relative locations within the project directory, but in most cases, these locations can be adjusted by modifying configuration file parameters.
- Sandbox directory: This is a separate, generated workspace, structured like a standard GCAM “Main_User_Workspace” (i.e., with subdirectories “exe”, “input”, “output”, and other required files) in which GCAM is actually run. This location is identified by the configuration variable GCAM.SandboxDir. The sandbox directory is created by copying or linking files from the reference workspace based on the configuration parameters GCAM.WorkspaceFilesToLink and GCAM.WorkspaceFilesToCopy. Modified or generated XML files are also placed in the run directory by the GCAM XML-Setup system.
Project directory¶
The GCAM XML-Setup system provides programmatic methods (i.e., Python functions) that automate common edits to GCAM XML input and configuration files. The output of the setup system is thus a set of modified XML input and configuration files. These files should not be edited manually as the changes will be overwritten the next time the setup system is run.
The files defining a project are stored in the directory identified by the configuration parameter GCAM.XmlSrc, which defaults to %(GCAM.ProjectDir)s/xmlsrc, i.e., the directory xmlsrc within your project directory. Included under xmlsrc are:
- Custom XML files
- A Python file (by default, scenarios.py) that modifies or creates XML files to generate baseline and policy scenarios. This module is invoked by the setup sub-command in pygcam.
The gcamtool setup sub-command loads the Python file and calls the setup functions corresponding to the required baseline and policy scenarios. This modifies reference XML files and copies custom XML files to a directory identified by the config parameter GCAM.LocalXml, which defaults to %(GCAM.ProjectDir)s/local-xml.
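The "load a Python file, then call the functions for the requested scenarios" step can be sketched with importlib from the standard library. This shows the general technique only. How pygcam actually loads scenarios.py, and what that file must contain, is not specified here; the one-function-per-scenario convention and the names load_module and run_setup are assumptions of this example.

```python
import importlib.util


def load_module(path, module_name="scenarios"):
    """Import a Python source file from an arbitrary path."""
    spec = importlib.util.spec_from_file_location(module_name, str(path))
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def run_setup(path, scenario_names):
    """Call the function named after each requested scenario, in order.

    Assumes (for this sketch only) that the file defines one zero-argument
    function per scenario, named exactly like the scenario.
    """
    module = load_module(path)
    results = []
    for name in scenario_names:
        func = getattr(module, name, None)
        if not callable(func):
            raise ValueError(f"{path} defines no setup function for {name!r}")
        results.append(func())
    return results
```

Listing the baseline first in scenario_names mirrors the usual dependency: policy scenarios are typically derived from the baseline.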
Dynamically generated constraints (i.e., those that depend on the output of the baseline scenario) are written to the directory indicated by GCAM.DynXml, which defaults to %(GCAM.ProjectDir)s/dyn-xml. See the GCAM XML-Setup page for further details.
N.B. a system for defining projects without writing any Python code is currently in development.
Run-time structure¶
In pygcam, each GCAM scenario is run in a separate copy of the standard GCAM
workspace. On Unix-like systems (and on Windows if
the user has adequate administrative privileges), the read-only files are symbolically
linked to the scenario workspace, avoiding copying of many megabytes of data.
To avoid ambiguity between the reference GCAM workspace (previously known as Main_User_Workspace) and the per-scenario, generated workspaces, we refer to the latter as sandboxes, a computing term for isolated areas in which programs are run to avoid interactions with other programs.
The default pygcam structure assumes there is a directory under which you want all sandboxes to be created. This is defined by the config parameter GCAM.SandboxRoot, which defaults to %(Home)s/sandbox. You can change GCAM.SandboxRoot to any desired directory. The sandbox for an individual project is defined by GCAM.SandboxDir, which defaults to %(GCAM.SandboxRoot)s/%(GCAM.ProjectName)s. Note that GCAM.ProjectName is set at run-time to the name of the project being operated on.
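The %(name)s references in these defaults use the same syntax as the basic interpolation in Python's configparser module, so the way the sandbox path resolves can be sketched as follows. Treating the settings this way, placing them in a DEFAULT section, and the function name sandbox_dir are assumptions of this example; only the two default values are taken from the text above.

```python
import configparser

# Defaults as stated in the text; the [DEFAULT] section name is assumed.
DEFAULTS = """
[DEFAULT]
GCAM.SandboxRoot = %(Home)s/sandbox
GCAM.SandboxDir = %(GCAM.SandboxRoot)s/%(GCAM.ProjectName)s
"""


def sandbox_dir(project_name, home, user_cfg_text=""):
    """Resolve GCAM.SandboxDir for one project.

    user_cfg_text stands in for the user's configuration file; values read
    later override the defaults read first.
    """
    parser = configparser.ConfigParser()
    parser.read_string(DEFAULTS)
    if user_cfg_text:
        parser.read_string(user_cfg_text)
    # Values known only at run time are supplied at lookup time.
    runtime = {"Home": home, "GCAM.ProjectName": project_name}
    return parser.get("DEFAULT", "GCAM.SandboxDir", vars=runtime)
```

For example, sandbox_dir("myproj", "/home/user") resolves to /home/user/sandbox/myproj, while passing a user file that sets GCAM.SandboxRoot to /scratch/sandboxes yields /scratch/sandboxes/myproj without touching GCAM.SandboxDir.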
Within the project’s sandbox directory are the standard GCAM workspace folders, i.e., input, libs, exe (which are symbolic links when possible), and output, which is always created locally in the sandbox to hold the GCAM output files.