Sub-commands for Monte Carlo Simulation

The pygcam.mcs sub-package provides additional plug-ins for the GCAM tool (gt) to support defining, running, and analyzing Monte Carlo Simulations (MCS) with GCAM.

The GCAM tool (gt) automatically loads the built-in sub-commands defined in pygcam.mcs if the file $HOME/.use_pygcam_mcs exists. This “sentinel” file allows pygcam.mcs to be “turned off” to produce shorter help messages when not working with Monte Carlo simulations. Use the gt mcs sub-command to enable, disable, or check the status of MCS mode.
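The sentinel-file check can be pictured with a short Python sketch. This is an illustration only, not pygcam's actual code: the function name mcs_mode_enabled is hypothetical, and the demonstration uses a temporary directory in place of the real $HOME.

```python
from pathlib import Path
import tempfile

SENTINEL_NAME = ".use_pygcam_mcs"  # file name taken from the text above


def mcs_mode_enabled(home=None):
    """Return True if the sentinel file exists in the given home directory.

    Hypothetical helper; pygcam's own check may differ.
    """
    home = Path(home) if home is not None else Path.home()
    return (home / SENTINEL_NAME).is_file()


# Demonstrate with a throwaway directory rather than the real $HOME.
with tempfile.TemporaryDirectory() as tmp:
    before = mcs_mode_enabled(tmp)        # no sentinel yet
    (Path(tmp) / SENTINEL_NAME).touch()   # "turn on" MCS mode
    after = mcs_mode_enabled(tmp)
    print(before, after)
```

In normal use, the gt mcs sub-command manages this file, so there should be no need to create or delete it by hand.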

This page describes only the sub-commands provided by pygcam.mcs. See the GCAM tool (gt) documentation for more info.

Note

Quick links to sub-commands: addexp, analyze, cluster, delsim, explore, discrete, gensim, ippsetup, iterate, runsim

usage: gt [-h] [+b] [+B] [+D DIRMAP] [+e ENVIROVARS] [+j JOBNAME]
          [+l LOGLEVEL] [+L LOGFILE] [+m MINUTES] [+M {trial,gensim}]
          [+P name] [+q QUEUENAME] [+r RESOURCES] [+s name=value] [+v]
          [--version] [--VERSION]
          {addexp,analyze,cluster,discrete,gensim,delsim,engine,explore,ippsetup,iterate,parallelPlot,runsim}
          ...

Named Arguments

+b, --batch

Run the commands by submitting a batch job using the command given by config variable GCAM.BatchCommand. (Linux only)

Default: False

+B, --showBatch
 

Show the batch command to be run, but don’t run it. (Linux only)

Default: False

+D, --dirmap A comma-delimited sequence of colon-delimited directory names of the form “/some/host/path:/a/container/path, /host:cont, …”, mapping host dirs to their mount point in a docker container.
+e, --enviroVars
 Comma-delimited list of environment variable assignments to pass to the queued batch job, e.g., +e “FOO=1,BAR=2”. (Linux only)
+j, --jobName

Specify a name for the queued batch job. Default is “gt”. (Linux only)

Default: “gt”

+l, --logLevel

Sets the log level for modules of the program. A default log level can be set for the entire program, or individual modules can have levels set using the syntax “module:level, module:level,…”, where the level names must be one of {debug,info,warning,error,fatal} (case insensitive).

Default: “INFO”

+L, --logFile Sets the name of a log file for batch runs. Default is “gt-%j.out” where “%j” (in SLURM) is the jobid. If the argument is not an absolute pathname, it is treated as relative to the value of GCAM.LogDir.
+m, --minutes

Set the number of minutes to allocate for the queued batch job. Overrides config parameter GCAM.Minutes. (Linux only)

Default: 20.0

+M, --mcs

Possible choices: trial, gensim

Used only when running gcamtool from pygcam-mcs.

+P, --projectName
 

The project name (the config file section to read from), which defaults to the value of config variable GCAM.DefaultProject

Default: “”

+q, --queueName
 

Specify the name of the queue to which to submit the batch job. Default is given by config variable GCAM.DefaultQueue. (Linux only)

Default: “slurm”

+r, --resources
 

Specify resources for the queued batch command. Can be a comma-delimited list of assignments of the form NAME=value, e.g., +r ‘pvmem=6GB’. (Linux only)

Default: “”

+s, --set

Assign a value to override a configuration file parameter. For example, to set batch commands to start after a prior job of the same name completes, use --set “GCAM.OtherBatchArgs=-d singleton”. Enclose the argument in quotes if it contains spaces or other characters that would confuse the shell. Use multiple --set flags and arguments to set multiple variables.

Default: []

+v, --verbose

Show diagnostic output

Default: False

--version show program’s version number and exit
--VERSION Default: False
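
Two of the options above, +D/--dirmap and +l/--logLevel, take small delimited formats. The Python sketch below shows one way such strings could be parsed. It is not pygcam's implementation: the function names parse_dirmap and parse_log_levels are hypothetical, the sketch assumes POSIX-style paths that contain no colons or commas, and it assumes that a bare level name may be combined with “module:level” items in one string.

```python
VALID_LEVELS = {"debug", "info", "warning", "error", "fatal"}  # from the +l description


def parse_dirmap(spec):
    """Turn '/host/a:/cont/a, /host/b:/cont/b' into {host_dir: container_dir}."""
    mapping = {}
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate stray commas
        host, sep, container = item.partition(":")
        if not (sep and host and container):
            raise ValueError(f"expected 'host:container', got {item!r}")
        mapping[host] = container
    return mapping


def parse_log_levels(spec):
    """Split 'INFO, some.module:debug' into (default_level, {module: level}).

    A bare level name is treated as the program-wide default (an assumption);
    'module:level' items set per-module levels. Level names are case insensitive.
    """
    default = None
    per_module = {}
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        module, sep, level = item.rpartition(":")
        level = level.strip().lower()
        if level not in VALID_LEVELS:
            raise ValueError(f"unknown log level: {level!r}")
        if sep:
            per_module[module.strip()] = level
        else:
            default = level
    return default, per_module


# Illustrative inputs; the module name is only an example.
dirs = parse_dirmap("/some/host/path:/a/container/path, /host:/cont")
levels = parse_log_levels("INFO, pygcam.mcs:debug")
print(dirs)
print(levels)
```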

Subcommands

For help on subcommands, use the “-h” flag after the subcommand name

subcommand Possible choices: addexp, analyze, cluster, discrete, gensim, delsim, engine, explore, ippsetup, iterate, parallelPlot, runsim

Sub-commands:

addexp

Adds the named experiment to the database, with an optional description.

gt addexp [-h] [-d DESCRIPTION] expName

Positional Arguments

expName Add the named experiment to the database.

Named Arguments

-d, --description
 

An optional description of the experiment.

Default: “No description”

analyze

Analyze simulation results stored in the database for the given simulation. At least one of -c, -d, -i, -g, -p, -t (or their longname equivalents) must be specified.

gt analyze [-h] [-c] [-d] [-e EXPNAME] [-E EXPORTALL] [--exportEMA EXPORTEMA]
           [--forcingPlot] [--cumulative] [-g] [-i] [-l LIMIT] [-m MIN]
           [-M MAX] [-o EXPORTINPUTS] [-O RESULTFILE] [-p] [-R REGIONNAME]
           [-r RESULTNAME] [-s SIMID] [-S] [-t] [-T MAXVARS] [-x XLABEL]
           [--ymax YMAX] [--ymin YMIN]

Named Arguments

-c, --convergence
 

Generate convergence plots for mean, std dev, skewness, and 95% coverage interval.

Default: False

-d, --distros

Plot frequency distributions for input parameters.

Default: False

-e, --expName The name of the experiment or scenario to run.
-E, --exportAll
 Export all inputs for which there are results, and all results for the given expName (-e flag) to the indicated file name.
--exportEMA Export results to the given .tar.gz file in a format suitable for analysis using the EMA Workbench. The -e (--expName) and -r (--resultName) flags can hold comma-delimited lists of experiments and results, respectively.
--forcingPlot

Plot the data in a good format for multiple forcing timeseries plots

Default: False

--cumulative

For --forcingPlot, plot the cumulative annual change in RF

Default: False

-g, --groups

Show the uncertainty importance for groups of parameters.

Default: False

-i, --importance
 

Show the uncertainty importance for each parameter.

Default: False

-l, --limit

Limit the analysis to the given number of results

Default: -1

-m, --min Limit the analysis to values (for the result named with -r) greater than or equal to this value
-M, --max Limit the analysis to values (for the result named with -r) less than or equal to this value
-o, --exportInputs
 A file into which to export input (trial) data.
-O, --resultFile
 Export all model results to the given file. When used with this option, the -r (--resultName) and -e (--expName) flags can be comma-delimited lists of result names and experiment names (scenarios), respectively. The output file, in CSV format, will have a header (and data in the form) “trialNum,value,expName,resultName”
-p, --plot

Plot a histogram of the frequency distribution for the named model output (-r required).

Default: False

-R, --regionName
 The region to plot timeseries results for
-r, --resultName
 The name of the result variable to analyze.
-s, --simId

The id of the simulation

Default: 1

-S, --stats

Print mean, median, max, min, std dev, skewness, and 95% coverage interval.

Default: False

-t, --timeseries
 

Plot a timeseries distribution

Default: False

-T, --maxVars

Limit the number of variables displayed on tornado plots to the given value. (Default is 15.)

Default: 15

-x, --xlabel

Specify a label for the x-axis in the histogram.

Default: “g CO$_2$e MJ$^{-1}$”

--ymax Set the scale of a figure by indicating the value to show as the maximum Y value. (By default, scale is set according to the data.)
--ymin Set the scale of a figure by indicating the value (given as abs(value), but used as -value) to show as the minimum Y value
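
The CSV layout described for -O/--resultFile (“trialNum,value,expName,resultName”) is straightforward to post-process. The sketch below reads such data with Python's standard csv module and averages the values per experiment and result. The sample rows, the experiment names “baseline” and “policy”, the result name “ci”, and the function name mean_by_experiment are all invented for illustration.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Invented sample rows in the documented column order; a real file would be
# produced by "gt analyze -O".
SAMPLE = """trialNum,value,expName,resultName
0,1.5,baseline,ci
1,2.5,baseline,ci
0,3.0,policy,ci
1,5.0,policy,ci
"""


def mean_by_experiment(csv_file):
    """Return {(expName, resultName): mean value} for an open CSV file."""
    groups = defaultdict(list)
    for row in csv.DictReader(csv_file):
        key = (row["expName"], row["resultName"])
        groups[key].append(float(row["value"]))
    return {key: mean(values) for key, values in groups.items()}


summary = mean_by_experiment(io.StringIO(SAMPLE))
for (exp_name, result_name), avg in summary.items():
    print(exp_name, result_name, avg)
```

To use a real export, replace io.StringIO(SAMPLE) with an open file handle, for example open(path, newline="").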

cluster

Start an ipyparallel cluster after generating batch file templates based on parameters in .pygcam.cfg and the number of tasks to run. Note that the runsim sub-command will start a cluster if one is not already running. More often, this command is used to stop a cluster.

gt cluster [-h] [-c CLUSTERID] [-e MAXENGINES] [-m MINUTESPERRUN]
           [-n NUMTRIALS] [-o OTHERARGS] [-p PROFILE] [-q QUEUE] [-s]
           [-w WORKDIR]
           {start,stop}

Positional Arguments

mode

Possible choices: start, stop

Whether to start or stop the cluster

Named Arguments

-c, --clusterId
 

A string to identify this cluster. Default is the value of config var IPP.ClusterId, currently “mcs”.

Default: “mcs”

-e, --maxEngines
 

Set maximum number of engines to create. Overrides config parameter IPP.MaxEngines, currently 300

Default: 300

-m, --minutesPerRun
 

Set the number of minutes of walltime to allocate per GCAM run. Overrides config parameter IPP.MinutesPerRun, currently 20.0.

Default: 20.0

-n, --numTrials
 

The total number of GCAM trials that will be run on this cluster. (Relevant only for “start” command.)

Default: 10

-o, --otherArgs
 

Command line arguments to append to the ipcluster command.

Default: “”

-p, --profile

The name of the ipython profile to use. Default is the value of config var IPP.Profile, currently “pygcam”.

Default: “pygcam”

-q, --queue

The queue or partition on which to create the controller and engines. Overrides config var IPP.Queue, currently “slurm”.

Default: “slurm”

-s, --stopJobs

Stop running jobs using the value of IPP.StopJobsCommand, currently “scancel -u unknown”. (Ignored for mode “start”.)

Default: False

-w, --workDir

Where to run the ipcluster command. Overrides the value of config var IPP.WorkDir, currently ‘/home/docs/.ipython/profile_pygcam’.

Default: “/home/docs/.ipython/profile_pygcam”

discrete

Convert csv files to the .ddist format.

gt discrete [-h] -i INPUTFILE -o OUTPUTFILE -d DATATITLE [-b BINS]
            [-t TRUNCATE] [-c COUNTTITLE] [-n VARNAME]
            [-v [VARTITLES [VARTITLES ...]]]

Named Arguments

-i, --inputFile
 Path to input .csv file being converted.
-o, --outputFile
 Path to output .ddist file.
-d, --dataTitle
 Actual data title in the .csv file.
-b, --bins

Number of bins to separate discrete distro into

Default: 30

-t, --truncate

Number of digits to truncate output to. Default is 3

Default: 3

-c, --countTitle
 

Title of column representing counts of data.

Default: “count”

-n, --varName Title of rows of output distribution
-v, --varTitles
 Titles of columns keying different distributions in the input file

gensim

Generates input files for simulations by reading {ProjectDir}/mcs/parameters.xml in the project directory.

gt gensim [-h] [--delete] [-d DATAFILE] [-D DESC] [-e EXPORTVARS]
          [-g GROUPNAME] [-m {montecarlo,sobol,fast,morris}] [-o OUTFILE]
          [-p PARAMFILE] [-r RUNROOT] [-S] [-s SIMID] [-t TRIALS]

Named Arguments

--delete

DELETE and recreate the simulation “run” directory.

Default: False

-d, --dataFile Load the trial data from a CSV into the database. Useful for restoring data.
-D, --desc

A brief (<= 256 char) description of the simulation.

Default: “”

-e, --exportVars
 

Export variable and distribution info in a tab-delimited file with the given name and exit.

Default: “”

-g, --groupName
 

The name of a scenario group to process.

Default: “”

-m, --method

Possible choices: montecarlo, sobol, fast, morris

Use the specified method to generate trial data. Default is “montecarlo”.

Default: “montecarlo”

-o, --outFile For methods other than “montecarlo”. The path to a “package directory” in which SALib-related data are stored. If the filename does not end in ‘.sa’, this extension is added. The file ‘problem.csv’ within the package directory will contain the parameter specs in SALib format. The file ‘inputs.csv’ is also generated in the package directory using the chosen method’s sampling method. If an outFile is not specified, a package named ‘data.sa’ is created in the simulation run-time directory.
-p, --paramFile
 Specify an XML file containing parameter definitions. Defaults to the value of config parameter MCS.ParametersFile (currently /home/docs/projects//mcs/parameters.xml)
-r, --runRoot Root of the run-time directory for running user programs. Defaults to value of config parameter MCS.Root (currently /home/docs/mcs)
-S, --calcSecondOrder
 

For Sobol method only – calculate second-order sensitivities.

Default: False

-s, --simId

The id of the simulation. Default is 1.

Default: 1

-t, --trials

The number of trials to create for this simulation (REQUIRED). If a value of 0 is given, scenario setup is performed, scenario names are added to the database, and meta-data is copied, but new trial data is not generated.

Default: -1
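
The naming rule described for -o/--outFile can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the pygcam code: the function name package_dir is hypothetical, the extension is assumed to be appended to the name rather than substituted for an existing one, and the run directory path in the example is made up.

```python
from pathlib import Path


def package_dir(out_file, run_dir):
    """Return the SALib package directory implied by the -o argument.

    out_file: the value given to -o, or None/"" if the flag was omitted.
    run_dir:  the simulation run-time directory (used only for the default).
    """
    if not out_file:
        return Path(run_dir) / "data.sa"   # documented default package name
    path = Path(out_file)
    if path.suffix == ".sa":
        return path                        # already has the extension
    return Path(str(path) + ".sa")         # assumption: append, don't replace


print(package_dir("results", "/tmp/run"))
print(package_dir("results.sa", "/tmp/run"))
print(package_dir(None, "/tmp/run"))
```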

delsim

Delete simulation results and re-initialize the database for the given user application. This is done automatically by the sub-command gensim when the --delete flag is specified.

gt delsim [-h] [-r] [-e]

Named Arguments

-r, --deleteSims
 

Delete all simulations from the run directory.

Default: False

-e, --empty

Create the database schema but don’t add any data. Useful when restoring from a dumped database.

Default: False

engine

(MCS) Starts additional worker engines on a running cluster.

gt engine [-h] [-c CLUSTERID] [-n NUMTRIALS] [-o OTHERARGS] [-p PROFILE]
          [-w WORKDIR]

Named Arguments

-c, --clusterId
 

A string to identify this cluster. Default is the value of config var IPP.ClusterId, currently “mcs”.

Default: “mcs”

-n, --numTrials
 

The number of additional trials to create engines for. Default is 1

Default: 1

-o, --otherArgs
 

Command line arguments to append to the ipengine command.

Default: “”

-p, --profile

The name of the ipython profile to use. Default is the value of config var IPP.Profile, currently “pygcam”.

Default: “pygcam”

-w, --workDir

Where to run the ipengine command. Overrides the value of config var IPP.WorkDir, currently ‘/home/docs/.ipython/profile_pygcam’.

Default: “/home/docs/.ipython/profile_pygcam”

explore

Run the MCS “explorer”, a browser-based interactive tool for exploring Monte Carlo simulation results. After running gt explore, point your browser to http://localhost:8050 to load the MCS Explorer.

gt explore [-h] [-d] [-H HOST] [-P PORT]

Named Arguments

-d, --debug

Enable debug mode in the dash server

Default: False

-H, --host

Set the host address to serve the application on. Default is localhost (127.0.0.1).

Default: “127.0.0.1”

-P, --port

Set the port to serve the application on. Default is 8050.

Default: 8050

ippsetup

Create a new ipyparallel profile to use with pygcam.mcs. This command generates the profile and edits the default configuration files as per command-line arguments to this sub-command.

gt ippsetup [-h] [-a ACCOUNT] [-e ENGINES] [-m MINUTES] [-p PROFILE]
            [-s {Slurm,PBS,LSF}]

Named Arguments

-a, --account

The account name to use to run jobs on the cluster system. Used by Slurm only. Default is “”

Default: “”

-e, --engines

Set default number of engines to allow per node. This is overridden by runsim; this value is used when running the cluster “manually”. Default value is 4.

Default: 4

-m, --minutes

The default number of minutes to allocate per GCAM run. (Used by Slurm only.) This is used for the “+b / --batch” and “gt run -D” options only. The “runsim” sub-command uses the value in IPP.MinutesPerRun. Default value is 30.

Default: 30

-p, --profile

The name of the ipython profile to create. Set config variable IPP.Profile to the same value. Default is “pygcam”.

Default: “pygcam”

-s, --scheduler
 

Possible choices: Slurm, PBS, LSF

The resource manager / scheduler your system uses. Default is Slurm.

Default: “Slurm”

iterate

Run a command in each trialDir, or if expName is given, in each expDir. The following arguments are available for use in the command string, specified within curly braces: appName, simId, trialNum, expName, trialDir, expDir. For example, to run the fictional program “foo” in each trialDir for a given set of parameters, you might write:

gt iterate -s1 -c "foo -s{simId} -t{trialNum} -i{trialDir}/x -o{trialDir}/y/z.txt"

gt iterate [-h] -c COMMAND [-n] [-s SIMID] [-S SCENARIO] [-t TRIALS]
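
The curly-brace placeholders in the command string behave much like Python format fields. The sketch below uses str.format as a stand-in to show what the expanded commands might look like; whether gt iterate uses the same mechanism internally is not stated here, and the trial directory layout in the example is invented.

```python
TEMPLATE = "foo -s{simId} -t{trialNum} -i{trialDir}/x -o{trialDir}/y/z.txt"


def expand_command(template, **values):
    """Fill the {name} placeholders in a command template."""
    return template.format(**values)


# Hypothetical trial directories; real paths depend on the MCS configuration.
commands = [
    expand_command(TEMPLATE, simId=1, trialNum=n, trialDir=f"/tmp/mcs/s001/{n:03d}")
    for n in (0, 1)
]
for cmd in commands:
    print(cmd)
```

Running gt iterate with the -n/--noRun flag shows the actual expanded commands without executing them, which is a safer way to check a template than guessing.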

Named Arguments

-c, --command A command string to execute for each trial. The following arguments are available for use in the command string, specified within curly braces: projectName, simId, trialNum, scenario, expName, trialDir, expDir.
-n, --noRun

Show the commands that would be executed, but don’t run them

Default: False

-s, --simId

The id of the simulation. Default is 1.

Default: 1

-S, --scenario

The name of the scenario

Default: “”

-t, --trials Comma-separated list of trial numbers or ranges of trial numbers to run. Ex: 1,4,6-10,3. Defaults to running all trials for the given simulation.
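
The trial-list syntax accepted by -t can be illustrated with a small parser. This Python sketch is not pygcam's own parser: the function name parse_trials is hypothetical, and it assumes that ranges are inclusive at both ends and that the order given on the command line is preserved.

```python
def parse_trials(spec):
    """Expand a string such as '1,4,6-10,3' into a list of trial numbers."""
    trials = []
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue  # ignore empty items such as a trailing comma
        first, sep, last = item.partition("-")
        if sep:
            low, high = int(first), int(last)
            if high < low:
                raise ValueError(f"descending range: {item!r}")
            trials.extend(range(low, high + 1))  # assumed inclusive
        else:
            trials.append(int(first))
    return trials


trials = parse_trials("1,4,6-10,3")
print(trials)
```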

parallelPlot

Generate a parallel coordinates plot for a set of simulation results.

gt parallelPlot [-h] -r RESULTNAME -s SIMID -S SCENARIO [-b INPUTBINS]
                [-l OUTPUTLABELS] [--limit LIMIT] [-i NUMINPUTS] [-I]
                [-o OUTPUT] [-q] [-R ROTATE]

Named Arguments

-r, --resultName
 The name of the result to create the plot for
-s, --simId The id of the simulation
-S, --scenario The name of the scenario
-b, --inputBins
 Allocate values for each variable into the given number of bins. By default, the bin boundaries are evenly spaced. If the -q/--quantiles flag is given, the bins will contain an equal number of values. Use -l / --outputLabels to assign category names to the bins.
-l, --outputLabels
 

Category names for the output bins. Value must be a comma-delimited list of strings.

Default: “Low,Medium,High”

--limit

Limit analysis to this number of trials

Default: 0

-i, --numInputs
 The number of most-highly rank-correlated inputs to include in the figure. By default, an attempt is made to plot all inputs.
-I, --invert

Plot negatively correlated data as (1 - x) rather than (x).

Default: False

-o, --output The name of the graphic output file to create. File format is determined from the filename extension. Default is {plotDir}/s{scenarioId}/{scenario}-{resultName}-parallel.png
-q, --quantiles
 

Create bins with an (approx.) equal number of values rather than the default, which is to space the bin boundaries equally across the range of values.

Default: False

-R, --rotate Angle of rotation for X-axis labels
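
The difference between the default evenly spaced bins and the -q/--quantiles bins is easiest to see on skewed data. The following Python sketch uses only the standard library and made-up values. It illustrates the two binning ideas and is not the code used by parallelPlot; all function names are hypothetical.

```python
from statistics import quantiles

LABELS = ["Low", "Medium", "High"]  # default category names from -l


def equal_width_edges(values, bins):
    """Bin boundaries spaced evenly across the range of values."""
    low, high = min(values), max(values)
    step = (high - low) / bins
    return [low + i * step for i in range(bins + 1)]


def quantile_edges(values, bins):
    """Bin boundaries chosen so each bin holds roughly the same number of values."""
    inner = quantiles(values, n=bins, method="inclusive")
    return [min(values)] + inner + [max(values)]


def assign_bin(value, edges):
    """Index of the bin containing value; the top edge belongs to the last bin."""
    for i in range(len(edges) - 2):
        if value < edges[i + 1]:
            return i
    return len(edges) - 2


def bin_counts(values, edges):
    counts = [0] * (len(edges) - 1)
    for v in values:
        counts[assign_bin(v, edges)] += 1
    return counts


data = [1, 2, 3, 4, 5, 6, 7, 8, 100]  # invented, deliberately skewed
counts_equal = bin_counts(data, equal_width_edges(data, 3))
counts_quantile = bin_counts(data, quantile_edges(data, 3))
print(dict(zip(LABELS, counts_equal)))
print(dict(zip(LABELS, counts_quantile)))
```

With a single large outlier, evenly spaced boundaries put almost every value in the first bin, while quantile boundaries spread the values across all three categories.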

runsim

Run the identified trials on compute engines.

gt runsim [-h] [-B] [-c CLUSTERID] [-C] [-D] [-e MAXENGINES] [-g GROUPNAME]
          [-G] [-I] [-l] [-m MINUTESPERRUN] [-n NUMTRIALS] [-N] [-p PROFILE]
          [--programArgs PROGRAMARGS] [-q QUEUE] [-r STATUSES] [-R] [-s SIMID]
          -S SCENARIOS [-t TRIALS] [-w WAITSECS]

Named Arguments

-B, --noBatchQueries
 

Skip running batch queries.

Default: False

-c, --clusterId
 

A string to identify this cluster. Default is the value of config var IPP.ClusterId, currently “mcs”.

Default: “mcs”

-C, --collectResults
 

Equivalent to specifying --noGCAM --noBatchQueries --noPostProcessor --runLocal. Useful if runs have actually succeeded but results have not been saved to the SQL database.

Default: False

-D, --noDatabase
 

Don’t save query results to the SQL database.

Default: True

-e, --maxEngines
 

Set maximum number of engines to create. (Ignored unless -C flag is specified.) Overrides config parameter IPP.MaxEngines, currently 300.

Default: 300

-g, --groupName
 

The name of a scenario group to process.

Default: “”

-G, --noGCAM

Don’t run GCAM, just run the batch queries and post-processor (if defined).

Default: False

-I, --dontShutdownWhenIdle
 

Do not shutdown engines when they are idle and there are no outstanding tasks.

Default: False

-l, --runLocal

Runs the program locally instead of submitting a batch job.

Default: False

-m, --minutesPerRun
 

Set the number of minutes of walltime to allocate per GCAM run. Ignored unless -C flag is specified. Overrides config parameter IPP.MinutesPerRun, currently 20.0.

Default: 20.0

-n, --numTrials
 

The total number of GCAM trials to be run on this cluster.

Default: 0

-N, --noPostProcessor
 

Don’t run post-processor steps.

Default: False

-p, --profile

The name of the ipython profile to use. Default is the value of config var IPP.Profile, currently “pygcam”.

Default: “pygcam”

--programArgs

Arguments to pass to the user program. Quote sequences that include spaces, e.g., to pass args: -x foo, use --programArgs=“-x foo”

Default: “”

-q, --queue

The queue or partition on which to create the controller and engines. Ignored unless -C flag is used. Overrides config var IPP.Queue, currently “slurm”.

Default: “slurm”

-r, --redo Re-launch all trials for the given simId with the status specified. Argument can be comma-delimited list of status names. When used with -R, trial numbers are listed but trials are not run. Recognized values are {new, queued, running, failed, killed, aborted, alarmed, gcamerror, unsolved, missing}. “Missing” is a pseudo-value interpreted to find all runs that have not been executed, i.e., runs not appearing in the ‘run’ table.
-R, --redoListOnly
 

Used with -r to only list the trials to redo, then quit.

Default: False

-s, --simId

The id of the simulation (Defaults to 1.)

Default: 1

-S, --scenario The name of the scenario(s). May be a comma-separated list of names. Use config var MCS.DefaultScenario to set a default scenario name. No default has been set.
-t, --trials

Comma-separated list of trial numbers and/or hyphen-separated ranges of trial numbers to run. Ex: 1,4,6-10,3. Default is to run all defined trials.

Default: “”

-w, --waitSecs

How many seconds to wait between queries to the ipyparallel controller for completed jobs. Default is 30.

Default: 30.0