Model Runner

Models in SIERRA are (generally) time-series based: they predict some quantity at each instant in time. They come in two flavors: intra-experiment and inter-experiment. Intra-experiment models run in the context of a single Experiment, and can "target" any number of stacked_line() graphs for inclusion. If included, model results are plotted using dashed lines to distinguish them from empirical data. Some examples from the sample project:

../../../_images/modelrunner-jsonsim-intra-gaussian.png

The noisy model is \(Normal(\mu=0,\theta=0.1)\). Data is \(Normal(\mu=0, \theta=0.5)\).

../../../_images/modelrunner-jsonsim-intra-binomial.png

The same model, targeting binomially distributed data by changing YAML config. Binomial data is \(Binomial(n=50,p=0.3)\).

Inter-experiment models, on the other hand, run in the context of the Batch Experiment. They are (generally) built from the outputs of their intra-experiment counterparts and do not compute anything themselves. Likewise, they can "target" any number of summary_line() graphs for inclusion. If included, model results are plotted using dashed lines to distinguish them from empirical data. Some examples from the sample project:

../../../_images/modelrunner-jsonsim-inter-gaussian.png

The model is \(Normal(\mu=0,\theta=0.01)\). Data is \(Normal(\mu=0, \theta=0.5)\).

../../../_images/modelrunner-jsonsim-inter-binomial.png

The same model, targeting binomially distributed data by changing YAML config. Binomial data is \(Binomial(n=50,p=0.3)\).

Models can take any number of Collated Output Data files as input (including 0). See Intra-Experiment Data Collation for why you cannot in general use Processed Output Data files as model inputs. SIERRA does not enforce this, so it is left to researchers to follow best practices.

All models, when enabled/active, execute during stage 3.

Ordering Considerations

This plugin should come after proc.statistics and/or proc.collate if the models use the data those plugins generate.

Usage

This plugin can be selected by adding proc.modelrunner to the list passed to --proc. When active, this plugin creates <batchroot>/models, and data generated by all models executed during stage 3 accrues under this root directory. Each Experiment gets its own directory under this root for its models. E.g.:

|-- <batchroot>
    |-- models
        |-- c1-exp0
        |-- c1-exp1
        |-- c1-exp2
        |-- c1-exp3
        |-- inter-exp/

Inter-experiment model data will appear in inter-exp/.
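The layout above can be mirrored in a few lines of Python when locating model outputs from your own scripts. This is a minimal sketch, not SIERRA code: the helper `model_output_dirs()` and the example batch root are hypothetical, and only the `models/` and `inter-exp/` directory names come from the layout shown above.

```python
from pathlib import Path


def model_output_dirs(batchroot: Path, exp_names: list[str]) -> dict[str, Path]:
    """Map each experiment name (plus 'inter-exp') to its model output dir.

    Hypothetical helper: it only reproduces the directory layout documented
    above and does not call into SIERRA.
    """
    models_root = batchroot / "models"
    # One directory per experiment for intra-experiment model data.
    dirs = {name: models_root / name for name in exp_names}
    # A single shared directory for inter-experiment model data.
    dirs["inter-exp"] = models_root / "inter-exp"
    return dirs


# Example batch root and experiment names are placeholders.
layout = model_output_dirs(Path("/tmp/batchroot"), ["c1-exp0", "c1-exp1"])
for name, path in layout.items():
    print(f"{name}: {path}")
```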

This plugin has the following plugin requirements:

  • Intra-Experiment Data Collation. If your model takes Experimental Run outputs as its inputs, then those outputs must be collated before passing to your model to generate statistically valid measures of fit.

  • Graph Generation. This plugin currently only works with the prod.graphs plugin; that is, the results of running models can only appear on graphs generated using that plugin.

There are multiple "gates" which a model must pass to be run, to allow for maximum flexibility in many different use cases:

  1. A model must be contained in a .py file which conforms to one of the plugin schemas.

  2. The model's enclosing directory has to be on SIERRA_PLUGIN_PATH.

  3. The path of the model, specified relative to the __init__.py in its package directory, must be returned by sierra_models(), as described in the plugin schema. For example, if you have the following:

    |-- <project root>
        |-- models
            |-- __init__.py
            |-- mymodel.py
    

    And you have an intra-experiment model class MyAwesomeModel in mymodel.py, then sierra_models() must return mymodel.MyAwesomeModel.

  4. The model is enabled in <project root>/config/models.yaml, as described below.

  5. The proc.modelrunner plugin is active when stage 3 is executed.

  6. The appropriate should_run() callback in the relevant model interface returns True. This final gate is to allow additional selection of model execution based on current Batch Criteria, so projects can define and leave models enabled which are only valid for certain types of experiments.
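The gates above combine as a logical AND: a model runs only if every one of them passes. The following is a minimal sketch of that idea, not SIERRA's actual implementation. `ModelCandidate` and `will_run()` are hypothetical names, and gates 1 and 2 (a schema-conforming .py file on SIERRA_PLUGIN_PATH) are assumed to have been satisfied already when the plugin was loaded.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ModelCandidate:
    """Hypothetical record of what is known about one model (not a SIERRA type)."""

    name: str                       # e.g. "mymodel.MyAwesomeModel"
    exported: bool                  # gate 3: name returned by sierra_models()
    enabled_in_yaml: bool           # gate 4: listed in config/models.yaml
    should_run: Callable[[], bool]  # gate 6: the model's own callback


def will_run(model: ModelCandidate, plugin_active: bool) -> bool:
    """Return True only if gates 3 through 6 all pass."""
    return (
        model.exported
        and model.enabled_in_yaml
        and plugin_active          # gate 5: proc.modelrunner active in stage 3
        and model.should_run()
    )


candidate = ModelCandidate(
    name="mymodel.MyAwesomeModel",
    exported=True,
    enabled_in_yaml=True,
    should_run=lambda: True,
)
print(will_run(candidate, plugin_active=True))
print(will_run(candidate, plugin_active=False))
```

Because the checks are chained, a model that is enabled in models.yaml still does nothing if proc.modelrunner is not active, which is a common source of "my model did not run" confusion.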

Cmdline Interface

None for the moment.

Configuration

This plugin is mostly configured via a models.yaml within the Project config root. The file is structured as follows, with all fields required unless otherwise specified.

intra-exp:
   # The name of the model, specified as a python path relative to the
   # directory containing the __init__.py. Must be unique among all active
   # models, or data will be overwritten.
   - name: model1
     # The file stems/names of the graphs which this model should appear
     # on. Must match the 'src_stem' field of the corresponding stacked_line
     # graph to trigger inclusion.
     targets:
       - mygraph1
       - another-graph
     # The names of the plotted model predictions. Optional. Defaults to
     # "Model Prediction" for all generated dataframes if omitted.
     legend:
       - foobar
     # All other fields are interpreted as per-model parameters.
     param1: 4
     param2: 18

inter-exp:
   # The name of the model, specified as a python path relative to the
   # directory containing the __init__.py.
   - name: nested.model2
     # The file stems/names of the graphs which this model should appear on.
     # Must match the 'dest_stem' field of the corresponding summary_line
     # graph to trigger inclusion.
     targets:
       - mygraph1
       - another-graph
     # The name of the plotted model predictions. Optional. Defaults to
     # "Model Prediction" for all generated dataframes if omitted.
     legend:
       - foobar
     # All other fields are interpreted as per-model parameters.
     param1: fizz
     param2:
       - buzz
       - frobnicate

Intra-experiment models and inter-experiment models are configured in their corresponding sections as shown. The names of the models in models.yaml must exactly match names in the sierra_models() list (see below). Each model specified in models.yaml can take any number of parameters of any type specified as extra fields in the YAML file as shown above; they will be parsed and passed to the model constructor as part of config. For example, for nested.model2, a dictionary containing {"param1": "fizz", "param2": ["buzz", "frobnicate"]} would be passed.
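To make the parameter handling concrete, here is a minimal sketch of how one models.yaml entry could be split into its reserved fields and its per-model parameters. It operates on an already-parsed dictionary, so no YAML library is needed. The helper `split_model_entry()` and the `RESERVED_KEYS` set are assumptions made for illustration; how SIERRA itself separates these fields is not shown here.

```python
# Fields with a fixed meaning in models.yaml; everything else is a
# per-model parameter. This set is an assumption based on the example above.
RESERVED_KEYS = {"name", "targets", "legend"}


def split_model_entry(entry: dict) -> tuple[dict, dict]:
    """Split one parsed models.yaml entry into (reserved fields, parameters)."""
    meta = {k: v for k, v in entry.items() if k in RESERVED_KEYS}
    params = {k: v for k, v in entry.items() if k not in RESERVED_KEYS}
    return meta, params


# The nested.model2 entry from the example configuration, as a parsed dict.
entry = {
    "name": "nested.model2",
    "targets": ["mygraph1", "another-graph"],
    "legend": ["foobar"],
    "param1": "fizz",
    "param2": ["buzz", "frobnicate"],
}

meta, params = split_model_entry(entry)
print(params)  # {'param1': 'fizz', 'param2': ['buzz', 'frobnicate']}
```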

The category mechanism from controllers.yaml is not used here, because in addition to wanting to filter enabling/running models by controller, you also often want to filter based on the scenario/batch criteria, so filtering is performed via a callback function in the model interface rather than declaratively here.
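As an illustration of callback-based filtering, the sketch below shows a model that decides for itself whether it applies to the current controller and scenario. The `should_run()` signature here is hypothetical and simplified, as are the `controllers` and `scenario_prefix` parameters; consult the model interface for the arguments SIERRA actually passes.

```python
class MyAwesomeModel:
    """Hypothetical model class; not the actual SIERRA model interface."""

    def __init__(self, params: dict) -> None:
        # Per-model parameters, e.g. the extra fields from models.yaml.
        self.params = params

    def should_run(self, controller: str, scenario: str) -> bool:
        """Return True if this model is valid for the given experiment.

        The arguments are simplified stand-ins for whatever the real
        interface provides (assumption).
        """
        allowed = self.params.get("controllers", [])
        prefix = self.params.get("scenario_prefix", "")
        return controller in allowed and scenario.startswith(prefix)


# Controller and scenario names below are placeholders.
model = MyAwesomeModel({"controllers": ["controllerA"], "scenario_prefix": "SS"})
print(model.should_run("controllerA", "SS.small"))
print(model.should_run("controllerB", "SS.small"))
```

Keeping this logic in code means a model can stay enabled in models.yaml permanently and simply decline to run for experiments it does not apply to.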

See also the YAML configuration for Intra-Experiment Data Collation.

Creating A New Model

Models can be written in any language, but if they aren't written in Python, you will have to write some Python bindings to translate the inputs/outputs into the form that SIERRA expects. Model code can be anywhere, as long as the enclosing directory is on SIERRA_PLUGIN_PATH. For a directory on SIERRA_PLUGIN_PATH to be recognized as a model plugin, the directory needs to conform to one of the plugin schemas.

Models are exposed via sierra_models(), which takes a string argument for the type of model [ intra, inter ] and returns a list of the names of the intra- or inter-experiment models present in the file. This gives the user the flexibility to group multiple related models together in the same file, rather than requiring 1 model per .py file.
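A minimal sketch of what such a function might look like in mymodel.py follows. The class `MyBatchModel`, the empty class bodies, and the choice to raise `ValueError` for an unknown type are all assumptions made for illustration; the plugin schema is the authority on the exact contract.

```python
class MyAwesomeModel:
    """Placeholder intra-experiment model (body omitted in this sketch)."""


class MyBatchModel:
    """Placeholder inter-experiment model (hypothetical name)."""


def sierra_models(model_type: str) -> list[str]:
    """Return the names of the models of the requested type in this file.

    Names are given as python paths relative to the directory containing
    the __init__.py, matching what is written in models.yaml.
    """
    if model_type == "intra":
        return ["mymodel.MyAwesomeModel"]
    if model_type == "inter":
        return ["mymodel.MyBatchModel"]
    # Rejecting unknown types is a choice made for this sketch.
    raise ValueError(f"Unknown model type: {model_type!r}")


print(sierra_models("intra"))
print(sierra_models("inter"))
```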

  1. Look at:

    to determine if one of the model types SIERRA already supports will work for you. If one will, great! Otherwise, you'll have to open a PR with a new model for the one you create.

  2. Define your models and/or their bindings in one or more .py files in a directory on SIERRA_PLUGIN_PATH. SIERRA allows model plugins to be anywhere and tries to match the names in models.yaml against loaded plugins. That opens up the possibility of name collisions, but tweaking the plugin path can fix this in the unlikely event that it happens.

  3. Add any necessary model input configuration according to Intra-Experiment Data Collation.

  4. Enable your model by adding it to <project>/config/models.yaml, as shown in the example above.

  5. Run your model during stage 3 by adding proc.modelrunner to --proc. You will need to make sure that proc.collate is also active.