.. Copyright 2025 John Harwell, All rights reserved. SPDX-License-Identifier: MIT .. _plugins/proc/modelrunner: ============ Model Runner ============ Models in SIERRA are (generally) time-series based: they predict things at a given instant over time. They come in two flavors: intra-experiment, and inter-experiment. Intra-experiment models run in the context of a single :term:`Experiment`, and can "target" any number of :func:`~sierra.core.graphs.stacked_line` graphs for inclusion. If included, model results are plotted using dashed lines to distinguish them from empirical data. Some examples from :xref:`the sample project `: .. tab-set:: .. tab-item:: Gaussian .. figure:: figures/modelrunner-jsonsim-intra-gaussian.png :width: 50% The noisy model is :math:`Normal(\mu=0,\theta=0.1)`. Data is :math:`Normal(\mu=0, \theta=0.5)`. .. tab-item:: Binomial .. figure:: figures/modelrunner-jsonsim-intra-binomial.png :width: 50% The same model, targeting binomially distributed data by changing YAML config. Binomial data is :math:`Binomial(n=50,p=0.3)`. Inter-experiment models, on the other hand run in the context of the :term:`Batch Experiment`, and (generally) are built from the outputs of their intra-experiment counterparts, and *don't* actually compute anything. Likewise, they can "target" any number of :func:`~sierra.core.graphs.summary_line` graphs for inclusion. If included, model results are plotted using dashed lines to distinguish them from empirical data. Some examples from :xref:`the sample project `: .. tab-set:: .. tab-item:: Gaussian .. figure:: figures/modelrunner-jsonsim-inter-gaussian.png :width: 50% The model is :math:`Normal(\mu=0,\theta=0.01)`. Data is :math:`Normal(\mu=0, \theta=0.5)`. .. tab-item:: Binomial .. figure:: figures/modelrunner-jsonsim-inter-binomial.png :width: 50% The same model, targeting binomially distributed data by changing YAML config. Binomial data is :math:`Binomial(n=50,p=0.3)`. Models can take any number of input (including 0) :term:`Collated Output Data` files. See :ref:`plugins/proc/collate` for why you cannot in general use :term:`Processed Output Data` files as model inputs. SIERRA does not enforce this, so it is left to researchers to follow best practices in this case. All models, when enabled/active, execute during stage 3. .. _plugins/proc/modelrunner/ordering: Ordering Considerations ======================= Should come after ``proc.statistics`` and/or ``proc.collate`` if the models use that data. Usage ===== This plugin can be selected by adding ``proc.modelrunner`` to the list passed to :ref:`--proc`. When active, this plugin will create ``/models``, and data generated by all executed models stage 3 will accrue under this root directory. Each :term:`Experiment` will get their own directory in this root for their models. E.g.:: |-- |-- models |-- c1-exp0 |-- c1-exp1 |-- c1-exp2 |-- c1-exp3 |-- inter-exp/ Inter-experiment models data will appear in ``inter-exp/``. This plugin has the following plugin requirements: - :ref:`plugins/proc/collate`. If your model takes :term:`Experimental Run` outputs as its inputs, then those outputs must be collated before passing to your model to generate statistically valid measures of fit. - :ref:`plugins/prod/graphs`. This plugin currently is only works with the ``prod.graphs`` plugin; that is, the results of running models can only appear on graphs generated using that plugin. There are multiple "gates" which a model must pass to be run, to allow for maximum flexibility in many different use cases: #. A model must to be contained in a ``.py`` file which conforms to one of the :ref:`plugin schemas `. #. The model's enclosing directory has to be on :envvar:`SIERRA_PLUGIN_PATH`. #. The path of the model, specified relative to the ``__init___.py`` in it's package directory, must be returned by ``sierra_models()``, as described in the plugin schema. For example, if you have the following:: |-- |-- models |-- __init__.py |-- mymodel.py And you have a intra-experiment model class ``MyAwesomeModel`` in ``mymodel.py``, then ``sierra_models()`` must return ``mymodel.MyAwesomeModel``. #. The model is enabled in ``/config/models.yaml``, as described below. #. The ``proc.modelrunner`` plugin is active when stage 3 is executed. #. The appropriate ``should_run()`` callback in the relevant model interaface returns ``True``. This final gate is to allow additional selection of model execution based on current :term:`Batch Criteria`, so projects can define and leave models enabled which are only valid for certain types of experiments. Cmdline Interface ----------------- None for the moment. Configuration ------------- This plugin is mostly configured via a ``models.yaml`` within the :term:`Project` config root. The file is structured as follows, with all fields required unless otherwise specified. .. code-block:: YAML intra-exp: # The name of the model, specified as a python path relative to the # directory container the __init__.py. Must be unique among all active # models, or data will be overwritten. - name: model1 # The file stems/names of the graphs which this model should appear # on. Must match the 'src_stem' field of the corresponding stacked_line # graph to trigger inclusion. targets: - mygraph1 - another-graph # The names of the plotted model predictions. Optional. Defaults to # "Model Prediction" for all generated dataframes if omitted. legend: - foobar # All other fields are interpreted as per-model parameters. param1: 4 param2: 18 inter-exp: # The name of the model, specified as a python path relative to the # directory container the __init__.py. - name: nested.model2 # The file stems/names of the graphs which this model should appear on. # Must match the 'dest_stem' field of the corresponding summary_line # graph to trigger inclusion. targets: - mygraph1 - another-graph # The name of the plotted model predictions. Optional. Defaults to # "Model Prediction" for all generated dataframes if omitted. legend: - foobar # All other fields are interpreted as per-model parameters. param1: fizz param2: - buzz - frobnicate Intra-experiment models and inter-experiment models are configured in their corresponding sections as shown. The names of the models in ``models.yaml`` must exactly match names in the ``sierra_models()`` list (see below). Each model specified in ``models.yaml`` can take any number of parameters of any type specified as extra fields in the YAML file as shown above; they will be parsed and passed to the model constructor as part of ``config``. For example, for ``nested.model2``, a dictionary containing ``{"param1": "fizz", "param2": ["buzz", "frobnicate"]}`` would be passed. The category mechanism from ``controllers.yaml`` is not used here, because in addition to wanting to filter enabling/running models by controller, you also often want to filter based on the scenario/batch criteria, so filtering is performed via a callback function in the model interface rather than declaratively here. See also YAML configuration :ref:`plugins/proc/collate`. Creating A New Model ==================== Models can be written in any language, but if they aren't python, you will have to write some python bindings to translate the inputs/outputs into things that SIERRA can understand/is expecting. Model code can be anywhere, as long as the enclosing directory is on :envvar:`SIERRA_PLUGIN_PATH`. For a directory on :envvar:`SIERRA_PLUGIN_PATH` to be recognized as a model plugin, the directory needs to conform to one of the :ref:`plugin schemas `. By defining models via ``sierra_models()`` which takes a string argument for the type of model [ ``intra``, ``inter`` ] and returns a list of the names of the intra- and inter-experiment models present in the file, this allows the user flexibility to group multiple related models together in the same file, rather than requiring 1 model per ``.py`` file. #. Look at: - :class:`~sierra.core.models.interface.IIntraExpModel1D` - :class:`~sierra.core.models.interface.IInterExpModel1D` to determine if one of the model types SIERRA already supports will work for you. If one will, great! Otherwise, you'll have to open a PR with a new model for the one you create. #. Define your models and/or their bindings in one or more ``.py`` files in a directory on :envvar:`SIERRA_PLUGIN_PATH`. SIERRA allows model plugins to be anywhere and try to match the names in the ``models.yaml`` against loaded plugins. That opens up the possibility of name collisions, but tweaking the plugin path can fix this in the unlikely event that it happens. #. Add any necessary model input configuration according to :ref:`plugins/proc/collate`. #. Enable your model by adding it to ``/config/models.yaml``, as shown in the example above. #. Run your model during stage 3 by adding ``proc.modelrunner`` to ``--proc``. You will need to make sure the ``proc.collate`` is also active.