..
   Copyright 2025 John Harwell, All rights reserved.

   SPDX-License-Identifier: MIT

.. _plugins/proc/collate:

===============================
Intra-Experiment Data Collation
===============================

When generating products, it is often necessary to perform some sort of
non-statistical mathematical analysis on the results. These calculations
*cannot* be done on the intra-experiment :term:`Processed Output Data` files,
because any statistical distributions calculated from them will be invalid; put
another way, an average of sums is not the same as a sum of averages. To support
such use cases, SIERRA can make the necessary parts of the per-run :term:`Raw
Output Data` files available in stage 3 for doing such calculations via
:term:`Data Collation`. Like all things in SIERRA, if you don't need this
functionality, you can turn it off by deselecting the plugin.

This process in stage 3 can be visualized as follows for a single
:term:`Experiment`, using :term:`Experimental Run` as SCOPE:

.. figure:: /figures/data-collation.png

Here, the user has specified that ``col{0,1}`` in ``file0``, as produced by all
experimental runs, should be combined into a single file. Thus the
:term:`Collated Output Data` file generated from that specification will have
:math:`j` columns, one per run. The same applies to ``col{A,B}`` in ``file1``.
This is collation *within* an experiment (intra-experiment). Collation *across*
experiments (if enabled/configured) is done during stage 4, and is handled by a
different plugin.

This plugin requires that the selected :ref:`storage plugin ` supports
``pd.DataFrame`` objects.

.. _plugins/proc/collate/ordering:

Ordering Considerations
=======================

This plugin should come after ``proc.statistics`` to generate statistics around
collated data.

Usage
=====

This plugin can be selected by adding ``proc.collate`` to the list passed to
``--proc``.
Configuration for this plugin consists of *what* data to collate, and some
tweaks for *how* that data should be collated. When active, it will create
``/statistics`` and the following directory structure::

  |--
      |-- statistics
          |-- inter-exp

``inter-exp/`` contains :term:`Collated Output Data` files, drawn from specific
columns in :term:`Raw Output Data` files. This plugin lays out its data as
described above so that it can be used with :ref:`plugins/prod/graphs`, which
expects its outputs to be under ``statistics/``.

This plugin does not require additional plugins to be active when it is run.

Cmdline Interface
-----------------

.. sphinx_argparse_cli::
   :module: sierra.plugins.proc.collate.cmdline
   :func: sphinx_cmdline_multistage
   :prog: sierra

Configuration
-------------

Controls *what* to collate. Collated data is usually "interesting" in some way;
e.g., related to system performance. Configuration lives in a ``collate.yaml``
file; all fields are required unless otherwise specified.

.. code-block:: YAML

   # Contains a list of config items for intra-experiment collation (i.e.,
   # collation at the level of experimental runs).
   intra-exp:
     # Each config item has 'file' and 'cols' fields. 'file' specifies a
     # filepath, relative to the output directory for each experimental run,
     # containing the data columns of interest. 'cols' specifies the columns of
     # interest.
     - file: foo/bar
       cols:
         - col1
         - col2
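
To make the effect of such a configuration concrete, the following is a minimal
sketch of what collating ``col1`` and ``col2`` from ``foo/bar`` amounts to
conceptually. It is *not* SIERRA's implementation: the ``collate_column()``
helper, the run names, and the in-memory frames standing in for each run's
``foo/bar`` file are all assumptions made for illustration. It also shows why
per-run data matters: a non-linear calculation such as a peak gives a different
answer when applied per run than when applied to already-averaged data.

```python
import pandas as pd


def collate_column(run_frames: dict[str, pd.DataFrame], col: str) -> pd.DataFrame:
    """Combine one column from every run into a frame with one column per run.

    Hypothetical helper for illustration only; not part of SIERRA's API.
    """
    return pd.DataFrame(
        {run: df[col].reset_index(drop=True) for run, df in run_frames.items()}
    )


# Assumed stand-ins for the 'foo/bar' file written by each of j=3 runs.
runs = {
    "run0": pd.DataFrame({"col1": [1.0, 5.0], "col2": [10.0, 20.0]}),
    "run1": pd.DataFrame({"col1": [4.0, 2.0], "col2": [30.0, 40.0]}),
    "run2": pd.DataFrame({"col1": [4.0, 5.0], "col2": [50.0, 60.0]}),
}

# One collated frame per configured column, each with one column per run.
collated = {col: collate_column(runs, col) for col in ("col1", "col2")}

# Peak computed per run, then averaged across runs (needs per-run data).
mean_of_peaks = collated["col1"].max().mean()

# Peak computed on the across-run average (all that averaged data allows).
peak_of_mean = collated["col1"].mean(axis=1).max()
```

Each collated frame has as many columns as there are runs, matching the
:math:`j` columns described earlier, and ``mean_of_peaks`` differs from
``peak_of_mean`` for this data because the runs peak at different rows.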