sierra.core.pipeline.stage3.run_collator

Classes for collating data within a Batch Experiment.

Collation is the process of “lifting” data from the Experimental Runs of every Experiment in a Batch Experiment into a single CSV (a reduce operation). This is needed to correctly calculate summary statistics for performance measures in stage 4: you cannot simply feed pre-computed stddevs through the calculations for flexibility (for example), because comparing curves of stddev is not meaningful. Stage 4 needs access to the raw(er) run data so that it can construct a distribution of performance measure values and then calculate the summary statistics (such as stddev) over that distribution.
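For illustration, a minimal sketch of this idea using pandas (run and column names are hypothetical; this is not SIERRA's implementation). The stddev across runs at each timestep cannot be recovered from stddevs computed within each run, which is why the raw per-run values are collated:

    import pandas as pd

    # Hypothetical per-run performance time series, one column per run
    # (this is the shape of a collated CSV).
    collated = pd.DataFrame({"run0": [1.0, 2.0, 3.0],
                             "run1": [2.0, 2.0, 2.0],
                             "run2": [3.0, 2.0, 1.0]})

    # What stage 4 needs: the distribution across runs at each timestep.
    stddev_across_runs = collated.std(axis=1)

    # Not a substitute: stddevs computed within each run; comparing these
    # curves downstream is not meaningful.
    stddev_within_runs = collated.std(axis=0)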

class sierra.core.pipeline.stage3.run_collator.ExperimentalRunParallelCollator(main_config: dict, cmdopts: Dict[str, Any])[source]

Generates Collated .csv files for each Experiment.

Collated .csv files are generated from the Output .csv files across Experimental Runs. Gathering is done in parallel for each experiment for speed, unless disabled with --processing-serial.

Inheritance

Inheritance diagram of ExperimentalRunParallelCollator
__call__(criteria: IConcreteBatchCriteria) → None[source]

Call self as a function.

__init__(main_config: dict, cmdopts: Dict[str, Any])[source]

static _gather_worker(gatherq: Queue, processq: Queue, main_config: Dict[str, Any], project: str, storage_medium: str) → None[source]
static _process_worker(processq: Queue, main_config: Dict[str, Any], batch_stat_collate_root: Path, storage_medium: str, df_homogenize: str) → None[source]
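The two static methods above follow a standard producer-consumer shape: experiments are placed on a gather queue, gather workers read per-run CSVs and push the results onto a process queue, and process workers collate them. A standalone sketch of that pattern (simplified; not the actual SIERRA code):

    import multiprocessing as mp

    def gather_worker(gatherq, processq):
        while True:
            exp_leaf = gatherq.get()
            if exp_leaf is None:          # sentinel: nothing left to gather
                break
            # In SIERRA this is where the per-run Output .csv files for the
            # experiment are read; here we forward a placeholder payload.
            processq.put((exp_leaf, {"perf.csv": [1, 2, 3]}))

    def process_worker(processq):
        while True:
            item = processq.get()
            if item is None:              # sentinel: gathering finished
                break
            exp_leaf, gathered = item
            print(f"collating {len(gathered)} gathered CSVs for {exp_leaf}")

    if __name__ == '__main__':
        gatherq = mp.Queue()
        processq = mp.Queue()
        g = mp.Process(target=gather_worker, args=(gatherq, processq))
        p = mp.Process(target=process_worker, args=(processq,))
        g.start()
        p.start()
        for leaf in ["exp0", "exp1"]:
            gatherq.put(leaf)
        gatherq.put(None)                 # stop the gather worker
        g.join()
        processq.put(None)                # stop the process worker
        p.join()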
class sierra.core.pipeline.stage3.run_collator.ExperimentalRunCSVGatherer(main_config: Dict[str, Any], storage_medium: str, processq: Queue)[source]

Gather Output .csv files across all runs within an experiment.

This class can be extended/overridden using a Project hook. See SIERRA Hooks for details.

processq

The multiprocessing-safe producer-consumer queue into which the data gathered from experimental runs is placed for processing.

storage_medium

The name of the storage medium plugin used to extract dataframes when reading run data.

main_config

Parsed dictionary of main YAML configuration.

logger

The handle to the logger for this class. If you extend this class, you should save/restore this variable in tandem with overriding it, so that logging messages from this class and your derived class have unique logger names; this reduces confusion.
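A minimal sketch of a project-level extension of this class via a Project hook (the override shown is hypothetical; see the SIERRA Hooks tutorial for how the hook is actually wired up):

    import logging

    from sierra.core.pipeline.stage3 import run_collator

    class ExperimentalRunCSVGatherer(run_collator.ExperimentalRunCSVGatherer):
        def __init__(self, *args, **kwargs) -> None:
            super().__init__(*args, **kwargs)
            # Re-bind the logger so messages from this subclass get a unique
            # logger name, distinct from the parent class's.
            self.logger = logging.getLogger(__name__)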

Inheritance

Inheritance diagram of ExperimentalRunCSVGatherer
__call__(batch_output_root: Path, exp_leaf: str)[source]

Gather CSV data from all experimental runs in an experiment.

Gathered data is put in a queue for processing.

Parameters:

batch_output_root – The root directory containing the experiment output directories for the Batch Experiment.

exp_leaf – The name of the experiment directory within the batch_output_root.
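Hypothetical usage sketch (the storage medium name, paths, and configuration below are placeholders):

    import multiprocessing as mp
    from pathlib import Path

    from sierra.core.pipeline.stage3.run_collator import ExperimentalRunCSVGatherer

    processq = mp.Queue()
    main_config = {}  # placeholder for the parsed main YAML configuration
    gatherer = ExperimentalRunCSVGatherer(main_config,
                                          "storage.csv",  # plugin name is a placeholder
                                          processq)
    gatherer(Path("/path/to/batch-output-root"), "exp0")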

__init__(main_config: Dict[str, Any], storage_medium: str, processq: Queue) → None[source]

gather_csvs_from_run(run_output_root: Path) → Dict[Tuple[str, str], DataFrame][source]

Gather all data from a single run within an experiment.

Returns:

A dictionary of <(CSV file name, CSV performance column), dataframe> key-value pairs. The CSV file name is the leaf part of the path, with the extension included.

Return type:

dict
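For illustration, a gathered dictionary might look like the following (file and column names are hypothetical):

    import pandas as pd

    gathered = {
        ("block-transport.csv", "cum_transported"):
            pd.DataFrame({"cum_transported": [0, 3, 7]}),
        ("block-transport.csv", "cum_collected"):
            pd.DataFrame({"cum_collected": [1, 2, 2]}),
    }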

class sierra.core.pipeline.stage3.run_collator.ExperimentalRunCollator(main_config: Dict[str, Any], batch_stat_collate_root: Path, storage_medium: str, df_homogenize: str)[source]

Collate gathered Output .csv files together (reduce operation).

Output .csv files gathered from the N Experimental Runs are combined into a single Summary .csv per Experiment, with one column per run.
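A sketch of this reduce step using pandas (column names are hypothetical; not the actual implementation). Each run contributes one column, and the result is written out as the per-experiment summary:

    import pandas as pd

    # One performance column gathered from each of N=3 runs.
    run_cols = {
        "run0": pd.Series([0.0, 1.0, 2.0]),
        "run1": pd.Series([0.5, 1.5, 2.5]),
        "run2": pd.Series([0.0, 2.0, 4.0]),
    }

    # The reduce: one dataframe with 1 column per run, written as the
    # per-experiment summary CSV.
    summary = pd.DataFrame(run_cols)
    summary.to_csv("summary.csv", index=False)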

Inheritance

Inheritance diagram of ExperimentalRunCollator
__call__(gathered_runs: List[str], gathered_dfs: List[Dict[Tuple[str, str], DataFrame]], exp_leaf: str) → None[source]

Call self as a function.

__init__(main_config: Dict[str, Any], batch_stat_collate_root: Path, storage_medium: str, df_homogenize: str) → None[source]