sierra.core.pipeline.stage3.statistics_calculator

Classes for generating statistics within and across experiments in a batch.

class sierra.core.pipeline.stage3.statistics_calculator.GatherSpec(exp_name: str, item_stem: str, imagize_csv_stem: Optional[str])[source]

Data class for specifying .csv files to gather from an Experiment.

Inheritance

Inheritance diagram of GatherSpec
__init__(exp_name: str, item_stem: str, imagize_csv_stem: Optional[str])[source]

for_imagizing()[source]
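
A hypothetical usage sketch. The constructor call matches the signature above; the behavior of for_imagizing() (reporting whether imagize_csv_stem was provided) is an assumption, not confirmed by the source:

    # Hypothetical usage; for_imagizing() semantics are assumed, and the
    # .csv stems are made up for illustration.
    from sierra.core.pipeline.stage3.statistics_calculator import GatherSpec

    # A plain statistics gather: one .csv stem from one experiment.
    stats_spec = GatherSpec(exp_name="exp0",
                            item_stem="block-transport",
                            imagize_csv_stem=None)

    # A gather destined for imagizing (assumed to be signaled by a non-None stem).
    image_spec = GatherSpec(exp_name="exp0",
                            item_stem="swarm-distribution",
                            imagize_csv_stem="swarm-distribution-heatmap")

    assert not stats_spec.for_imagizing()  # assumed False when the stem is None
    assert image_spec.for_imagizing()      # assumed True otherwise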
class sierra.core.pipeline.stage3.statistics_calculator.BatchExpParallelCalculator(main_config: dict, cmdopts: Dict[str, Any])[source]

Process Output .csv files for each experiment in the batch, in parallel for speed.

Inheritance

Inheritance diagram of BatchExpParallelCalculator
__call__(criteria: IConcreteBatchCriteria) → None[source]

Call self as a function.

__init__(main_config: dict, cmdopts: Dict[str, Any])[source]

_execute(exp_to_avg: List[Path], avg_opts: Dict[str, Union[str, int]], n_gatherers: int, n_processors: int, pool) → None[source]
static _gather_worker(gatherq: Queue, processq: Queue, main_config: Dict[str, Any], avg_opts: Dict[str, str]) → None[source]
static _process_worker(processq: Queue, main_config: Dict[str, Any], batch_stat_root: Path, avg_opts: Dict[str, str]) → None[source]
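
The worker signatures above suggest a producer/consumer pipeline: gatherer workers pull experiments from a gather queue and push gathered data onto a process queue, which processor workers drain to compute statistics. A minimal, self-contained sketch of that pattern, not SIERRA's actual implementation (queue payloads, worker counts, and the toy "statistic" are illustrative assumptions):

    import multiprocessing as mp

    def gather_worker(gatherq: mp.Queue, processq: mp.Queue) -> None:
        # Drain experiment names to gather; emit gathered data for processing.
        while True:
            exp = gatherq.get()
            if exp is None:        # sentinel: nothing left to gather
                break
            gathered = {"exp": exp, "values": [1.0, 2.0, 3.0]}  # stand-in for CSV reads
            processq.put(gathered)

    def process_worker(processq: mp.Queue) -> None:
        # Drain gathered items and reduce them to a statistic.
        while True:
            item = processq.get()
            if item is None:       # sentinel: gathering has finished
                break
            mean = sum(item["values"]) / len(item["values"])
            print(f"{item['exp']}: mean={mean}")

    if __name__ == "__main__":
        gatherq: mp.Queue = mp.Queue()
        processq: mp.Queue = mp.Queue()

        gatherers = [mp.Process(target=gather_worker, args=(gatherq, processq))
                     for _ in range(2)]
        processors = [mp.Process(target=process_worker, args=(processq,))
                      for _ in range(2)]
        for p in gatherers + processors:
            p.start()

        for exp in ["exp0", "exp1", "exp2"]:
            gatherq.put(exp)
        for _ in gatherers:        # one sentinel per gatherer
            gatherq.put(None)
        for p in gatherers:
            p.join()
        for _ in processors:       # processors stop only after gathering is done
            processq.put(None)
        for p in processors:
            p.join()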
class sierra.core.pipeline.stage3.statistics_calculator.ExpCSVGatherer(main_config: Dict[str, Any], gather_opts: dict, processq: Queue)[source]

Gather all Output .csv files from all runs within an experiment.

“Gathering” in this context means creating a dictionary that maps each .csv to where it came from, so that statistics can be generated both across and within experiments in the batch.
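
A rough sketch of what such a mapping could look like for a single gathered item, based on the return type of _gather_item_from_sims() below. The directory layout and the use of plain string keys are illustrative assumptions:

    from pathlib import Path
    from typing import Dict, List

    import pandas as pd

    def gather_item(exp_output_root: Path,
                    item_stem: str,
                    runs: List[Path]) -> Dict[str, List[pd.DataFrame]]:
        """Map a .csv stem to the per-run dataframes it was read from (sketch)."""
        gathered: Dict[str, List[pd.DataFrame]] = {item_stem: []}
        for run in runs:
            csv_path = exp_output_root / run / f"{item_stem}.csv"  # assumed layout
            gathered[item_stem].append(pd.read_csv(csv_path))
        return gathered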

Inheritance

Inheritance diagram of ExpCSVGatherer
__call__(exp_output_root: Path) → None[source]

Process the CSV files found in the output save path.

__init__(main_config: Dict[str, Any], gather_opts: dict, processq: Queue) → None[source]

_calc_gather_items(run_output_root: Path, exp_name: str) → List[GatherSpec][source]
_gather_item_from_sims(exp_output_root: Path, item: GatherSpec, runs: List[Path]) → Dict[GatherSpec, List[DataFrame]][source]
_verify_exp_outputs(exp_output_root: Path) → None[source]

Verify the integrity of all runs in an experiment.

Specifically (a sketch of these checks appears after this list):

  • All runs produced all CSV files.

  • CSV files with the same name have the same number of rows and columns across all runs.

  • No CSV files contain NaNs.
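
A sketch of what the pairwise flavor of these checks could look like for two runs' output directories, using pandas; the helper name and the exact comparison rules are illustrative assumptions rather than SIERRA's implementation:

    from pathlib import Path

    import pandas as pd

    def verify_pairwise(csv_root1: Path, csv_root2: Path) -> None:
        """Cross-check two runs' output directories (illustrative sketch)."""
        names1 = {p.name for p in csv_root1.glob("*.csv")}
        names2 = {p.name for p in csv_root2.glob("*.csv")}

        # Both runs produced the same set of CSV files.
        assert names1 == names2, f"Mismatched CSV sets: {names1 ^ names2}"

        for name in names1:
            df1 = pd.read_csv(csv_root1 / name)
            df2 = pd.read_csv(csv_root2 / name)

            # Same-named CSV files have the same number of rows and columns.
            assert df1.shape == df2.shape, f"{name}: {df1.shape} != {df2.shape}"

            # No CSV file contains NaNs.
            assert not df1.isnull().any().any(), f"{name} in {csv_root1} has NaNs"
            assert not df2.isnull().any().any(), f"{name} in {csv_root2} has NaNs"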

_verify_exp_outputs_pairwise(csv_root1: Path, csv_root2: Path) → None[source]
_wait_for_memory() → None[source]
class sierra.core.pipeline.stage3.statistics_calculator.ExpStatisticsCalculator(main_config: Dict[str, Any], avg_opts: dict, batch_stat_root: Path)[source]

Generate statistics from output files for all runs within an experiment.

Important

You CANNOT use logging ANYWHERE while processing .csv files. Why? I think it is due to a bug in the logging module itself: if you get unlucky enough to spawn the process that enters this class's __call__() method while another logging statement is in progress (and is therefore holding an internal logging-module lock), the underlying fork() call copies the lock in its acquired state. When this class later tries to log something, it deadlocks with itself.

You also can't just create loggers with unique names, since the lock appears to be shared module-wide, something like the GIL but for the logging module. Sometimes Python sucks.
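
A minimal, self-contained illustration of the underlying hazard (not SIERRA code, and POSIX-only): fork() copies a lock in whatever state it is in, and since the thread that held it does not exist in the child, the child's copy can never be released:

    # Illustrative demo of forking while another thread holds a lock.
    import os
    import threading
    import time

    lock = threading.Lock()    # stands in for logging's internal module lock

    def hold_lock() -> None:
        with lock:
            time.sleep(1.0)    # lock is held while the main thread forks

    threading.Thread(target=hold_lock).start()
    time.sleep(0.1)            # make sure the helper thread has acquired the lock

    pid = os.fork()
    if pid == 0:
        # The child inherited the lock in its *acquired* state, and the thread
        # that would release it was not copied into the child.
        acquired = lock.acquire(timeout=2.0)
        print("child acquired lock:", acquired)   # prints False: stuck forever
        os._exit(0)
    else:
        os.waitpid(pid, 0)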

Inheritance

Inheritance diagram of ExpStatisticsCalculator
__call__(gather_spec: GatherSpec, gathered_dfs: List[DataFrame]) → None[source]

Call self as a function.
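
As a rough illustration of the kind of reduction __call__() performs, here is one way to collapse a list of identically-shaped per-run dataframes into mean and standard deviation outputs with pandas. This is a sketch under that shape assumption; the output file names are illustrative, not SIERRA's actual format:

    from pathlib import Path
    from typing import List

    import pandas as pd

    def write_stats(gathered_dfs: List[pd.DataFrame],
                    stat_root: Path,
                    item_stem: str) -> None:
        """Reduce per-run dataframes (same shape and columns) to summary stats."""
        stacked = pd.concat(gathered_dfs, keys=range(len(gathered_dfs)))

        # Element-wise statistics across runs, preserving the original row index.
        mean_df = stacked.groupby(level=1).mean()
        stddev_df = stacked.groupby(level=1).std()

        stat_root.mkdir(parents=True, exist_ok=True)
        mean_df.to_csv(stat_root / f"{item_stem}.mean", index=False)
        stddev_df.to_csv(stat_root / f"{item_stem}.stddev", index=False)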

__init__(main_config: Dict[str, Any], avg_opts: dict, batch_stat_root: Path) → None[source]