sierra.core.pipeline.stage3.statistics_calculator
Classes for generating statistics within and across experiments in a batch.
GatherSpec
: Data class for specifying .csv files to gather from an Experiment.

BatchExpParallelCalculator
: Process Output .csv files for each experiment in the batch.

ExpCSVGatherer
: Gather all Output .csv files from all runs within an experiment.

ExpStatisticsCalculator
: Generate statistics from output files for all runs within an experiment.
- class sierra.core.pipeline.stage3.statistics_calculator.GatherSpec(exp_name: str, item_stem: str, imagize_csv_stem: Optional[str])[source]
Data class for specifying .csv files to gather from an Experiment.
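As a sketch of how such a spec might be constructed and queried. This is a minimal re-creation for illustration only; the real class lives in `sierra.core.pipeline.stage3.statistics_calculator`, and the body of `for_imagizing()` here is an assumption, not the actual implementation:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-in for the real GatherSpec, matching its signature.
@dataclass
class GatherSpec:
    exp_name: str                     # experiment the .csv belongs to
    item_stem: str                    # stem of the .csv file to gather
    imagize_csv_stem: Optional[str]   # set when the .csv will be imagized

    def for_imagizing(self) -> bool:
        # Assumed behavior: a spec is for imagizing iff an imagize stem
        # was supplied.
        return self.imagize_csv_stem is not None

spec = GatherSpec(exp_name="exp0",
                  item_stem="block-counts",
                  imagize_csv_stem=None)
print(spec.for_imagizing())  # False
```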
- class sierra.core.pipeline.stage3.statistics_calculator.BatchExpParallelCalculator(main_config: dict, cmdopts: Dict[str, Any])[source]
Process Output .csv files for each experiment in the batch.
In parallel for speed.
- __call__(criteria: IConcreteBatchCriteria) → None[source]
Process the Output .csv files of every experiment in the batch defined by criteria.
- _execute(exp_to_avg: List[Path], avg_opts: Dict[str, Union[str, int]], n_gatherers: int, n_processors: int, pool) → None[source]
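The gather-then-process handoff above can be sketched as a producer/consumer pipeline. The real class uses OS process pools and a multiprocessing queue; this thread-backed sketch with stand-in dicts (instead of pandas DataFrames) only illustrates the shape of the handoff, not SIERRA's actual worker functions:

```python
import queue
import threading
from typing import Dict, List

def gather_worker(runs: List[str], processq: "queue.Queue") -> None:
    # Stand-in for the gather side: read each run's output and enqueue it
    # for the statistics workers. Fake "dataframes" are plain dicts here.
    for run in runs:
        processq.put((run, {"n_chars": len(run)}))
    processq.put(None)  # sentinel: this gatherer is done

def process_worker(processq: "queue.Queue", results: Dict[str, dict]) -> None:
    # Stand-in for the processing side: consume gathered items until the
    # sentinel arrives and record per-run statistics.
    while True:
        item = processq.get()
        if item is None:
            break
        run, df = item
        results[run] = df

processq: "queue.Queue" = queue.Queue()
results: Dict[str, dict] = {}
g = threading.Thread(target=gather_worker, args=(["run0", "run1"], processq))
p = threading.Thread(target=process_worker, args=(processq, results))
g.start()
p.start()
g.join()
p.join()
print(sorted(results))  # ['run0', 'run1']
```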
- class sierra.core.pipeline.stage3.statistics_calculator.ExpCSVGatherer(main_config: Dict[str, Any], gather_opts: dict, processq: Queue)[source]
Gather all Output .csv files from all runs within an experiment.
“Gathering” in this context means building a dictionary that records which .csv came from where, so that statistics can be generated both across and within experiments in the batch.
- _gather_item_from_sims(exp_output_root: Path, item: GatherSpec, runs: List[Path]) → Dict[GatherSpec, List[DataFrame]][source]
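A sketch of the returned "which .csv came from where" mapping, using a hypothetical stand-in for GatherSpec and plain dicts in place of the pandas DataFrames the real method collects:

```python
from pathlib import Path
from typing import Dict, List, NamedTuple

class GatherSpec(NamedTuple):
    # Hypothetical stand-in for the real GatherSpec dataclass.
    exp_name: str
    item_stem: str

def gather_item_from_sims(exp_output_root: Path,
                          item: GatherSpec,
                          runs: List[Path]) -> Dict[GatherSpec, List[dict]]:
    # One key per gathered item; one entry in the list per run it was
    # collected from. The real method reads each .csv into a DataFrame;
    # a dict recording the source path stands in here.
    gathered: Dict[GatherSpec, List[dict]] = {item: []}
    for run in runs:
        csv_path = exp_output_root / run / f"{item.item_stem}.csv"
        gathered[item].append({"source": str(csv_path)})
    return gathered

item = GatherSpec("exp0", "block-counts")
out = gather_item_from_sims(Path("batch/exp0"), item,
                            [Path("run0"), Path("run1")])
print(len(out[item]))  # 2
```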
- class sierra.core.pipeline.stage3.statistics_calculator.ExpStatisticsCalculator(main_config: Dict[str, Any], avg_opts: dict, batch_stat_root: Path)[source]
Generate statistics from output files for all runs within an experiment.
Important
You CANNOT use logging ANYWHERE while processing .csv files, most likely because of a bug in the logging module itself. If you are unlucky enough to spawn the process that enters this class's __call__() method while another logging statement is in progress (and is therefore holding an internal logging module lock), the underlying fork() call copies the lock in the acquired state. When this class later tries to log something, it deadlocks with itself.
Creating loggers with unique names does not help either: the lock in question appears to be global to the logging module, analogous to the GIL. Sometimes python sucks.
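One way to sidestep the inherited-lock problem described above is the spawn start method, which starts each worker from a fresh interpreter instead of fork()ing the parent, so no held locks are inherited. This is only a sketch of that mitigation, not how SIERRA itself is structured:

```python
import multiprocessing as mp

def worker(x: int) -> int:
    # No logging calls in here: under the fork start method, a logging
    # lock held by another thread at fork time is copied in the acquired
    # state, and the first log call in the child deadlocks.
    return x * x

if __name__ == "__main__":
    # "spawn" starts workers from a fresh interpreter, so no locks are
    # inherited, at the cost of slower worker startup.
    ctx = mp.get_context("spawn")
    with ctx.Pool(2) as pool:
        print(pool.map(worker, [1, 2, 3]))  # [1, 4, 9]
```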
- __call__(gather_spec: GatherSpec, gathered_dfs: List[DataFrame]) → None[source]
Generate statistics from the gathered dataframes for a single gathered item.