Data Compression

When a Project produces large amounts of data, running many Batch Experiments can quickly exhaust your allocated storage if the data is left uncompressed. Compressing data for such projects is therefore often useful; that's where this plugin comes in. Keep in mind that this plugin runs during stage 3, so if you generate enough data during stage 2 to fill your disk, this plugin can't help. In that case, look at IExpRunShellCmdsGenerator and add whatever commands you need after each run to compress the data as it is generated.
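As a rough illustration of the kind of per-run command you might add, the sketch below builds a `tar` command line for one run's output directory. The helper name `tar_gz_cmd` and the example paths are hypothetical, and how the resulting string is handed to IExpRunShellCmdsGenerator is not shown here because that depends on the interface itself.

```python
import shlex
from pathlib import PurePosixPath


def tar_gz_cmd(output_dir: str, remove_after: bool = False) -> str:
    """Build a shell command that archives output_dir next to itself.

    Hypothetical helper for illustration; it is not part of SIERRA.
    """
    out = PurePosixPath(output_dir)
    archive = f"{out}.tar.gz"

    # -C switches into the parent directory so paths inside the archive
    # are relative to it rather than absolute.
    cmd = (
        f"tar -czf {shlex.quote(archive)} "
        f"-C {shlex.quote(str(out.parent))} {shlex.quote(out.name)}"
    )

    # '&&' ensures the uncompressed data is removed only if tar succeeded.
    if remove_after:
        cmd += f" && rm -rf {shlex.quote(str(out))}"
    return cmd
```

For example, `tar_gz_cmd("/data/exp0/run_0")` yields `tar -czf /data/exp0/run_0.tar.gz -C /data/exp0 run_0`.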

This plugin operates at the file level on each Experimental Run: the entire output tree of the run is compressed into a single .tar.gz archive. Optionally, the uncompressed data can be removed after compression with --compress-remove-after. No data is lost--it's all in the archive!
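The behavior described above can be sketched with the Python standard library. This is a minimal illustration of the compress-then-optionally-remove technique, not the plugin's actual implementation; the function name `compress_run_output` and the archive location (next to the run directory) are assumptions.

```python
import shutil
import tarfile
from pathlib import Path


def compress_run_output(run_dir: Path, remove_after: bool = False) -> Path:
    """Archive run_dir into a sibling <name>.tar.gz; optionally delete run_dir.

    Illustrative sketch only; names and layout are assumptions.
    """
    run_dir = Path(run_dir)
    archive = run_dir.with_name(run_dir.name + ".tar.gz")

    # "w:gz" writes a gzip-compressed tarball; arcname keeps the paths
    # inside the archive relative to the run directory's name.
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(run_dir, arcname=run_dir.name)

    if remove_after:
        # Re-open the archive and read its member list before deleting
        # anything, so an unreadable archive never costs us the originals.
        with tarfile.open(archive, "r:gz") as tar:
            tar.getmembers()
        shutil.rmtree(run_dir)

    return archive
```

Reading the archive back before removing the originals is a cheap safeguard: if the tarball cannot be opened, an exception is raised and the uncompressed tree is left in place.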

Ordering Considerations

Statistics Generation and/or Intra-Experiment Data Collation should precede this plugin in the --proc chain if you want processed outputs to be included in the archive in addition to raw outputs.

Usage

This plugin can be selected by adding proc.compress to the list passed to --proc.

Cmdline Interface

sierra - CLI interface

sierra [--compress-remove-after]

sierra Stage 3 options

Options for processing experiment results

--compress-remove-after -
If the proc.compress plugin is run, remove the uncompressed Raw Output Data files after compression. This can save TONS of disk space. No data is lost because everything output by each Experimental Run is in the compressed archive.