Storage (--storage)

Storage plugins tell SIERRA how to handle file I/O in stages 3-5.

Each plugin can support any number of input formats, identified by file extension, and any number of output types. The table below summarizes the storage plugins that ship with SIERRA; additional formats can be supported by writing a new plugin, as described in New Storage Plugin (--storage).

| Plugin | Supported input formats | Allowed file extensions | Output type |
|---|---|---|---|
| CSV | CSV | .csv | pd.DataFrame |
| Apache Arrow | Apache Arrow | .arrow | pd.DataFrame |
| GraphML | GraphML | .graphml | nx.Graph |

Other plugins in stages 3-5 may require a specific output format; see individual docs for details.
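
The mapping from file extension to plugin and output type can be pictured as a simple lookup. The following is a minimal sketch using only the values listed on this page; the names `STORAGE_BY_EXTENSION` and `lookup_storage` are hypothetical and are not part of SIERRA's API, and SIERRA itself selects the plugin from --storage rather than from the file name.

```python
from __future__ import annotations

from pathlib import Path

# Contents of the table on this page: extension -> (--storage value, output type).
# Hypothetical helper structure for illustration only; not SIERRA's API.
STORAGE_BY_EXTENSION = {
    ".csv": ("storage.csv", "pd.DataFrame"),
    ".arrow": ("storage.arrow", "pd.DataFrame"),
    ".graphml": ("storage.graphml", "nx.Graph"),
}


def lookup_storage(path: str) -> tuple[str, str]:
    """Return (--storage value, output type) for a file, based on its extension."""
    ext = Path(path).suffix.lower()
    try:
        return STORAGE_BY_EXTENSION[ext]
    except KeyError:
        # Unknown extensions would need a custom storage plugin.
        raise ValueError(f"No bundled storage plugin handles '{ext}' files") from None


print(lookup_storage("run0/output.csv"))
print(lookup_storage("run0/topology.graphml"))
```

A lookup like this makes it easy to check, before running stages 3-5, whether the files an experiment produces match the storage plugin that was selected.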

Tip

If you are writing a new storage plugin (see New Storage Plugin (--storage)), follow the Unix philosophy of doing one thing well: make multiple smaller plugins rather than one storage plugin which handles all of your custom types/formats.

CSV

Select the CSV format for all data I/O in stages 3-5. This storage plugin can be selected via --storage=storage.csv. This is the default storage type, which SIERRA uses if none is specified on the command line.

Since this plugin produces pd.DataFrame objects, it is suitable for processing numeric data.

Changed in version 1.3.28: The CSV files read by this plugin must be comma (,) separated. Previously, they had to be semicolon (;) separated.
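
If you have older semicolon-separated files, one way to bring them in line with the comma requirement is to rewrite them with Python's standard csv module. This is a minimal sketch, not a tool provided by SIERRA; the function name `semicolon_to_comma` and the column names in the sample data are made up for illustration.

```python
import csv
import io


def semicolon_to_comma(src, dst) -> None:
    """Rewrite semicolon-separated CSV read from `src` as comma-separated CSV in `dst`."""
    reader = csv.reader(src, delimiter=";")
    # Fields that themselves contain a comma are quoted automatically by the writer.
    writer = csv.writer(dst, delimiter=",", lineterminator="\n")
    for row in reader:
        writer.writerow(row)


# In-memory demonstration; with real files, pass handles opened with newline="".
old = io.StringIO("clock;collisions\n0;1.5\n1;2.0\n")
new = io.StringIO()
semicolon_to_comma(old, new)
print(new.getvalue())
```

Going through the csv module rather than a plain text replacement keeps fields that already contain commas intact, because they are quoted on output.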

Apache Arrow

Select the Apache Arrow format for all data I/O in stages 3-5. This storage plugin can be selected via --storage=storage.arrow.

Since this plugin produces pd.DataFrame objects, it is suitable for processing numeric data.

GraphML

Select the GraphML format for all data I/O in stages 3-5. This storage plugin can be selected via --storage=storage.graphml.

Since this plugin produces nx.Graph objects, it is not suitable for processing numeric data. For example, running the Statistics Generation plugin with this plugin selected will cause an error.
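
To see why GraphML data does not fit numeric, column-oriented processing, it helps to look at what a GraphML file contains: nodes and edges rather than rows and columns. The sketch below parses a tiny hand-written GraphML document with the standard library only. It uses neither SIERRA nor networkx, and the sample graph is invented for illustration.

```python
import xml.etree.ElementTree as ET

# A minimal, hand-written GraphML document (illustrative data only).
GRAPHML = """
<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <graph id="G" edgedefault="undirected">
    <node id="n0"/>
    <node id="n1"/>
    <node id="n2"/>
    <edge source="n0" target="n1"/>
    <edge source="n1" target="n2"/>
  </graph>
</graphml>
"""

# GraphML elements live in this XML namespace.
NS = {"g": "http://graphml.graphdrawing.org/xmlns"}

root = ET.fromstring(GRAPHML.strip())
nodes = [n.get("id") for n in root.findall(".//g:node", NS)]
edges = [(e.get("source"), e.get("target")) for e in root.findall(".//g:edge", NS)]

print("nodes:", nodes)
print("edges:", edges)
```

The file describes connectivity, so there are no numeric columns to average or otherwise summarize; operations that expect tabular data need one of the pd.DataFrame-producing plugins instead.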