Design Philosophy
This document outlines the core ideas behind SIERRA's design. Understanding these helps explain why SIERRA works the way it does — including choices that might initially seem overly rigid or opinionated.
Single Input, Multiple Output
During stage 1, SIERRA generates multiple experiments from a single template
input file specified on the command line. It does not follow any
references or includes within that file, which simplifies generation and
improves reproducibility (for example, it avoids subtle errors caused by a ROS
<include> resolving differently across systems). SIERRA does support
flattening an input file tree into a single file; see Experiment Definition (--expdef).
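As a rough illustration of the single-input idea, the sketch below expands one in-memory XML template into several experiment definitions by varying one attribute. It is not SIERRA's implementation: the template contents, the `generate_experiments` helper, and the `swarm`/`robots` names are all hypothetical.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical template; element and attribute names are illustrative only.
TEMPLATE = """
<experiment>
  <arena size="10"/>
  <swarm robots="1"/>
</experiment>
""".strip()


def generate_experiments(template_xml, xpath, attr, values):
    """Return one modified copy of the template per value.

    The template is parsed once and never mutated. Nothing outside the
    template string is read, so includes or external references are not
    followed.
    """
    root = ET.fromstring(template_xml)
    experiments = []
    for value in values:
        exp = copy.deepcopy(root)  # keep the parsed template pristine
        elem = exp.find(xpath)
        if elem is None:
            raise ValueError(f"No element matches {xpath!r}")
        elem.set(attr, str(value))
        experiments.append(ET.tostring(exp, encoding="unicode"))
    return experiments


# One template in, four experiment definitions out.
experiments = generate_experiments(TEMPLATE, "./swarm", "robots", [1, 2, 4, 8])
```

Because every output is derived from the same self-contained input, the result depends only on that one file and the list of values.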
Low Floor, No Ceiling
SIERRA is designed to work out of the box as far as possible (low floor) without compromising configurability or extensibility for advanced users (no ceiling).
| Decision | Rationale | Supports |
|---|---|---|
| Don't modify user directory structures | Experimental Runs produce their own directory structure, whether flat or deeply nested. SIERRA preserves that structure during stages {3,4}, following the Principle of Least Surprise and making it easy to pipe SIERRA outputs into existing scripts with minimal changes. | Low floor |
| Wrap engine CLIs rather than reimplementing them | For Engine (--engine) and Execution Environment (--execenv), SIERRA translates users' original invocation commands rather than reimplementing engine APIs. All engines and execution environments have a CLI; not all have Python bindings. This choice keeps integration predictable. | Low floor |
| Maximally configurable | SIERRA exposes relevant settings as configuration wherever possible, even if most users will never change the defaults. This ensures nothing is hardwired that a sufficiently advanced project might need to control. | No ceiling |
| Maximize reusability | When used properly, you should never need to copy-paste YAML configuration or Python code between projects, engines, or scenarios. The upfront configuration investment pays off at scale. | No ceiling |
| Separate processing from product generation | Stage 3 post-processes raw run outputs into normalised, statistics-bearing files. Stage 4 reads only those files to generate graphs and deliverables. This means you can re-run stage 4 with different graph configuration (title, axis range, additional lines) in seconds, without re-running experiments or reprocessing data. Each stage 4 deliverable has a single, well-defined input source. | Low floor, No ceiling |
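The "wrap engine CLIs" decision can be pictured with a small sketch: the user's original command line is kept intact and only a prefix and extra arguments are added around it. This is a simplified illustration under stated assumptions, not SIERRA's actual translation logic; `wrap_command`, the `my_simulator` program, and its `--config` and `--seed` flags are hypothetical.

```python
import shlex


def wrap_command(user_cmd, wrapper_prefix, extra_args=()):
    """Build an argv list that runs the user's command inside a wrapper.

    The user's command is tokenised with shell-like rules and passed
    through unchanged; nothing about the engine itself is reimplemented.
    """
    return list(wrapper_prefix) + shlex.split(user_cmd) + list(extra_args)


# Hypothetical engine invocation, wrapped with `nice` and given a seed.
argv = wrap_command(
    "my_simulator --config exp0.xml",
    wrapper_prefix=["nice", "-n", "10"],
    extra_args=["--seed", "42"],
)
print(shlex.join(argv))
# prints: nice -n 10 my_simulator --config exp0.xml --seed 42
```

Working at the command-line level means the same approach applies whether or not the engine ships Python bindings.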
Internal implementation conventions (relevant to contributors):
| Convention | Rationale |
|---|---|
| Assert often, fail early | If SIERRA encounters a condition it cannot handle, it aborts via an uncaught exception or |
Never Delete Things
Experimental data is hard-won. SIERRA therefore refuses to delete or overwrite anything in stages {1,2} without explicit permission, because losing those outputs in a later stage would be irreversible. Files generated in stages {3,4,5} are derived from stage {1,2} outputs and can be safely regenerated, so SIERRA will overwrite them freely. To override the protection on stages {1,2}, pass --exp-overwrite.
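A minimal sketch of this kind of overwrite guard is shown below. It only illustrates the policy described above; the `prepare_output_dir` helper and its `overwrite` parameter are hypothetical stand-ins and do not reflect how SIERRA implements --exp-overwrite.

```python
from pathlib import Path


def prepare_output_dir(path, overwrite=False):
    """Return a directory that is safe to write experiment outputs into.

    Refuses to reuse a non-empty directory unless the caller explicitly
    opts in, mirroring a "never delete without permission" policy.
    Nothing is deleted here; overwrite=True only permits later writes.
    """
    path = Path(path)
    if path.is_dir() and any(path.iterdir()) and not overwrite:
        raise FileExistsError(
            f"{path} is not empty; pass overwrite=True to reuse it"
        )
    path.mkdir(parents=True, exist_ok=True)
    return path
```

Failing loudly by default puts the cost of a mistake on the side of an extra flag rather than on lost data.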
Swiss Army Pipeline
SIERRA's 5-stage pipeline is designed to be run in any subset. You should be able to re-run only stage 4 after tweaking a graph config, or only stages {3,4} after a fresh post-processing pass, without friction.
This is achieved through several structural choices:
Stage 3 (processing) and stage 4 (product generation) are kept separate, so re-generating products never requires re-processing raw data.
Stage 4 products are each sourced from a single input file, not assembled from multiple files at generation time.
Each pipeline stage is transactional at the file level: it reads from and writes to files on disk, rather than keeping state in memory. This makes arbitrary stage subsets composable.
The Runtime Directory Tree uses human-readable, non-hashed directory names, so researchers can inspect, copy, or hand off data at any stage without needing SIERRA to interpret it.
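The structural choices above can be sketched with two toy stages that communicate only through files on disk. This is an illustration under simplifying assumptions, not SIERRA's code: the directory names (`raw`, `processed`, `products`), file names, and the `run` helper are hypothetical.

```python
import json
import statistics
from pathlib import Path


def stage3(root, opts):
    """Process raw run outputs into a single summary file on disk."""
    raw = [float(p.read_text()) for p in sorted((root / "raw").glob("run*.txt"))]
    out = root / "processed"
    out.mkdir(exist_ok=True)
    (out / "summary.json").write_text(json.dumps({"mean": statistics.mean(raw)}))


def stage4(root, opts):
    """Generate a product from the one processed file; raw data is never read."""
    summary = json.loads((root / "processed" / "summary.json").read_text())
    out = root / "products"
    out.mkdir(exist_ok=True)
    title = opts.get("title", "Mean output")
    (out / "report.txt").write_text(f"{title}: {summary['mean']:.2f}\n")


STAGES = {3: stage3, 4: stage4}


def run(root, stages, opts=None):
    """Run any subset of stages in order; no state is shared in memory."""
    for number in sorted(stages):
        STAGES[number](Path(root), opts or {})
```

With this layout, `run(root, {3, 4})` followed later by `run(root, {4}, {"title": "Throughput"})` regenerates the product with a new title without touching the raw data again.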
Separation of Data Types
Statistics generated during stage 3 are stored in separate files from the underlying data, even when the chosen --storage or --prod plugin could accommodate them in a single file. The reasons are:
Readability. For 2D and higher-dimensional data, separating statistics from raw values makes both files easier to inspect.
Memory footprint. If a user is generating a 2D heatmap, any standard deviation columns in the source file are irrelevant, and loading them alongside the data would waste memory.
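To make the separation concrete, the sketch below writes per-step means and standard deviations to two different CSV files, so a consumer that only needs the means never loads the other file. The `write_stats` and `read_column` helpers and the `.mean.csv` / `.stddev.csv` naming are hypothetical and are not SIERRA's actual file layout.

```python
import csv
import statistics
from pathlib import Path


def write_stats(runs, out_dir, stem):
    """Write means and standard deviations across runs to separate files.

    `runs` is a list of equal-length sequences, one per experimental run;
    at least two runs are needed to compute a sample standard deviation.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    columns = list(zip(*runs))  # values at each step, across all runs
    stats = {
        "mean": [statistics.mean(c) for c in columns],
        "stddev": [statistics.stdev(c) for c in columns],
    }
    for suffix, values in stats.items():
        with open(out_dir / f"{stem}.{suffix}.csv", "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow([stem])  # single-column header
            writer.writerows([v] for v in values)


def read_column(path):
    """Load one single-column statistics file as a list of floats."""
    with open(path, newline="") as f:
        rows = list(csv.reader(f))
    return [float(row[0]) for row in rows[1:]]  # skip the header row
```

A heatmap generator in this scheme would call `read_column` on the mean file alone and never pay for the standard deviations.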
Logo Design Rationale
The logo was designed deliberately; it was not picked at random because it "looked cool".
Core Concept: "Research Compiler"
SIERRA turns research queries into executable experiments and reproducible outputs. The logo represents a structured system that transforms research inputs into deterministic results. It encodes the transformation pipeline:
Research Inputs (nodes) -> SIERRA Compiler (segmented system) -> Structured Experiments (grid)
Or more simply:
Research Intent -> Automated Experiments -> Reproducible Results
The logo is also meant to communicate:
serious research tooling
automation infrastructure
deterministic pipelines
modular architecture
reproducibility
Circular Frame -> Execution Environment
The outer segmented circle represents the controlled framework environment.
Meaning:
Encapsulation of the research pipeline
Deterministic system boundaries
Reproducible execution
The segmentation hints at:
pipeline stages
modular plugin architecture
execution phases
It suggests ordered computation happening inside a system.
Arc Segments -> Pipeline Stages
The curved arcs represent progressive transformation through the pipeline stages. The arcs imply motion and flow, but within a controlled system rather than a loose pipeline. This reinforces:
deterministic automation
reproducible research workflows
Grid of Squares -> Structured Outputs
The central grid represents compiled experimental artifacts. Interpretations include:
parameter sweep results
experiment matrices
structured datasets
reproducible experiment outputs
The squares are uniform, aligned, and deterministic. They contrast with incoming nodes (inputs). Visually this communicates:
unstructured ideas -> structured results
Node Dots -> Research Inputs
The dots on the left represent incoming research queries or parameters. They symbolize:
experiment parameters
datasets
configuration inputs
plugin modules
Different sizes suggest different input types and expanding parameter sweeps. The dots converge toward the structured grid.
Directional Flow -> Compilation
The overall layout subtly moves left to right:
inputs -> compilation -> results
Nodes appear on the left and structured outputs appear on the right.
This visually encodes the concept:
Research idea -> SIERRA compilation -> Reproducible experiments
Blue Color Palette -> Engineering and Research
The color scheme reinforces the technical positioning.
Blue suggests:
scientific rigor
trust
infrastructure software
engineering systems
Gradients subtly suggest transformation and computation.