Pipeline Schema

This module manages the “pipeline schemas”, i.e. the allowable and fully supported pipelines.

class easylink.pipeline_schema.PipelineSchema(name, nodes, edges)[source]

Bases: HierarchicalStep

All possible pipelines that are fully supported.

A PipelineSchema is a HierarchicalStep whose StepGraph determines all allowable pipelines. The fundamental purpose of this class is to validate that the pipeline the user requests conforms to a fully supported pipeline.

See HierarchicalStep for inherited attributes.

Parameters:
  • name (str) – The name of the pipeline schema.

  • nodes (Iterable[Step]) – The nodes of the pipeline schema.

  • edges (Iterable[EdgeParams]) – The edges of the pipeline schema.

Notes

A PipelineSchema is intended to be constructed by the get_schema() class method.

The PipelineSchema is a high-level abstraction; it represents the desired pipeline of conceptual steps to run with no detail as to how each of those steps is implemented.

get_implementation_graph()[source]

Gets the ImplementationGraph.

The PipelineSchema is by definition a HierarchicalStep, which has a StepGraph containing sub-Steps that need to be unrolled. This method recursively traverses that StepGraph and its children’s StepGraphs until all sub-Steps are in a LeafConfigurationState, i.e. every Step is implemented by a single Implementation and we have the desired ImplementationGraph.

Return type:

ImplementationGraph

Returns:

The ImplementationGraph of this PipelineSchema.
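The recursive unrolling described above can be illustrated with a minimal, self-contained sketch. It does not use the easylink classes: ToyStep, its fields, and the unroll() function are hypothetical stand-ins, and the result is a flat list of implementation names rather than a real ImplementationGraph.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyStep:
    """Hypothetical stand-in for a Step; not the easylink class."""

    name: str
    implementation: Optional[str] = None  # assumed to be set only on leaf steps
    substeps: list["ToyStep"] = field(default_factory=list)

    @property
    def is_leaf(self) -> bool:
        # A step with no sub-steps is treated as a leaf in this sketch.
        return not self.substeps


def unroll(step: ToyStep) -> list[str]:
    """Recursively collect the implementation of every leaf step, in order."""
    if step.is_leaf:
        if step.implementation is None:
            raise ValueError(f"Leaf step '{step.name}' has no implementation")
        return [step.implementation]
    implementations: list[str] = []
    for substep in step.substeps:
        # Non-leaf steps contribute only through their children.
        implementations.extend(unroll(substep))
    return implementations


# A root step with one leaf child and one hierarchical child.
schema = ToyStep(
    name="root",
    substeps=[
        ToyStep("step_1", implementation="impl_a"),
        ToyStep(
            "step_2",
            substeps=[
                ToyStep("step_2a", implementation="impl_b"),
                ToyStep("step_2b", implementation="impl_c"),
            ],
        ),
    ],
)
leaf_implementations = unroll(schema)
```

In this sketch the traversal stops as soon as a step has no sub-steps, which mirrors the idea of recursing until every sub-Step is in a leaf state.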

validate_step(pipeline_config, input_data_config)[source]

Validates the pipeline configuration against this PipelineSchema.

Return type:

dict[str, list[str]]

Returns:

A dictionary of errors, where the keys are the names of any steps that did not validate and the values are lists of as many error messages as could be generated for each of those steps.

Notes

The implementation nests the full pipeline configuration under a “substeps” key of a root HierarchicalStep, because such a root step doesn’t exist from the user’s perspective and doesn’t appear explicitly in the user-provided pipeline specification file.
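As a rough illustration of the returned error dictionary and of nesting the configuration under a root “substeps” key, consider the following sketch. The validate_config() function, the schema_steps list, and the "implementation" key are assumptions made for the example; they are not taken from the easylink source.

```python
def validate_config(
    pipeline_config: dict[str, dict], schema_steps: list[str]
) -> dict[str, list[str]]:
    """Collect as many error messages as possible, keyed by step name."""
    # The user never writes a root step, so wrap the config under "substeps".
    root_config = {"substeps": pipeline_config}
    substeps = root_config["substeps"]

    errors: dict[str, list[str]] = {}
    for name in schema_steps:
        if name not in substeps:
            errors.setdefault(name, []).append("The step is not configured.")
        elif "implementation" not in substeps[name]:
            # "implementation" is a hypothetical required key for this sketch.
            errors.setdefault(name, []).append(
                "The step has no 'implementation' key."
            )
    for name in substeps:
        if name not in schema_steps:
            errors.setdefault(name, []).append(
                "The step is not part of the schema."
            )
    return errors


config_errors = validate_config(
    pipeline_config={
        "step_1": {"implementation": {"name": "impl_a"}},
        "step_3": {},
    },
    schema_steps=["step_1", "step_2"],
)
```

An empty dictionary means the configuration validated; otherwise each key names a step and each value lists every problem found for it, so the user can fix them in a single pass.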

validate_inputs(input_data)[source]

Validates, for each input data slot, that the file exists and has the expected properties.

Return type:

dict[str, list[str]]

Parameters:

input_data (dict[str, Path]) – A dictionary mapping input data slot names to file paths.

Returns:

A dictionary of errors, where the keys are the names of any files that did not validate and the values are lists of as many error messages as could be generated for each of those files.
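A minimal sketch of this kind of per-file check is shown below. The check_input_files() function and the specific properties it tests (existence and file suffix) are assumptions for illustration; the properties actually validated by easylink are not specified here.

```python
import tempfile
from pathlib import Path


def check_input_files(
    input_data: dict[str, Path],
    allowed_suffixes: tuple[str, ...] = (".csv", ".parquet"),  # assumed set
) -> dict[str, list[str]]:
    """Return error messages keyed by the file that failed validation."""
    errors: dict[str, list[str]] = {}
    for slot_name, path in input_data.items():
        messages: list[str] = []
        if not path.exists():
            messages.append(f"File not found for slot '{slot_name}'.")
        elif path.suffix not in allowed_suffixes:
            messages.append(f"Unsupported file type '{path.suffix}'.")
        if messages:
            errors[str(path)] = messages
    return errors


with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "records.csv"
    good.write_text("id\n1\n")
    missing = Path(tmp) / "missing.csv"  # never created on disk
    input_errors = check_input_files({"file_a": good, "file_b": missing})
```

Only files that fail a check appear in the result, so a fully valid set of inputs yields an empty dictionary.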

configure_pipeline(pipeline_config, input_data_config)[source]

Configures the PipelineSchema and corresponding StepGraphs.

The configuration state of any Step indicates whether that Step is a leaf or a non-leaf node and is assigned to the easylink.step.Step.configuration_state attribute. By definition, the entire PipelineSchema has a non-leaf configuration state; this method thus assigns a NonLeafConfigurationState to the PipelineSchema. Upon instantiation, this NonLeafConfigurationState recursively updates the StepGraphs until all non-leaf nodes are resolved.

Return type:

None

classmethod get_schema(name='main')[source]

Gets the requested PipelineSchema.

This PipelineSchema represents the fully supported pipelines and is used to validate the user-requested pipeline.

Return type:

PipelineSchema

Parameters:

name (str) – The name of the PipelineSchema to get.

Returns:

The requested PipelineSchema.