Pipeline Schema
This module is responsible for managing the “pipeline schemas”, i.e. the allowable and fully supported pipelines.
- class easylink.pipeline_schema.PipelineSchema(name, nodes, edges)[source]
Bases: HierarchicalStep
All possible pipelines that are fully supported.
A PipelineSchema is a HierarchicalStep whose StepGraph determines all allowable pipelines. The fundamental purpose of this class is to validate that the user-requested pipeline conforms to a fully supported pipeline.
See HierarchicalStep for inherited attributes.
- Parameters:
name (str) – The name of the pipeline schema.
edges (Iterable[EdgeParams]) – The edges of the pipeline schema.
Notes
A PipelineSchema is intended to be constructed by the get_schema() class method.
The PipelineSchema is a high-level abstraction; it represents the desired pipeline of conceptual steps to run with no detail as to how each of those steps is implemented.
- get_implementation_graph()[source]
Gets the ImplementationGraph.
The PipelineSchema is by definition a HierarchicalStep, which has a StepGraph containing sub-Steps that need to be unrolled. This method recursively traverses that StepGraph and its children's StepGraphs until all sub-Steps are in a LeafConfigurationState, i.e. all Steps are implemented by a single Implementation and we have the desired ImplementationGraph.
- Return type:
ImplementationGraph
- Returns:
The ImplementationGraph of this PipelineSchema.
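The recursive unrolling described above can be illustrated with a minimal sketch. The ToyStep class and unroll function are hypothetical stand-ins written for this example and are not part of easylink; the sketch flattens only the nodes of a step hierarchy and ignores edges, which a real ImplementationGraph would also carry.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyStep:
    """Hypothetical stand-in for a step; NOT the easylink Step class."""

    name: str
    implementation: Optional[str] = None  # set only on leaf steps
    substeps: list["ToyStep"] = field(default_factory=list)

    @property
    def is_leaf(self) -> bool:
        return not self.substeps


def unroll(step: ToyStep) -> list[tuple[str, str]]:
    """Recursively flatten a hierarchy into (step name, implementation) pairs."""
    if step.is_leaf:
        if step.implementation is None:
            raise ValueError(f"Leaf step {step.name!r} has no implementation")
        return [(step.name, step.implementation)]
    flat: list[tuple[str, str]] = []
    for child in step.substeps:
        # Descend until every sub-step is a leaf with a single implementation.
        flat.extend(unroll(child))
    return flat


# A root step with one leaf child and one nested (hierarchical) child.
schema = ToyStep(
    "root",
    substeps=[
        ToyStep("step_1", implementation="impl_a"),
        ToyStep(
            "step_2",
            substeps=[
                ToyStep("step_2a", implementation="impl_b"),
                ToyStep("step_2b", implementation="impl_c"),
            ],
        ),
    ],
)
flat_nodes = unroll(schema)
```

In this toy version the root and hierarchical steps disappear from the result, leaving only the leaf steps paired with their implementations.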
- validate_step(pipeline_config, input_data_config)[source]
Validates the pipeline configuration against this PipelineSchema.
- Return type:
- Parameters:
pipeline_config (LayeredConfigTree) – The pipeline configuration to validate.
input_data_config (LayeredConfigTree) – The input data configuration.
- Returns:
A dictionary of errors, where the keys are the names of any steps that did not validate and the values are lists of as many error messages as could be generated for each of those steps.
Notes
Internally, the full pipeline configuration is nested under a “substeps” key of a root HierarchicalStep because such a root step doesn’t exist from the user’s perspective and doesn’t appear explicitly in the user-provided pipeline specification file.
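As a rough illustration of the error dictionary and the “substeps” nesting, here is a minimal sketch that uses plain dictionaries instead of LayeredConfigTree. The function validate_toy_pipeline, the allowed_steps mapping, and the error messages are all assumptions made for this example; they do not reflect easylink's actual validation rules.

```python
def validate_toy_pipeline(
    pipeline_config: dict, allowed_steps: dict[str, set[str]]
) -> dict[str, list[str]]:
    """Return {step name: [error messages]} for steps that fail validation.

    Hypothetical sketch: `allowed_steps` maps each required step name to the
    set of implementation names this toy schema accepts for it.
    """
    # Wrap the user's configuration under a "substeps" key so the implicit
    # root step looks like any other hierarchical step.
    root_config = {"substeps": pipeline_config}

    errors: dict[str, list[str]] = {}
    for step_name, allowed_impls in allowed_steps.items():
        messages: list[str] = []
        step_config = root_config["substeps"].get(step_name)
        if step_config is None:
            messages.append("step is missing from the pipeline configuration")
        else:
            impl = step_config.get("implementation")
            if impl is None:
                messages.append("no implementation specified")
            elif impl not in allowed_impls:
                messages.append(f"unsupported implementation {impl!r}")
        if messages:
            errors[step_name] = messages
    return errors


user_config = {
    "step_1": {"implementation": "impl_a"},
    "step_2": {"implementation": "bad"},
}
toy_schema = {"step_1": {"impl_a"}, "step_2": {"impl_b"}, "step_3": {"impl_c"}}
errors = validate_toy_pipeline(user_config, toy_schema)
```

Steps that validate cleanly are left out of the result, so an empty dictionary means the configuration passed.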
- validate_inputs(input_data)[source]
Validates the file’s existence and properties for each file slot.
- Return type:
- Parameters:
input_data (dict[str, Path]) – A dictionary mapping input data slot names to file paths.
- Returns:
A dictionary of errors, where the keys are the names of any files that did not validate and the values are lists of as many error messages as could be generated for each of those files.
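A file check of this kind might look like the sketch below. The function validate_toy_inputs and the specific properties it checks (file extension and non-empty size) are assumptions chosen for illustration; the source does not say which properties easylink actually inspects.

```python
import tempfile
from pathlib import Path


def validate_toy_inputs(input_data: dict[str, Path]) -> dict[str, list[str]]:
    """Return {file name: [error messages]} for files that fail validation."""
    errors: dict[str, list[str]] = {}
    for slot_name, path in input_data.items():
        messages: list[str] = []
        if not path.is_file():
            messages.append("file does not exist")
        else:
            # Hypothetical property checks; the real rules may differ.
            if path.suffix not in {".csv", ".parquet"}:
                messages.append(f"unsupported file extension {path.suffix!r}")
            if path.stat().st_size == 0:
                messages.append("file is empty")
        if messages:
            errors[str(path)] = messages
    return errors


# Exercise the sketch against temporary files.
with tempfile.TemporaryDirectory() as tmp_dir:
    good = Path(tmp_dir) / "good.csv"
    good.write_text("a,b\n1,2\n")
    bad = Path(tmp_dir) / "bad.txt"
    bad.write_text("")
    missing = Path(tmp_dir) / "missing.csv"

    good_name, bad_name, missing_name = str(good), str(bad), str(missing)
    input_errors = validate_toy_inputs(
        {"slot_1": good, "slot_2": bad, "slot_3": missing}
    )
```

Collecting every message for a file, rather than stopping at the first failure, matches the description of returning as many error messages as could be generated.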
- configure_pipeline(pipeline_config, input_data_config)[source]
Configures the PipelineSchema and corresponding StepGraphs.
The configuration state of any Step tells whether that Step is a leaf or a non-leaf node and is assigned to easylink.step.Step.configuration_state. By definition, the entire PipelineSchema has a non-leaf configuration state; this method thus assigns a NonLeafConfigurationState to the PipelineSchema. Upon instantiation, this NonLeafConfigurationState recursively updates the StepGraphs until all non-leaf nodes are resolved.
- Return type:
- Parameters:
pipeline_config (LayeredConfigTree) – The pipeline configuration.
input_data_config (LayeredConfigTree) – The input data configuration.
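The idea of a configuration state that recursively configures sub-steps when it is instantiated can be sketched as follows. All class names here (ToyLeafState, ToyNonLeafState, ToyConfigurableStep) are hypothetical, and the rule that a “substeps” key in the configuration marks a non-leaf step is an assumption of this sketch, not a statement about easylink's logic.

```python
class ToyLeafState:
    """Hypothetical leaf state: the step is run by a single implementation."""

    def __init__(self, step, config):
        self.implementation = config["implementation"]


class ToyNonLeafState:
    """Hypothetical non-leaf state: configures sub-steps on instantiation."""

    def __init__(self, step, config):
        for substep in step.substeps:
            # Recursion happens here: each sub-step picks its own state.
            substep.configure(config["substeps"][substep.name])


class ToyConfigurableStep:
    """Hypothetical step; NOT the easylink Step class."""

    def __init__(self, name, substeps=()):
        self.name = name
        self.substeps = list(substeps)
        self.configuration_state = None

    def configure(self, config):
        # Assumption: a "substeps" key in the config marks a non-leaf step.
        if "substeps" in config:
            self.configuration_state = ToyNonLeafState(self, config)
        else:
            self.configuration_state = ToyLeafState(self, config)


step_1 = ToyConfigurableStep("step_1")
step_2a = ToyConfigurableStep("step_2a")
step_2 = ToyConfigurableStep("step_2", [step_2a])
root = ToyConfigurableStep("root", [step_1, step_2])

user_config = {
    "step_1": {"implementation": "impl_a"},
    "step_2": {"substeps": {"step_2a": {"implementation": "impl_b"}}},
}
# The root always receives a non-leaf state because its config is nested
# under "substeps".
root.configure({"substeps": user_config})
```

After the single call on the root, every step in the hierarchy holds either a leaf or a non-leaf state.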
- classmethod get_schema(name='main')[source]
Gets the requested PipelineSchema.
This PipelineSchema represents the fully supported pipelines and is used to validate the user-requested pipeline.
- Return type:
PipelineSchema
- Parameters:
name (str) – The name of the PipelineSchema to get.
- Returns:
The requested PipelineSchema.