Steps
This module is responsible for defining the abstractions that represent desired steps to run in a pipeline. These so-called “steps” are high level and do not indicate how they are to actually be implemented.
- class easylink.step.Step(step_name, name=None, input_slots=(), output_slots=(), input_slot_mappings=(), output_slot_mappings=(), is_auto_parallel=False, default_implementation=None)[source]
Bases: object
The highest-level pipeline building block abstraction.
Steps contain information about the purpose of the interoperable tasks in the sequence called a “pipeline” and how those tasks relate to one another. In turn, Steps are implemented by Implementations, such that each Step may have several Implementations to choose from but each Implementation must implement exactly one Step. As such, the pipeline for a given EasyLink run consists of Implementations that collectively span the Steps in the PipelineSchema.
- Parameters:
step_name (str | None) – The name of the pipeline step in the PipelineSchema. It must also match the key in the implementation metadata file to be used to run this Step.
name (str | None) – The name of this Step's node in its easylink.graph_components.StepGraph. This can be different from the step_name due to the need for disambiguation during the process of flattening the Step graph, e.g. unrolling loops, etc. For example, if step 1 is looped multiple times, each node would have a step_name of, perhaps, “step_1” but unique names (“step_1_loop_1”, etc.).
input_slots (Iterable[InputSlot]) – All required InputSlots.
output_slots (Iterable[OutputSlot]) – All required OutputSlots.
input_slot_mappings (Iterable[InputSlotMapping]) – The InputSlotMapping of this Step.
output_slot_mappings (Iterable[OutputSlotMapping]) – The OutputSlotMapping of this Step.
is_auto_parallel (bool) – Whether or not this Step is to automatically run in parallel.
default_implementation (str | None)
Notes
This is the most basic type of step object available in the pipeline; it represents a single element of work to be run one time in the pipeline. Other classes inherit from this and expand upon it to represent more complex structures, e.g. to loop a step multiple times or to run multiple steps in parallel.
- step_name
The name of the pipeline step in the PipelineSchema. It must also match the key in the implementation metadata file to be used to run this Step.
- _name
The name of this Step's node in its easylink.graph_components.StepGraph. This can be different from the step_name due to the need for disambiguation during the process of flattening the Step graph, e.g. unrolling loops, etc. For example, if step 1 is looped multiple times, each node would have a step_name of, perhaps, “step_1” but unique names (“step_1_loop_1”, etc.).
- input_slots
A mapping of InputSlot names to their instances.
- output_slots
A mapping of OutputSlot names to their instances.
- slot_mappings
A combined dictionary containing both the InputSlotMappings and OutputSlotMappings of this Step.
- is_auto_parallel
Whether or not this Step is to be automatically run in parallel.
- default_implementation
The default implementation to use for this Step if the Step is not explicitly configured in the pipeline specification.
- parent_step
This Step's parent Step, if applicable.
- _configuration_state
This Step's ConfigurationState.
- property name
The name of this Step's node in its easylink.graph_components.StepGraph. This can be different from the step_name due to the need for disambiguation during the process of flattening the Step graph, e.g. unrolling loops, etc. For example, if step 1 is looped multiple times, each node would have a step_name of, perhaps, “step_1” but unique names (“step_1_loop_1”, etc.).
- property config_key
The configuration key pertinent to this type of Step.
- property configuration_state: ConfigurationState
The ConfigurationState of this Step.
- property implementation_node_name: str
The unique name to be used for this Step's node in the ImplementationGraph.
This compares the Step instance name to its node name via the Step's ordered hierarchy of sub-Steps and uses the full suffix of names starting from wherever the two first differ.
For example, a Step named “step_3” may loop multiple times using the same Implementation named “step_3_python_pandas”. However, to disambiguate between the different loops of “step_3”, we might designate the node name to be “step_3_loop_1” and then combine that with the Implementation name such that the Implementation's node name is “step_3_loop_1_step_3_python_pandas”.
If all the node names and step names match, we have not introduced any step degeneracies (with e.g. loops or multiples), and we can simply use the implementation name directly.
- Return type:
The unique name to be used for this Step's node in the ImplementationGraph.
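The naming rule described for implementation_node_name can be sketched in plain Python. This is only an illustration, not the EasyLink source: the function below and its hierarchy argument (a root-to-leaf list of (step_name, name) pairs) are hypothetical stand-ins for the Step's ordered hierarchy of sub-Steps.

```python
def implementation_node_name(hierarchy, implementation_name):
    """Build a node name from a root-to-leaf list of (step_name, name) pairs.

    Hypothetical helper: `hierarchy` stands in for the ordered hierarchy
    of sub-Steps described in the text.
    """
    for index, (step_name, name) in enumerate(hierarchy):
        if step_name != name:
            # Keep every node name from the first mismatch onward.
            suffix = [node_name for _, node_name in hierarchy[index:]]
            return "_".join(suffix + [implementation_name])
    # Nothing was unrolled, so the implementation name is already unique.
    return implementation_name


looped = [("step_3", "step_3"), ("step_3", "step_3_loop_1")]
print(implementation_node_name(looped, "step_3_python_pandas"))
# step_3_loop_1_step_3_python_pandas
```

When no name differs from its step_name, the function falls through and returns the implementation name unchanged, which mirrors the "no degeneracies" case above.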
- validate_step(step_config, combined_implementations, input_data_config)[source]
Validates the Step.
- Return type:
- Parameters:
step_config (LayeredConfigTree) – The internal configuration of this Step, i.e. it should not include the Step's name.
combined_implementations (LayeredConfigTree) – The configuration for any implementations to be combined.
input_data_config (LayeredConfigTree) – The input data configuration for the entire pipeline.
- Returns:
A dictionary of errors, where the keys are the Step name and the values are lists of error messages associated with the given Step.
Notes
If the Step does not validate (i.e. errors are found and the returned dictionary is non-empty), the tool will exit and the pipeline will not run.
We attempt to batch error messages as much as possible, but there may be times where the configuration is so ill-formed that we are unable to handle all issues in one pass. In these cases, new errors may be found after the initial ones are handled.
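The shape of the returned error dictionary can be illustrated with a small stand-alone sketch. The checks, the 'implementation' and 'name' keys, and the messages below are assumptions made for illustration; they are not taken from the EasyLink source, and a plain dict stands in for the LayeredConfigTree.

```python
def collect_step_errors(step_name, step_config, known_implementations):
    """Return {step_name: [messages]} or an empty dict if nothing is wrong.

    Hypothetical checks; a plain dict stands in for the step configuration.
    """
    messages = []
    if "implementation" not in step_config:
        messages.append("The step configuration has no 'implementation' key.")
    else:
        name = step_config["implementation"].get("name")
        if name not in known_implementations:
            messages.append(f"Implementation '{name}' is not supported.")
    # Errors are keyed by the step name; a valid step yields an empty dict.
    return {step_name: messages} if messages else {}


errors = collect_step_errors("step_1", {}, {"step_1_python_pandas"})
if errors:
    print(errors)  # a caller would report these and stop before running
```

A caller that merges the dictionaries from every step can report all batched messages at once before exiting.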
- add_nodes_to_implementation_graph(implementation_graph)[source]
Adds the Implementations related to this Step as nodes to the ImplementationGraph.
How the nodes get added depends on whether this Step is a leaf or a non-leaf, i.e. what its configuration_state is.
- Return type:
- Parameters:
implementation_graph (ImplementationGraph)
- add_edges_to_implementation_graph(implementation_graph)[source]
Adds the edges of this Step's Implementation(s) to the ImplementationGraph.
How the edges get added depends on whether this Step is a leaf or a non-leaf, i.e. what its configuration_state is.
- Return type:
- Parameters:
implementation_graph (ImplementationGraph)
- get_implementation_edges(edge)[source]
Gets the edge information for the Implementation related to this Step.
- Return type:
- Parameters:
edge (EdgeParams) – The Step's edge information to be propagated to the ImplementationGraph.
- Returns:
The Implementation's edge information based on this Step's configuration state.
- set_parent_step(step)[source]
Sets the parent of this Step.
- Return type:
- Parameters:
step (Step) – The parent Step to be set for this instance’s parent_step.
- set_configuration_state(step_config, combined_implementations, input_data_config)[source]
Sets the configuration state to ‘leaf’.
- Return type:
- Parameters:
step_config (LayeredConfigTree) – The internal configuration of this Step, i.e. it should not include the Step's name.
combined_implementations (LayeredConfigTree) – The configuration for any implementations to be combined.
input_data_config (LayeredConfigTree) – The input data configuration for the entire pipeline.
- get_implementation_slot_mappings()[source]
Gets the input and output SlotMappings.
- Return type:
dict[str, list[SlotMapping]]
- class easylink.step.StandaloneStep(step_name, name=None, input_slots=(), output_slots=(), input_slot_mappings=(), output_slot_mappings=(), is_auto_parallel=False, default_implementation=None)[source]
-
A special case type of Step that is not implemented on the pipeline.
These are not typical Steps in that they do not represent a unit of work to be performed in the pipeline (i.e. there is no container to run) and, thus, are not implemented by an Implementation.
See Step for inherited attributes.
- Parameters:
step_name (str | None)
name (str | None)
input_slots (Iterable[InputSlot])
output_slots (Iterable[OutputSlot])
input_slot_mappings (Iterable[InputSlotMapping])
output_slot_mappings (Iterable[OutputSlotMapping])
is_auto_parallel (bool)
default_implementation (str | None)
- property implementation_node_name: str
Dummy name to allow StandaloneSteps to be used interchangeably with other Steps.
Unlike other types of Steps, StandaloneSteps are not actually implemented via an Implementation and thus do not require a node name different from their own Step name. This property only exists so that StandaloneSteps can be used interchangeably with other Steps in the codebase.
- Return type:
The StandaloneStep's name.
- abstract add_nodes_to_implementation_graph(implementation_graph)[source]
Adds this StandaloneStep's Implementation as a node to the ImplementationGraph.
- Return type:
- Parameters:
implementation_graph (ImplementationGraph)
Notes
Unlike other types of Steps, StandaloneSteps are not actually implemented via an Implementation. As such, we leverage the NullImplementation class to generate the graph node.
- validate_step(step_config, combined_implementations, input_data_config)[source]
Dummy validation method to allow StandaloneSteps to be used interchangeably with other Steps.
Unlike other types of Steps, StandaloneSteps are not actually implemented via an Implementation and thus do not require any sort of validation since no new data is created. This method only exists so that StandaloneSteps can be used interchangeably with other Steps in the codebase.
- Return type:
- Returns:
An empty dictionary.
- Parameters:
step_config (LayeredConfigTree)
combined_implementations (LayeredConfigTree)
input_data_config (LayeredConfigTree)
- set_configuration_state(step_config, combined_implementations, input_data_config)[source]
Sets the configuration state to ‘leaf’.
- Return type:
- Parameters:
step_config (LayeredConfigTree) – The internal configuration of this Step, i.e. it should not include the Step's name.
combined_implementations (LayeredConfigTree) – The configuration for any Implementations to be combined.
input_data_config (LayeredConfigTree) – The input data configuration for the entire pipeline.
- add_edges_to_implementation_graph(implementation_graph)[source]
Overrides the parent Step’s method to do nothing.
StandaloneSteps do not have edges within them in the ImplementationGraph, since they are represented by a single NullImplementation node, and so we simply pass.
- class easylink.step.IOStep(step_name, name=None, input_slots=(), output_slots=(), input_slot_mappings=(), output_slot_mappings=(), is_auto_parallel=False, default_implementation=None)[source]
Bases: StandaloneStep
A type of StandaloneStep used to represent incoming and outgoing data.
IOSteps are used to handle the incoming and outgoing data to the pipeline; they are inherited by concrete InputStep and OutputStep classes.
See Step for inherited attributes.
- Parameters:
step_name (str | None)
name (str | None)
input_slots (Iterable[InputSlot])
output_slots (Iterable[OutputSlot])
input_slot_mappings (Iterable[InputSlotMapping])
output_slot_mappings (Iterable[OutputSlotMapping])
is_auto_parallel (bool)
default_implementation (str | None)
- add_nodes_to_implementation_graph(implementation_graph)[source]
Adds a NullImplementation node to the ImplementationGraph.
- Return type:
- Parameters:
implementation_graph (ImplementationGraph)
- class easylink.step.InputStep[source]
Bases: IOStep
A special case type of IOStep used to represent incoming data.
An InputStep is used to pass data into the pipeline. Since we do not know what the data to pass into the pipeline will be a priori, we instantiate an “all” OutputSlot which is used to pass in all data defined in the input data specification file.
See IOStep for inherited attributes.
- set_configuration_state(step_config, combined_implementations, input_data_config)[source]
Sets the configuration state and updates the OutputSlots.
In addition to setting InputStep to a ‘leaf’ configuration state, this method also updates the OutputSlots to include all of the dataset keys in the input data specification file. This allows for future use of specific datasets instead of only all of them.
- Return type:
- Parameters:
step_config (LayeredConfigTree) – The internal configuration of this Step, i.e. it should not include the Step's name.
combined_implementations (LayeredConfigTree) – The configuration for any implementations to be combined.
input_data_config (LayeredConfigTree) – The input data configuration for the entire pipeline.
- class easylink.step.OutputStep(input_slots)[source]
Bases: IOStep
A special case type of IOStep used to represent final results data.
An OutputStep is used to write the Snakemake Snakefile target rule in the easylink.pipeline.Pipeline.build_snakefile() method.
See IOStep for inherited attributes.
- Parameters:
input_slots (Iterable[InputSlot])
- class easylink.step.HierarchicalStep(step_name, name=None, input_slots=(), output_slots=(), nodes=(), edges=(), input_slot_mappings=(), output_slot_mappings=(), directly_implemented=True, default_implementation=None)[source]
Bases: Step
A type of Step that may contain sub-Steps.
A HierarchicalStep can be represented by multiple sub-Steps (and thus implemented by the sub-Steps' respective Implementations). For example, “step_1” might be represented by a “step_1a” and a “step_1b”, each of which has its own Implementation.
See Step for inherited attributes.
- Parameters:
nodes – All sub-nodes (i.e. sub-Steps) that make up this HierarchicalStep.
edges – The EdgeParams of the sub-nodes.
step_graph – The StepGraph, i.e. the directed acyclic graph (DAG) of sub-nodes and their edges that make up this HierarchicalStep.
directly_implemented – Whether or not the HierarchicalStep is implemented directly by the user. It is a convenience attribute to allow for back-end HierarchicalStep construction (i.e. ones that do not have a corresponding user-provided ‘substeps’ configuration key).
default_implementation (str | None)
- nodes
All sub-nodes (i.e. sub-Steps) that make up this HierarchicalStep.
- edges
The EdgeParams of the sub-nodes.
- step_graph
The StepGraph, i.e. the directed acyclic graph (DAG) of sub-nodes and their edges that make up this HierarchicalStep.
- directly_implemented
Whether or not the HierarchicalStep is user-configurable. It is a convenience attribute to allow for the creation of back-end HierarchicalSteps that are not user-facing (i.e. the user does not need to provide a ‘substeps’ configuration key for them).
- property config_key
The pipeline specification key required for a HierarchicalStep.
- validate_step(step_config, combined_implementations, input_data_config)[source]
Validates the HierarchicalStep.
- Return type:
- Parameters:
step_config (LayeredConfigTree) – The internal configuration of this Step, i.e. it should not include the Step's name.
combined_implementations (LayeredConfigTree) – The configuration for any implementations to be combined.
input_data_config (LayeredConfigTree) – The input data configuration for the entire pipeline.
- Returns:
A dictionary of errors, where the keys are the HierarchicalStep name and the values are lists of error messages associated with the given HierarchicalStep.
Notes
A HierarchicalStep can be in either a “leaf” or a “non-leaf” configuration state and the validation process is different for each.
If the HierarchicalStep does not validate (i.e. errors are found and the returned dictionary is non-empty), the tool will exit and the pipeline will not run.
We attempt to batch error messages as much as possible, but there may be times where the configuration is so ill-formed that we are unable to handle all issues in one pass. In these cases, new errors may be found after the initial ones are handled.
- set_configuration_state(step_config, combined_implementations, input_data_config)[source]
Sets the configuration state.
The configuration state of a HierarchicalStep depends on (1) whether or not it is directly_implemented and (2) whether or not the config_key exists in the pipeline specification file.
- Return type:
- Parameters:
step_config (LayeredConfigTree) – The internal configuration of this Step, i.e. it should not include the Step's name.
combined_implementations (LayeredConfigTree) – The configuration for any implementations to be combined.
input_data_config (LayeredConfigTree) – The input data configuration for the entire pipeline.
- _get_step_graph(nodes, edges)[source]
Creates a StepGraph from the nodes and edges the step was initialized with.
- Return type:
- Parameters:
edges (list[EdgeParams])
- _validate_step_graph(step_config, combined_implementations, input_data_config)[source]
Validates the nodes of a StepGraph.
- Return type:
- Parameters:
step_config (LayeredConfigTree)
combined_implementations (LayeredConfigTree)
input_data_config (LayeredConfigTree)
- _check_edges_are_valid()[source]
Check that edges are valid, i.e. each connects two slots that actually exist.
- _check_slot_mappings_are_valid()[source]
Check that input and output slot mappings are valid.
Checks that the input and output slots on the parent step are all mapped, and that every slot mapping connects a slot that actually exists on self (the parent) to a slot that actually exists on a sub-step.
- _check_validators_are_consistent()[source]
Check that if two input slots will receive the same data, they have the same validator.
There are two versions of this to check: input slots that receive the same data because one is mapped to the other by a slot mapping, and input slots that receive the same data because they both are at the receiving end of edges from the same output slot.
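The second case (input slots fed by the same output slot) can be sketched as follows. This is an assumption-laden illustration rather than the library's code: edges are reduced to hypothetical (output_slot, input_slot) name pairs, and validators is a hypothetical mapping from input slot name to its validator callable.

```python
from collections import defaultdict


def find_inconsistent_validators(edges, validators):
    """List output slots whose receiving input slots disagree on a validator.

    Hypothetical inputs: `edges` holds (output_slot, input_slot) name pairs
    and `validators` maps each input slot name to its validator callable.
    """
    targets_by_source = defaultdict(list)
    for output_slot, input_slot in edges:
        targets_by_source[output_slot].append(input_slot)

    problems = []
    for output_slot, input_slots in targets_by_source.items():
        # Slots fed by the same output receive the same data, so they
        # should all share one validator.
        if len({validators[slot] for slot in input_slots}) > 1:
            problems.append(output_slot)
    return problems


def validate_table(path):
    return None


def validate_directory(path):
    return None


edges = [("step_1.out", "step_2.in"), ("step_1.out", "step_3.in")]
validators = {"step_2.in": validate_table, "step_3.in": validate_directory}
print(find_inconsistent_validators(edges, validators))
```

The slot-mapping case could be handled the same way by grouping mapped input slots instead of edge targets.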
- class easylink.step.TemplatedStep(template_step, default_implementation=None)[source]
-
A type of Step that may contain multiplicity.
A TemplatedStep is used to represent a Step that contains a specified amount of multiplicity, such as one that is looped or run in parallel; it is inherited by the concrete LoopStep and CloneableStep classes.
See Step for inherited attributes.
- step_graph
The StepGraph, i.e. the directed acyclic graph (DAG) of sub-nodes and their edges that make up this TemplatedStep.
- template_step
The Step to be templated.
- abstract property node_prefix: str
The prefix to be used in the node name.
To disambiguate between the different types of nodes with multiplicity (i.e. loops or parallel), we use a unique prefix to be used as necessary.
- Return type:
The prefix to be used for the concrete TemplatedStep instances.
- abstract _update_step_graph(num_repeats)[source]
Updates the StepGraph.
The TemplatedStep concrete instances must handle the fact that there is multiplicity in the StepGraph and update it accordingly.
- Return type:
- Parameters:
num_repeats (int) – The number of copies to be made of the TemplatedStep.
- Returns:
The updated StepGraph with unrolled Steps.
Notes
We do not know a priori - or even during instantiation of the PipelineSchema - how many copies of any TemplatedSteps to make; indeed, there may be no TemplatedSteps at all. The user-provided pipeline configuration file must be read in to determine the number of multiples to generate.
- abstract _update_slot_mappings(num_repeats)[source]
Updates the SlotMappings.
- Return type:
dict[str, list[SlotMapping]]
- Parameters:
num_repeats (int) – The number of copies to be made of the TemplatedStep.
- Returns:
Updated SlotMappings that account for the TemplatedStep multiplicity.
- validate_step(step_config, combined_implementations, input_data_config)[source]
Validates the TemplatedStep.
Regardless of whether or not a Step.config_key is set, we always validate the base Step used to create the TemplatedStep. If a config_key is indeed set (that is, there is some multiplicity), we complete additional validations.
- Return type:
- Parameters:
step_config (LayeredConfigTree) – The internal configuration of this Step, i.e. it should not include the Step's name.
combined_implementations (LayeredConfigTree) – The configuration for any implementations to be combined.
input_data_config (LayeredConfigTree) – The input data configuration for the entire pipeline.
- Returns:
A dictionary of errors, where the keys are the TemplatedStep name and the values are lists of error messages associated with the given TemplatedStep.
Notes
If the TemplatedStep does not validate (i.e. errors are found and the returned dictionary is non-empty), the tool will exit and the pipeline will not run.
We attempt to batch error messages as much as possible, but there may be times where the configuration is so ill-formed that we are unable to handle all issues in one pass. In these cases, new errors may be found after the initial ones are handled.
- set_configuration_state(step_config, combined_implementations, input_data_config)[source]
Sets the configuration state to ‘non-leaf’.
In addition to setting the configuration state, this also updates the StepGraph and SlotMappings.
- Parameters:
step_config (LayeredConfigTree) – The internal configuration of this Step, i.e. it should not include the Step's name.
combined_implementations (LayeredConfigTree) – The configuration for any implementations to be combined.
input_data_config (LayeredConfigTree) – The input data configuration for the entire pipeline.
Notes
A TemplatedStep is always assigned a NonLeafConfigurationState even if it has no multiplicity since (despite having no copies to make) we still need to traverse the sub-Steps to get to the one with a single Implementation, i.e. the one with a LeafConfigurationState.
- _get_config(step_config)[source]
Convenience method to get the TemplatedStep's configuration.
TemplatedSteps may include multiplicity. In such cases, their configurations must be modified to include the expanded Steps.
- Return type:
- Parameters:
step_config (LayeredConfigTree) – The high-level configuration of this TemplatedStep.
- Returns:
The expanded sub-configuration of this TemplatedStep based on the Step.config_key and expanded to include all looped or parallelized sub-Steps.
- _duplicate_template_step()[source]
Makes a duplicate of the template Step.
- Return type:
- Returns:
A duplicate of the template_step.
Notes
A naive deepcopy would also make a copy of the Step.parent_step; we don’t want this to be pointing to a copy of self, but rather to the original. We thus re-set the Step.parent_step to the original (self) after making the copy.
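The deepcopy pitfall in the note can be reproduced with two tiny stand-in classes. ToyStep and ToyTemplatedStep are hypothetical and only mimic the parent/template relationship; they are not the EasyLink classes.

```python
import copy


class ToyStep:
    """Hypothetical stand-in for a Step with a parent reference."""

    def __init__(self, name):
        self.name = name
        self.parent_step = None


class ToyTemplatedStep:
    """Hypothetical stand-in for a TemplatedStep owning a template."""

    def __init__(self, template_step):
        self.template_step = template_step
        template_step.parent_step = self

    def duplicate_template_step(self):
        # deepcopy follows parent_step and therefore copies self as well.
        duplicate = copy.deepcopy(self.template_step)
        # Point the duplicate back at the original parent, not the copy.
        duplicate.parent_step = self
        return duplicate


templated = ToyTemplatedStep(ToyStep("step_1"))
naive = copy.deepcopy(templated.template_step)
fixed = templated.duplicate_template_step()
print(naive.parent_step is templated)  # False: parent is a copy of templated
print(fixed.parent_step is templated)  # True: parent was reset
```

Resetting the attribute after copying keeps the code simple; a custom `__deepcopy__` that skips the parent would be an alternative design.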
- class easylink.step.LoopStep(template_step=None, self_edges=(), default_implementation=None)[source]
Bases: TemplatedStep
A type of TemplatedStep that allows for looping.
A LoopStep allows a user to loop a single Step or a sequence of Steps multiple times such that each iteration depends on the previous.
See TemplatedStep for inherited attributes.
- Parameters:
template_step (Step | None) – The Step to be templated.
self_edges (Iterable[EdgeParams]) – EdgeParams that represent self-edges, i.e. edges that connect the output of one loop to the input of the next.
default_implementation (str | None)
- self_edges
EdgeParams that represent self-edges, i.e. edges that connect the output of one loop to the input of the next.
- property config_key
The pipeline specification key required for a LoopStep.
- property node_prefix
The prefix to be used in the LoopStep node name.
- _update_step_graph(num_repeats)[source]
Updates the StepGraph to include loops.
This makes num_repeats copies of the TemplatedStep and chains them together sequentially according to the self edges.
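Loop unrolling of this kind can be sketched without any graph library. The function below is a simplified illustration under stated assumptions: nodes are plain strings named after the “step_1_loop_1” pattern used earlier on this page, and self_edges is a hypothetical list of (output_slot, input_slot) name pairs rather than EdgeParams objects.

```python
def unroll_loop(step_name, num_repeats, self_edges):
    """Return (nodes, edges) for a step looped num_repeats times.

    Hypothetical representation: each edge is a tuple of
    (source_node, target_node, output_slot, input_slot).
    """
    nodes = [f"{step_name}_loop_{i}" for i in range(1, num_repeats + 1)]
    edges = []
    # Chain each iteration to the next one along every self-edge.
    for previous, current in zip(nodes, nodes[1:]):
        for output_slot, input_slot in self_edges:
            edges.append((previous, current, output_slot, input_slot))
    return nodes, edges


nodes, edges = unroll_loop("step_1", 3, [("output_file", "input_file")])
print(nodes)
print(edges)
```

With num_repeats equal to 1 there is a single node and no chaining edges, so the loop degenerates to the plain step.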
- _update_slot_mappings(num_repeats)[source]
Updates the SlotMappings.
This updates the appropriate slot mappings based on the number of loops and the non-self-edge input and output slots.
- Return type:
dict[str, list[SlotMapping]]
- Parameters:
num_repeats – The number of loops.
- Returns:
Updated SlotMappings that account for the number of loops requested.
- class easylink.step.CloneableStep(template_step, default_implementation=None)[source]
Bases: TemplatedStep
A type of TemplatedStep that creates multiple copies in parallel with no dependencies between them.
See TemplatedStep for inherited attributes.
- property config_key
The pipeline specification key required for a CloneableStep.
- property node_prefix
The prefix to be used in the CloneableStep node name.
- _update_step_graph(num_repeats)[source]
Updates the StepGraph to include parallelization.
This makes num_repeats copies of the TemplatedStep that are independent but contain the same edges.
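One possible reading of "independent but contain the same edges" is sketched below: every copy is wired to the same upstream and downstream nodes, and no edge joins one copy to another. The "clone" prefix, the node naming, and the incoming/outgoing arguments are all assumptions for illustration; the real node_prefix value is not stated on this page.

```python
def clone_in_parallel(step_name, num_repeats, incoming, outgoing, prefix="clone"):
    """Return (nodes, edges) for num_repeats independent copies of a step.

    Hypothetical representation: `incoming` and `outgoing` are names of
    neighbouring nodes, and edges are (source_node, target_node) pairs.
    """
    nodes = [f"{step_name}_{prefix}_{i}" for i in range(1, num_repeats + 1)]
    edges = []
    for node in nodes:
        # Every copy gets the same external edges; copies never connect
        # to each other, so they can run in any order.
        edges.extend((source, node) for source in incoming)
        edges.extend((node, target) for target in outgoing)
    return nodes, edges


nodes, edges = clone_in_parallel("step_1", 2, ["input_data"], ["results"])
print(nodes)
print(edges)
```

Contrast this with the looped case, where consecutive copies are chained through self-edges.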
- _update_slot_mappings(num_repeats)[source]
Updates the SlotMappings.
This updates the appropriate slot mappings based on the number of parallel copies and the existing input and output slots.
- Return type:
dict[str, list[SlotMapping]]
- Parameters:
num_repeats (int) – The number of parallel copies.
- Returns:
Updated SlotMappings that account for the number of copies requested.
- class easylink.step.AutoParallelStep(step, slot_splitter_mapping, slot_aggregator_mapping)[source]
Bases: Step
A Step that is run in parallel on the back end.
An AutoParallelStep is different from a CloneableStep in that it is not configured by the user to be run in parallel; the parallelization happens entirely on the back end for performance reasons.
See Step for inherited attributes.
- Parameters:
step (Step) – The Step to be automatically run in parallel. To run multiple steps in parallel, use a HierarchicalStep.
slot_splitter_mapping (dict[str, Callable]) – A mapping of the InputSlot name to split to the actual splitter function to be used.
slot_aggregator_mapping (dict[str, Callable]) – A mapping of all OutputSlot names to be aggregated and the actual aggregator function to be used.
- slot_splitter_mapping
A mapping of the InputSlot name to split to the actual splitter function to be used.
- slot_aggregator_mapping
A mapping of all OutputSlot names to be aggregated and the actual aggregator function to be used.
- split_slot_name
The name of the InputSlot to be split.
- property name
The name of this Step's node in its easylink.graph_components.StepGraph. This can be different from the step_name due to the need for disambiguation during the process of flattening the Step graph, e.g. unrolling loops, etc. For example, if step 1 is looped multiple times, each node would have a step_name of, perhaps, “step_1” but unique names (“step_1_loop_1”, etc.).
- _validate()[source]
Validates the AutoParallelStep.
AutoParallelSteps are not configured by the user to be run in parallel. Since it happens on the back end, we need to do somewhat unique validations during construction. Specifically, (1) one and only one InputSlot must be mapped to a splitter method and (2) all OutputSlots must be mapped to aggregator methods.
- Return type:
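The two construction-time rules can be expressed as a short check. This sketch works on plain names and dictionaries; the function name and the error messages are hypothetical and do not reflect what EasyLink actually raises or returns.

```python
def validate_auto_parallel(input_slot_names, output_slot_names, splitters, aggregators):
    """Return a list of problems with the splitter/aggregator mappings.

    Hypothetical inputs: `splitters` and `aggregators` map slot names to
    callables, mirroring slot_splitter_mapping and slot_aggregator_mapping.
    """
    errors = []
    split_slots = [name for name in input_slot_names if name in splitters]
    if len(split_slots) != 1:
        # Rule 1: one and only one input slot may be split.
        errors.append(
            f"Expected exactly one input slot with a splitter, found {len(split_slots)}."
        )
    missing = [name for name in output_slot_names if name not in aggregators]
    if missing:
        # Rule 2: every output slot needs an aggregator.
        errors.append(f"Output slots without an aggregator: {missing}.")
    return errors


print(validate_auto_parallel(["records", "lookup"], ["matches"], {"records": len}, {}))
```

An empty list means both rules hold and the step could be constructed.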
- validate_step(step_config, combined_implementations, input_data_config)[source]
Validates the TemplatedStep.
Regardless of whether or not a Step.config_key is set, we always validate the base Step used to create the TemplatedStep. If a config_key is indeed set (that is, there is some multiplicity), we complete additional validations.
- Return type:
- Parameters:
step_config (LayeredConfigTree) – The internal configuration of this Step, i.e. it should not include the Step's name.
combined_implementations (LayeredConfigTree) – The configuration for any implementations to be combined.
input_data_config (LayeredConfigTree) – The input data configuration for the entire pipeline.
- Returns:
A dictionary of errors, where the keys are the TemplatedStep name and the values are lists of error messages associated with the given TemplatedStep.
Notes
If the TemplatedStep does not validate (i.e. errors are found and the returned dictionary is non-empty), the tool will exit and the pipeline will not run.
We attempt to batch error messages as much as possible, but there may be times where the configuration is so ill-formed that we are unable to handle all issues in one pass. In these cases, new errors may be found after the initial ones are handled.
- set_configuration_state(step_config, combined_implementations, input_data_config)[source]
Sets the configuration state to ‘non-leaf’.
In addition to setting the configuration state, this also updates the StepGraph and SlotMappings.
- Parameters:
step_config (LayeredConfigTree) – The internal configuration of this Step, i.e. it should not include the Step's name.
combined_implementations (LayeredConfigTree) – The configuration for any implementations to be combined.
input_data_config (LayeredConfigTree) – The input data configuration for the entire pipeline.
- _update_step_graph(splitter_step, aggregator_step)[source]
Updates the StepGraph to include the splitting and aggregating nodes.
This strings exactly three nodes together: the SplitterStep that does the splitting of the input data, the actual Step to be run in parallel, and the AggregatorStep that aggregates the output data, i.e. SplitterStep -> Step -> AggregatorStep.
- Return type:
- Parameters:
splitter_step (SplitterStep) – The SplitterStep that does the splitting of the input data.
aggregator_step (AggregatorStep) – The AggregatorStep that aggregates the output data.
- Returns:
The updated StepGraph that includes SplitterStep, Step, and AggregatorStep nodes.
Notes
The SplitterStep and AggregatorStep are backed by versions of NullImplementations, i.e. they do not actually require containers to run.
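The splitter -> step -> aggregator chain can be pictured with ordinary functions. Everything in this sketch is hypothetical (the chunking rule, the doubling "work", and the concatenation); it only shows how data would flow through the three nodes and does not use any EasyLink API.

```python
def split_into_chunks(records, num_chunks):
    """Hypothetical splitter: divide records into at most num_chunks lists."""
    size = max(1, -(-len(records) // num_chunks))  # ceiling division
    return [records[i : i + size] for i in range(0, len(records), size)]


def double_values(chunk):
    """Hypothetical step: the work that is run once per chunk."""
    return [2 * value for value in chunk]


def concatenate(results):
    """Hypothetical aggregator: stitch the per-chunk outputs together."""
    return [value for chunk in results for value in chunk]


def run_auto_parallel(records, num_chunks):
    chunks = split_into_chunks(records, num_chunks)  # SplitterStep
    results = [double_values(chunk) for chunk in chunks]  # Step, per chunk
    return concatenate(results)  # AggregatorStep


print(run_auto_parallel([1, 2, 3, 4, 5], num_chunks=2))
# [2, 4, 6, 8, 10]
```

Because each chunk is processed independently, the middle stage is the part a workflow engine could dispatch concurrently.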
- _update_slot_mappings(splitter_step, aggregator_step)[source]
Updates the SlotMappings.
This updates the slot mappings so that the Step's inputs are redirected to the SplitterStep and the outputs are redirected to the AggregatorStep.
- Return type:
- Parameters:
splitter_step (SplitterStep) – The SplitterStep that does the splitting of the input data.
aggregator_step (AggregatorStep) – The AggregatorStep that aggregates the output data.
- Returns:
Updated SlotMappings that account for SplitterStep and AggregatorStep.
- class easylink.step.SplitterStep(name, split_slot, splitter_func_name)[source]
Bases: StandaloneStep
A StandaloneStep that splits an InputSlot for parallel processing.
A SplitterStep is intended to be used in conjunction with a corresponding AggregatorStep and only during construction of an AutoParallelStep.
See Step for inherited attributes.
- Parameters:
- splitter_func_name
The name of the splitter function to be used.
- add_nodes_to_implementation_graph(implementation_graph)[source]
Adds a NullImplementation node to the ImplementationGraph.
- Return type:
- Parameters:
implementation_graph (ImplementationGraph)
- class easylink.step.AggregatorStep(name, output_slot, aggregator_func_name, splitter_node_name)[source]
Bases: StandaloneStep
- Parameters:
name (str)
output_slot (OutputSlot)
aggregator_func_name (str)
splitter_node_name (str)
- aggregator_func_name
The name of the aggregator function to be used.
- splitter_node_name
The name of the SplitterStep and its corresponding NullSplitterImplementation that this AggregatorStep is associated with.
- add_nodes_to_implementation_graph(implementation_graph)[source]
Adds a NullImplementation node to the ImplementationGraph.
- Return type:
- Parameters:
implementation_graph (ImplementationGraph)
- class easylink.step.ChoiceStep(step_name, input_slots, output_slots, choices)[source]
Bases:
StepA type of
Stepthat allows for choosing from a set of options.See
Stepfor inherited attributes.- Parameters:
step_name (str) – The name of the
ChoiceStep.input_slots (Iterable[InputSlot]) – All required
InputSlots.output_slots (Iterable[OutputSlot]) – All required
OutputSlots.choices (dict[str, dict[str, Step | SlotMapping]]) – A dictionary of choices, where the keys are the names/types of choices and the values are dictionaries containing that type’s
Stepand relatedSlotMappings.
Notes
ChoiceSteps are by definition non-leaf but do not require the typical Step.config_key in the pipeline specification file. Instead, the pipeline configuration must contain a ‘type’ key that specifies which option to choose.
The choices dictionary must contain the choice type names as the outer keys. The value for each of these types is then another dictionary containing ‘step’, ‘input_slot_mappings’, and ‘output_slot_mappings’ keys with their corresponding values.
Each choice type must specify a single Step and its associated SlotMappings. Any choice paths that require multiple sub-steps should specify a HierarchicalStep.
- choices
A dictionary of choices, where the keys are the names/types of choices and the values are dictionaries containing that type’s nodes, edges, and SlotMappings.
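The shape of the choices dictionary described in the Notes can be sketched as follows. PlaceholderStep, the choice names ("simple", "full"), and the step names are hypothetical and used only for illustration; a real ChoiceStep would hold actual Step and SlotMapping objects rather than these stand-ins.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlaceholderStep:
    """Hypothetical stand-in for an EasyLink Step (illustration only)."""

    step_name: str


# Outer keys are the choice type names; each inner dictionary carries the
# 'step', 'input_slot_mappings', and 'output_slot_mappings' keys.
choices = {
    "simple": {
        "step": PlaceholderStep("simple_linking"),
        "input_slot_mappings": [],  # SlotMapping objects would go here
        "output_slot_mappings": [],
    },
    "full": {
        "step": PlaceholderStep("full_linking"),
        "input_slot_mappings": [],
        "output_slot_mappings": [],
    },
}

# The pipeline configuration for the step names one option via 'type'.
step_config = {"type": "simple"}
selected = choices[step_config["type"]]
```

A choice path that needs several sub-steps would, per the Notes, place a HierarchicalStep under the 'step' key instead of a single plain step.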
- validate_step(step_config, combined_implementations, input_data_config)[source]
Validates the ChoiceStep.
- Return type:
- Parameters:
step_config (LayeredConfigTree) – The internal configuration of this Step, i.e. it should not include the Step's name.
combined_implementations (LayeredConfigTree) – The configuration for any implementations to be combined.
input_data_config (LayeredConfigTree) – The input data configuration for the entire pipeline.
- Returns:
A dictionary of errors, where the keys are the ChoiceStep name and the values are lists of error messages associated with the given Step.
Notes
If the Step does not validate (i.e. errors are found and the returned dictionary is non-empty), the tool will exit and the pipeline will not run.
We attempt to batch error messages as much as possible, but there may be times when the configuration is so ill-formed that we are unable to handle all issues in one pass. In these cases, new errors may be found after the initial ones are handled.
We do not attempt to validate the subgraph here if the ‘type’ key cannot be validated.
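The validation behavior described above (a dictionary keyed by step name, batched messages, and no subgraph validation when the ‘type’ key is invalid) can be sketched roughly as below. The function validate_choice, its signature, and the message wording are assumptions made for illustration; this is not the actual validate_step implementation, and plain dictionaries stand in for LayeredConfigTree.

```python
from typing import Callable

# Hypothetical per-choice validators: each takes the remaining config and
# returns a list of error messages (empty when the config is valid).
Validator = Callable[[dict], list[str]]


def validate_choice(
    step_name: str, step_config: dict, validators: dict[str, Validator]
) -> dict[str, list[str]]:
    """Return {step_name: [messages]} on failure, or {} when valid."""
    chosen = step_config.get("type")
    if chosen is None:
        # Stop early: the subgraph cannot be checked without a 'type'.
        return {step_name: ["The 'type' key is missing."]}
    if chosen not in validators:
        return {step_name: [f"'{chosen}' is not a supported 'type'."]}
    sub_config = {key: value for key, value in step_config.items() if key != "type"}
    messages = validators[chosen](sub_config)  # batched in one pass
    return {step_name: messages} if messages else {}


def require_sub_steps(config: dict) -> list[str]:
    """Toy validator that reports every missing sub-step at once."""
    return [f"Missing '{name}'." for name in ("step_a", "step_b") if name not in config]


errors = validate_choice("choice_section", {"type": "full"}, {"full": require_sub_steps})
```

An empty dictionary means the configuration passed; any non-empty result would cause the tool to exit before the pipeline runs.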
- set_configuration_state(step_config, combined_implementations, input_data_config)[source]
Sets the configuration state to ‘non-leaf’.
In addition to setting the configuration state, this also updates the StepGraph and SlotMappings.
- Parameters:
step_config (LayeredConfigTree) – The internal configuration of this Step, i.e. it should not include the Step's name.
combined_implementations (LayeredConfigTree) – The configuration for any implementations to be combined.
input_data_config (LayeredConfigTree) – The input data configuration for the entire pipeline.
- class easylink.step.ConfigurationState(step, step_config, combined_implementations, input_data_config)[source]
Bases: ABC
A given Step's configuration state.
A ConfigurationState defines the exact pipeline configuration state for a given Step, including the strategy required to get the ImplementationGraph from it. There are two possible types of configuration states, “leaf” and “non-leaf”, and each has its own concrete class, LeafConfigurationState and NonLeafConfigurationState, respectively.
- Parameters:
step (Step) – The Step this ConfigurationState is tied to.
step_config (LayeredConfigTree) – The internal configuration of this Step we are setting the state for; it should not include the Step's name.
combined_implementations (LayeredConfigTree) – The configuration for any implementations to be combined.
input_data_config (LayeredConfigTree) – The input data configuration for the entire pipeline.
- _step
The Step this ConfigurationState is tied to.
- step_config
The internal configuration of this Step we are setting the state for; it should not include the Step's name.
- combined_implementations
The relevant configuration if the Step's Implementation has been requested to be combined with that of a different Step.
- input_data_config
The input data configuration for the entire pipeline.
- abstract add_nodes_to_implementation_graph(implementation_graph)[source]
Adds this Step's Implementation(s) as nodes to the ImplementationGraph.
- Return type:
- Parameters:
implementation_graph (ImplementationGraph)
- abstract add_edges_to_implementation_graph(implementation_graph)[source]
Adds the edges of this Step's Implementation(s) to the ImplementationGraph.
- Return type:
- Parameters:
implementation_graph (ImplementationGraph)
- abstract get_implementation_edges(edge)[source]
Gets the edge information for the Implementation related to this Step.
- Return type:
- Parameters:
edge (EdgeParams) – The Step's edge information to be propagated to the ImplementationGraph.
- Returns:
The Implementation's edge information.
- class easylink.step.LeafConfigurationState(step, step_config, combined_implementations, input_data_config)[source]
Bases: ConfigurationState
The ConfigurationState for a leaf Step.
A LeafConfigurationState is a concrete class that corresponds to a leaf Step, i.e. one that is implemented by a single Implementation.
See ConfigurationState for inherited attributes.
- Parameters:
step (Step)
step_config (LayeredConfigTree)
combined_implementations (LayeredConfigTree)
input_data_config (LayeredConfigTree)
- property implementation_config: LayeredConfigTree
The Step's specific Implementation configuration.
- add_nodes_to_implementation_graph(implementation_graph)[source]
Adds this Step's Implementation as a node to the ImplementationGraph.
A Step in a leaf configuration state by definition has no sub-Steps to unravel; we are able to directly instantiate an Implementation and add it to the ImplementationGraph.
- Return type:
- Parameters:
implementation_graph (ImplementationGraph)
- add_edges_to_implementation_graph(implementation_graph)[source]
Adds the edges for this Step's Implementation to the ImplementationGraph.
Steps in a LeafConfigurationState do not actually have edges within them (they are represented by a single node in the ImplementationGraph) and so we simply pass.
- Return type:
- get_implementation_edges(edge)[source]
Gets the edge information for the Implementation related to this Step.
- Return type:
- Parameters:
edge (EdgeParams) – The Step's edge information to be propagated to the ImplementationGraph.
- Raises:
ValueError – If the Step is not in the edge or if no edges related to this Step are found.
- Returns:
The Implementation's edge information.
- class easylink.step.NonLeafConfigurationState(step, step_config, combined_implementations, input_data_config)[source]
Bases: ConfigurationState
The ConfigurationState for a non-leaf Step.
A NonLeafConfigurationState is a concrete class that corresponds to a non-leaf Step, i.e. one that has a non-trivial StepGraph.
See ConfigurationState for inherited attributes.
- Parameters:
step (Step) – The Step this ConfigurationState is tied to.
step_config (LayeredConfigTree) – The internal configuration of this Step we are setting the state for; it should not include the Step's name (though it must include the sub-step names).
combined_implementations (LayeredConfigTree) – The configuration for any Implementations to be combined.
input_data_config (LayeredConfigTree) – The input data configuration for the entire pipeline.
- Raises:
ValueError – If the Step does not have a StepGraph.
Notes
The first instance of a NonLeafConfigurationState is created when calling configure_pipeline() on the PipelineSchema that is chosen for a given EasyLink run; the step passed in is the entire PipelineSchema and the pipeline_config is that of the entire requested pipeline (which is by definition a non-leaf Step).
Upon instantiation of a NonLeafConfigurationState, the _configure_subgraph_steps() method is called, which iterates through the Step's children and sets their configuration state. If any of these child Steps are also non-leaf, the process continues recursively until all nodes are leaf Steps with a corresponding LeafConfigurationState.
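The recursive configuration described in these Notes can be illustrated with a toy model. ToyStep, configure, and leaf_names are hypothetical names invented for this sketch; the real classes carry configuration trees and graphs rather than a simple list of children.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStep:
    """Hypothetical stand-in for a Step with an optional sub-graph."""

    name: str
    children: list["ToyStep"] = field(default_factory=list)
    state: str = "unconfigured"


def configure(step: ToyStep) -> None:
    """Mark a step 'non-leaf' and recurse, or mark it 'leaf' and stop."""
    if step.children:
        step.state = "non-leaf"
        for child in step.children:
            configure(child)
    else:
        step.state = "leaf"


def leaf_names(step: ToyStep) -> list[str]:
    """Collect leaf steps, i.e. those that would map to one implementation each."""
    if step.state == "leaf":
        return [step.name]
    return [name for child in step.children for name in leaf_names(child)]


pipeline = ToyStep(
    "pipeline",
    [ToyStep("step_1"), ToyStep("step_2", [ToyStep("step_2a"), ToyStep("step_2b")])],
)
configure(pipeline)
```

The same descent pattern is what lets a non-leaf state add nodes for all of its sub-steps: recursion bottoms out wherever a step has nothing further to unravel.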
- add_nodes_to_implementation_graph(implementation_graph)[source]
Adds this Step's Implementations as nodes to the ImplementationGraph.
This is a recursive function; it calls itself until all sub-Steps are of a LeafConfigurationState and have had their corresponding Implementations added as nodes to the ImplementationGraph.
- Return type:
- Parameters:
implementation_graph (ImplementationGraph)
- add_edges_to_implementation_graph(implementation_graph)[source]
Adds the edges of this Step's Implementations to the ImplementationGraph.
This method does two things:
1. Adds the edges at this level (i.e. at the Step tied to this NonLeafConfigurationState) to the ImplementationGraph.
2. Recursively traverses all sub-steps and adds their edges to the ImplementationGraph.
Note that to achieve (1), edges must be mapped from being between steps at this level of the hierarchy, all the way down to being between concrete implementations. Mapping each edge down to the implementation level is itself a recursive operation (see get_implementation_edges).
- Return type:
- Parameters:
implementation_graph (ImplementationGraph)
- get_implementation_edges(edge)[source]
Gets the edges for the Implementation related to this Step.
This method maps an edge between Steps in this Step's StepGraph to one or more edges between Implementations by applying SlotMappings.
- Return type:
- Parameters:
edge (EdgeParams) – The edge information of the edge in the StepGraph to be mapped to the Implementation level.
- Raises:
ValueError – If the Step is not in the edge or if no edges related to this Step are found.
- Returns:
A list of edges between Implementations which are ready to add to the ImplementationGraph.
Notes
In EasyLink, an edge (in either a StepGraph or ImplementationGraph) connects two Slots.
The core of this method is to map the Slots on the StepGraph edge to the corresponding Slots on Implementations.
At each level in the step hierarchy, SlotMappings indicate how to map a Slot to the level below in the hierarchy.
This method recurses through the step hierarchy until it reaches the leaf Steps relevant to this edge in order to compose all the SlotMappings that should apply to it.
Because a single Step can become multiple nodes in the ImplementationGraph (e.g. a TemplatedStep), a single edge between Steps may actually become multiple edges between Implementations, which is why this method can return a list.
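The composition of mappings described in these Notes can be sketched with plain tuples and dictionaries. Everything here is a simplification assumed for illustration: edges are (source, output_slot, target, input_slot) tuples rather than EdgeParams, and each mapping dictionary stands in for the SlotMappings at one level of the hierarchy. The names resolve and map_edge_to_leaves are hypothetical.

```python
# (node, slot) -> list of (child_node, child_slot) one level down.
Mappings = dict[tuple[str, str], list[tuple[str, str]]]


def resolve(node: str, slot: str, mappings: Mappings) -> list[tuple[str, str]]:
    """Follow mappings downward until reaching nodes with no further mapping."""
    children = mappings.get((node, slot))
    if not children:
        return [(node, slot)]  # leaf: nothing left to unravel
    resolved: list[tuple[str, str]] = []
    for child, child_slot in children:
        resolved.extend(resolve(child, child_slot, mappings))
    return resolved


def map_edge_to_leaves(
    edge: tuple[str, str, str, str],
    output_mappings: Mappings,
    input_mappings: Mappings,
) -> list[tuple[str, str, str, str]]:
    """Map one step-level edge to one or more leaf-level edges."""
    source, out_slot, target, in_slot = edge
    sources = resolve(source, out_slot, output_mappings)
    targets = resolve(target, in_slot, input_mappings)
    return [(s, s_slot, t, t_slot) for s, s_slot in sources for t, t_slot in targets]


# step_1 is hierarchical (its output comes from sub-step step_1b) and
# step_2 expands into two copies that both read the same input.
output_mappings = {("step_1", "records"): [("step_1b", "records")]}
input_mappings = {("step_2", "records"): [("step_2_a", "records"), ("step_2_b", "records")]}
leaf_edges = map_edge_to_leaves(
    ("step_1", "records", "step_2", "records"), output_mappings, input_mappings
)
```

The single step-level edge becomes two leaf-level edges in this toy case, which mirrors why the documented method returns a list.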
- _configure_subgraph_steps()[source]
Sets the configuration state for all Steps in the StepGraph.
This method recursively traverses the StepGraph and sets the configuration state for each Step until reaching all leaf nodes.
- Return type:
Notes
If a Step name is missing from the step_config, we know that it must have a default implementation because we already validated that one exists during HierarchicalStep._validate_step_graph(). In that case, we manually instantiate and use a step_config with the default implementation.
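The fallback described in this note can be sketched as below. The function name config_for_step, the plain-dictionary configuration, and the {"implementation": {"name": ...}} layout are assumptions for illustration only; the actual method works with LayeredConfigTree objects.

```python
def config_for_step(
    step_name: str, step_config: dict, default_implementations: dict[str, str]
) -> dict:
    """Return the user's config for a step, or one built from its default."""
    if step_name in step_config:
        return step_config[step_name]
    # Earlier validation is assumed to guarantee a default exists here.
    return {"implementation": {"name": default_implementations[step_name]}}


# Hypothetical step and implementation names.
user_config = {"step_1": {"implementation": {"name": "custom_impl"}}}
defaults = {"step_2": "default_step_2_impl"}

step_1_config = config_for_step("step_1", user_config, defaults)
step_2_config = config_for_step("step_2", user_config, defaults)
```

Because the existence of a default was checked during validation, the lookup in the fallback branch is not expected to fail for a missing step.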