Implementations
This module is responsible for defining the abstractions that represent actual implementations of steps in a pipeline. Typically, these abstractions contain information about what container to run for a given step and other related details.
- class easylink.implementation.Implementation(schema_steps, implementation_config, input_slots=(), output_slots=(), is_auto_parallel=False)[source]
Bases:
objectA representation of an actual container that will be executed for a
Step.Implementationsexist at a lower level thanSteps. This class contains information about what container to use, what environment variables to set inside the container, and some metadata about the container.- Parameters:
schema_steps (list[str]) – The user-requested
Stepnames for which thisImplementationis expected to implement.implementation_config (LayeredConfigTree) – The configuration details required to run the relevant container.
input_slots (Iterable[InputSlot]) – All required
InputSlots.output_slots (Iterable[OutputSlot]) – All required
OutputSlots.is_auto_parallel (bool)
- name
The name of this
Implementation.
- input_slots
A mapping of
InputSlotnames to their instances.
- output_slots
A mapping of
OutputSlotnames to their instances.
- environment_variables
A mapping of environment variables to set.
- metadata_steps
The names of the specific
Stepsfor which thisImplementationhas been designed to implement.
- schema_steps
The names of the specific
Stepsthat the user has requested to be implemented by this particularImplementation.
- requires_spark
Whether this
Implementationrequires a Spark environment.
- validate(skip_image_validation, images_dir)[source]
Validates individual
Implementationinstances.- Return type:
- Returns:
A list of logs containing any validation errors. Each item in the list is a distinct message about a particular validation error (e.g. if a required image does not exist).
- Parameters:
Notes
This is intended to be run from
easylink.pipeline.Pipeline._validate().
- _validate_expected_steps(logs)[source]
Validates that the
Implementationis responsible for the correct steps.
- _download_and_validate_image(logs, images_dir)[source]
Downloads the image if required and validates it exists.
If the image does not exist in the specified images directory, it will attempt to download it.
- _get_env_vars(implementation_config)[source]
Gets the relevant environment variables.
- Return type:
- Parameters:
implementation_config (LayeredConfigTree)
- class easylink.implementation.NullImplementation(name, input_slots=(), output_slots=())[source]
Bases:
objectA partial
Implementationinterface when no container is needed to run.The primary use case for this class is to be able to add a
Stepthat does not have a correspondingImplementationto anImplementationGraphsince adding any new node requires an object withInputSlotandOutputSlotnames.- Parameters:
name (str) – The name of this
NullImplementation.input_slots (Iterable[InputSlot]) – All required
InputSlots.output_slots (Iterable[OutputSlot]) – All required
OutputSlots.
- name
The name of this
NullImplementation.
- input_slots
A mapping of
InputSlotnames to their instances.
- output_slots
A mapping of
OutputSlotnames to their instances.
- combined_name
The name of the combined implementation of which
NullImplementationis a constituent. This is definitionally None.
- class easylink.implementation.NullSplitterImplementation(name, input_slots, output_slots, splitter_func_name)[source]
Bases:
NullImplementationA type of
NullImplementationspecifically forSplitterSteps.See
NullImplementationfor inherited attributes.- Parameters:
splitter_func_name (str) – The name of the splitter function to use.
name (str)
input_slots (Iterable[InputSlot])
output_slots (Iterable[OutputSlot])
- splitter_func_name
The name of the splitter function to use.
- class easylink.implementation.NullAggregatorImplementation(name, input_slots, output_slots, aggregator_func_name, splitter_node_name)[source]
Bases:
NullImplementationA type of
NullImplementationspecifically forAggregatorSteps.See
NullImplementationfor inherited attributes.- Parameters:
aggregator_func_name (str) – The name of the aggregation function to use.
splitter_node_name (str) – The name of the
SplitterStepand its correspondingNullSplitterImplementationthat did the splitting.name (str)
input_slots (Iterable[InputSlot])
output_slots (Iterable[OutputSlot])
- aggregator_func_name
The name of the aggregation function to use.
- splitter_node_name
The name of the
SplitterStepand its correspondingNullSplitterImplementationthat did the splitting.
- class easylink.implementation.PartialImplementation(combined_name, schema_step, input_slots=(), output_slots=())[source]
Bases:
objectOne part of a combined implementation that spans multiple
Steps.A
PartialImplementationis what is initially added to theImplementationGraphwhen a so-called “combined implementation” is used (i.e. anImplementationthat spans multipleSteps). We initially add a node for eachStep, which has as itsimplementationattribute aPartialImplementation. Such a graph is not yet fit to run. When we make our second pass through, after the flat (non-hierarchical)PipelineGraphhas been created, we find the set ofPartialImplementationnodes corresponding to each combined implementation and replace them with a single node with a trueImplementationrepresenting the combined implementation.- Parameters:
combined_name (str) – The name of the combined implementation of which this
PartialImplementationis a part.schema_step (str) – The requested
Stepname that thisPartialImplementationpartially implements.input_slots (Iterable[InputSlot]) – The
InputSlotsfor thisPartialImplementation.output_slots (Iterable[OutputSlot]) – The
OutputSlotsfor thisPartialImplementation.
- combined_name
The name of the combined implementation of which this
PartialImplementationis a part.
- schema_step
The requested
Stepname that thisPartialImplementationpartially implements.
- input_slots
A mapping of
InputSlotnames to their instances.
- output_slots
A mapping of
OutputSlotnames to their instances.