Graph Components

This module defines the modular building-block objects that can be composed to create graph representations of pipelines.

class easylink.graph_components.InputSlot(name, env_var, validator)

Bases: object

A single input slot to a specific node.

InputSlots represent distinct semantic categories of input files, between which a node must be able to differentiate. In order to pass data between nodes, an InputSlot of one node can be connected to an OutputSlot of another node via an EdgeParams instance.

Notes

Nodes can be either Steps or Implementations.

Parameters:
name: str

The name of the InputSlot.

env_var: str | None

The environment variable that is used to pass a list of data filepaths to an Implementation.

validator: Callable[[str], None] | None

A function that validates the input data being passed into the pipeline via this InputSlot. If the data is invalid, the function should raise an exception with a descriptive error message which will then be reported to the user. Note that the function *must* be defined in the easylink.utilities.validation_utils module!
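
The validator contract described above (accept a filepath, return nothing, raise with a descriptive message on bad data) could be sketched as follows. The InputSlot dataclass here is a local stand-in that only mirrors the three documented attributes; it is not the easylink class. The validator name, slot name, and environment variable name are hypothetical, and a real validator would have to live in easylink.utilities.validation_utils.

```python
import csv
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Optional


@dataclass(frozen=True)
class InputSlot:
    """Stand-in mirroring the documented attributes (not the easylink class)."""

    name: str
    env_var: Optional[str]
    validator: Optional[Callable[[str], None]]


def validate_csv_has_header(filepath: str) -> None:
    """Hypothetical validator: return None if valid, raise otherwise."""
    path = Path(filepath)
    if path.suffix != ".csv":
        raise ValueError(f"{filepath}: expected a .csv file, got '{path.suffix}'")
    with path.open(newline="") as f:
        header = next(csv.reader(f), None)  # None when the file is empty
    if not header:
        raise ValueError(f"{filepath}: file is empty or has no header row")


slot = InputSlot(
    name="input_data",  # hypothetical slot name
    env_var="INPUT_DATA_FILE_PATHS",  # hypothetical environment variable
    validator=validate_csv_has_header,
)
```

Calling slot.validator(path) on each incoming file then either passes silently or raises an error whose message can be shown to the user.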

class easylink.graph_components.OutputSlot(name)

Bases: object

A single output slot from a specific node.

OutputSlots represent distinct semantic categories of output files, between which a node must be able to differentiate. In order to pass data between nodes, an OutputSlot of one node can be connected to an InputSlot of another node via an EdgeParams instance.

Notes

Nodes can be either Steps or Implementations.

Input data is validated via the InputSlot's required validator attribute. To prevent multiple validations of the same files (since outputs of one node can be inputs to another), no such validator is stored here on the OutputSlot.

Parameters:

name (str)

name: str

The name of the OutputSlot.

class easylink.graph_components.EdgeParams(source_node, target_node, output_slot, input_slot, filepaths=None)

Bases: object

The details of an edge between two nodes in a graph.

EdgeParams connect the OutputSlot of a source node to the InputSlot of a target node.

Notes

Nodes can be either Steps or Implementations.

Parameters:
  • source_node (str)

  • target_node (str)

  • output_slot (str)

  • input_slot (str)

  • filepaths (tuple[str] | None)

source_node: str

The name of the source node.

target_node: str

The name of the target node.

output_slot: str

The name of the source node’s OutputSlot.

input_slot: str

The name of the target node’s InputSlot.

filepaths: tuple[str] | None = None

The filepaths that are passed from the source node to the target node.

classmethod from_graph_edge(source, sink, edge_attrs)

A convenience method to create an EdgeParams instance.

Return type:

EdgeParams

Parameters:
  • source (str) – The name of the source node.

  • sink (str) – The name of the target node.

  • edge_attrs (dict[str, OutputSlot | InputSlot | str | None]) – The attributes of the edge connecting the source and target nodes. ‘output_slot’ and ‘input_slot’ are required keys while ‘filepaths’ is optional.
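
A minimal sketch of how such a convenience constructor might work is shown below. The three dataclasses are local stand-ins, not the easylink classes, and the body of from_graph_edge is an assumption: it presumes that the slot objects stored on the edge are reduced to their names and that a missing 'filepaths' key becomes None. The node and slot names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class OutputSlot:
    name: str


@dataclass(frozen=True)
class InputSlot:
    name: str


@dataclass(frozen=True)
class EdgeParams:
    source_node: str
    target_node: str
    output_slot: str
    input_slot: str
    filepaths: Optional[Tuple[str, ...]] = None

    @classmethod
    def from_graph_edge(cls, source, sink, edge_attrs):
        # Assumption: slot objects are stored on the edge and only their
        # names are kept; 'filepaths' is optional and defaults to None.
        return cls(
            source_node=source,
            target_node=sink,
            output_slot=edge_attrs["output_slot"].name,
            input_slot=edge_attrs["input_slot"].name,
            filepaths=edge_attrs.get("filepaths"),
        )


edge = EdgeParams.from_graph_edge(
    "step_1",
    "step_2",
    {
        "output_slot": OutputSlot("step_1_main_output"),
        "input_slot": InputSlot("step_2_main_input"),
    },
)
```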

class easylink.graph_components.StepGraph(*args, backend=None, **kwargs)

Bases: MultiDiGraph

A directed acyclic graph (DAG) of Steps.

StepGraphs are DAGs with Step names for nodes and their corresponding Step instances as attributes on those nodes. The file dependencies between nodes are the graph edges; multiple edges between nodes are permitted.

Notes

These are high-level abstractions; they represent a conceptual pipeline graph with no detail as to how each Step is implemented.

The largest StepGraph is that of the entire PipelineSchema.

property step_nodes: list[str]

The topologically sorted list of Step names.

property steps: list[Step]

The topologically sorted list of all Steps in the graph.
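
The relationship between the two properties can be illustrated with a small sketch. It uses the standard library's graphlib.TopologicalSorter instead of the MultiDiGraph base class so that it runs without third-party packages; the Step stand-in, the step names, and the slot names are all hypothetical.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter


@dataclass(frozen=True)
class Step:
    """Stand-in for a Step; only the name matters in this sketch."""

    name: str


# Node name -> Step instance, loosely mirroring Steps stored as node attributes.
steps_by_name = {name: Step(name) for name in ("step_3", "step_1", "step_2")}

# (source, target, input slot) triples; two parallel edges join step_1 -> step_2.
edges = [
    ("step_2", "step_3", "main_input"),
    ("step_1", "step_2", "main_input"),
    ("step_1", "step_2", "secondary_input"),
]

sorter = TopologicalSorter()
for name in steps_by_name:
    sorter.add(name)  # register every node, even ones without edges
for source, target, _slot in edges:
    sorter.add(target, source)  # target depends on source

step_nodes = list(sorter.static_order())
steps = [steps_by_name[name] for name in step_nodes]
print(step_nodes)  # ['step_1', 'step_2', 'step_3']
```

Parallel edges do not affect the ordering; they only record that more than one file dependency connects the same pair of nodes.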

add_node_from_step(step)

Adds a new node to the StepGraph.

Return type:

None

Parameters:

step (Step) – The Step to add to the graph as a new node.

add_edge_from_params(edge_params)

Adds a new edge to the StepGraph.

Return type:

None

Parameters:

edge_params (EdgeParams) – The details of the new edge to be added to the graph.

class easylink.graph_components.ImplementationGraph(*args, backend=None, **kwargs)

Bases: MultiDiGraph

A directed acyclic graph (DAG) of Implementations.

ImplementationGraphs are DAGs with Implementations for nodes and the file dependencies between them for edges. Self-edges as well as multiple edges between nodes are permitted.

Notes

An ImplementationGraph is a low-level abstraction; it represents the actual implementations of each Step in a pipeline. This is in contrast to a StepGraph, which can be an intricate nested structure due to the various complex and self-similar Step instances (which represent abstract operations such as “loop this step N times”). An ImplementationGraph is the flattened and concrete graph of Implementations in a given pipeline.

The largest ImplementationGraph (that is, the specific ImplementationGraph representing the entire pipeline to be run) is that of the PipelineGraph.

property implementation_nodes: list[str]

The topologically sorted list of Implementation names.

property splitter_nodes: list[str]

The topologically sorted list of splitter nodes (which have no implementations).

property aggregator_nodes: list[str]

The topologically sorted list of aggregator nodes (which have no implementations).

property implementations: list[Implementation]

The topologically sorted list of all Implementations in the graph.
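
One way to picture the node-list properties above is as filters over a single topological ordering. The sketch below is an illustration only: it uses the standard library's graphlib rather than the MultiDiGraph base class, and the "kind" attribute used to tell node types apart is an assumption made for this example, not a documented easylink attribute. All node names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical node attributes; the "kind" key is an assumption of this sketch.
node_attrs = {
    "split": {"kind": "splitter"},
    "impl_a": {"kind": "implementation"},
    "impl_b": {"kind": "implementation"},
    "aggregate": {"kind": "aggregator"},
}

# Node name -> set of predecessor node names (a simple chain here).
predecessors = {
    "split": set(),
    "impl_a": {"split"},
    "impl_b": {"impl_a"},
    "aggregate": {"impl_b"},
}

ordered = list(TopologicalSorter(predecessors).static_order())


def nodes_of_kind(kind):
    """Return the nodes of one kind, preserving topological order."""
    return [node for node in ordered if node_attrs[node]["kind"] == kind]


implementation_nodes = nodes_of_kind("implementation")
splitter_nodes = nodes_of_kind("splitter")
aggregator_nodes = nodes_of_kind("aggregator")
```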

add_node_from_implementation(node_name, implementation)

Adds a new node to the ImplementationGraph.

Return type:

None

Parameters:
  • node_name – The name of the new node.

  • implementation (Implementation) – The Implementation to add to the graph as a new node.

add_edge_from_params(edge_params)

Adds a new edge to the ImplementationGraph.

Return type:

None

Parameters:

edge_params (EdgeParams) – The details of the new edge to be added to the graph.

class easylink.graph_components.SlotMapping(parent_slot, child_node, child_slot)

Bases: ABC

A mapping between a slot on a parent node and a slot on one of its child nodes.

SlotMapping is an interface implemented by the concrete InputSlotMapping and OutputSlotMapping classes, which represent a mapping between parent and child nodes at different levels of a potentially nested graph. Specifically, slot mappings are used to (1) remap edges between parent and child nodes in a PipelineSchema and (2) map a leaf Step's slots to the corresponding Implementation slots when building the ImplementationGraph.

Notes

Nodes can be either Steps or Implementations.

Parameters:
  • parent_slot (str)

  • child_node (str)

  • child_slot (str)

parent_slot: str

The name of the parent slot.

child_node: str

The name of the child node.

child_slot: str

The name of the child slot.

abstract remap_edge(edge)

Remaps an edge to connect the parent and child nodes.

Return type:

EdgeParams

Parameters:

edge (EdgeParams)

class easylink.graph_components.InputSlotMapping(parent_slot, child_node, child_slot)

Bases: SlotMapping

A mapping between InputSlots of a parent node and a child node.

Parameters:
  • parent_slot (str)

  • child_node (str)

  • child_slot (str)

remap_edge(edge)

Remaps an edge’s InputSlot.

Return type:

EdgeParams

Parameters:

edge (EdgeParams) – The edge to remap.

Returns:

The details of the remapped edge.

Raises:

ValueError – If the parent slot does not match the input slot of the edge.
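
The remapping behavior described above might look roughly like the following. Both dataclasses are local stand-ins rather than the easylink classes, and the method body is an assumption inferred from the documented parameters and the documented ValueError: the edge's target side is redirected from the parent node's slot to the child node's slot while the source side is left alone. Node and slot names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class EdgeParams:
    source_node: str
    target_node: str
    output_slot: str
    input_slot: str
    filepaths: Optional[Tuple[str, ...]] = None


@dataclass(frozen=True)
class InputSlotMapping:
    parent_slot: str
    child_node: str
    child_slot: str

    def remap_edge(self, edge: EdgeParams) -> EdgeParams:
        # The mapping only applies to edges that arrive at the parent slot.
        if edge.input_slot != self.parent_slot:
            raise ValueError(
                f"Parent slot '{self.parent_slot}' does not match "
                f"edge input slot '{edge.input_slot}'"
            )
        # Assumed behavior: point the edge at the child node and child slot.
        return EdgeParams(
            source_node=edge.source_node,
            target_node=self.child_node,
            output_slot=edge.output_slot,
            input_slot=self.child_slot,
            filepaths=edge.filepaths,
        )


mapping = InputSlotMapping(
    parent_slot="step_1_main_input",
    child_node="step_1a",
    child_slot="step_1a_main_input",
)
edge = EdgeParams("input_data", "step_1", "all", "step_1_main_input")
remapped = mapping.remap_edge(edge)
```

Here the edge that originally ended at step_1 ends at its child step_1a after remapping.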

class easylink.graph_components.OutputSlotMapping(parent_slot, child_node, child_slot)

Bases: SlotMapping

A mapping between OutputSlots of a parent node and a child node.

Parameters:
  • parent_slot (str)

  • child_node (str)

  • child_slot (str)

remap_edge(edge)

Remaps an edge’s OutputSlot.

Return type:

EdgeParams

Parameters:

edge (EdgeParams) – The edge to remap.

Returns:

The details of the remapped edge.

Raises:

ValueError – If the parent slot does not match the output slot of the edge.
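
For the output side, a comparable sketch follows, again with local stand-in dataclasses and an assumed method body. The assumption here is that the edge's source side is rewritten to the child node and child slot, and that a slot mismatch raises the documented ValueError. Node and slot names are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class EdgeParams:
    source_node: str
    target_node: str
    output_slot: str
    input_slot: str
    filepaths: Optional[Tuple[str, ...]] = None


@dataclass(frozen=True)
class OutputSlotMapping:
    parent_slot: str
    child_node: str
    child_slot: str

    def remap_edge(self, edge: EdgeParams) -> EdgeParams:
        # The mapping only applies to edges that leave the parent slot.
        if edge.output_slot != self.parent_slot:
            raise ValueError(
                f"Parent slot '{self.parent_slot}' does not match "
                f"edge output slot '{edge.output_slot}'"
            )
        # Assumed behavior: the edge now starts at the child node and slot.
        return replace(
            edge, source_node=self.child_node, output_slot=self.child_slot
        )


mapping = OutputSlotMapping("step_1_main_output", "step_1b", "step_1b_main_output")
edge = EdgeParams("step_1", "step_2", "step_1_main_output", "step_2_main_input")
remapped = mapping.remap_edge(edge)

# An edge leaving a different slot is rejected with a descriptive message.
try:
    mapping.remap_edge(replace(edge, output_slot="some_other_output"))
except ValueError as error:
    print(error)
```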