Pipeline
This module is responsible for the Pipeline class, whose primary purpose
is to perform validations as well as generate the Snakefile to be used by
Snakemake to execute the pipeline.
- class easylink.pipeline.Pipeline(config)[source]
Bases: object
A convenience class for validations and Snakefile generation.
- pipeline_graph
The PipelineGraph object.
- spark_is_required
A boolean indicating whether the pipeline requires Spark.
- any_auto_parallel
A boolean indicating whether any implementation in the pipeline is to be automatically run in parallel.
- build_snakefile()[source]
Generates the Snakefile for this Pipeline.
This method dynamically builds the Snakefile by generating all necessary setup instructions (e.g. imports, configuration settings) as well as all rules for each Implementation in the pipeline and appending them to the Snakefile.
- Return type:
- Returns:
The path to the Snakefile.
Notes
We use the Snakemake term “rule” to refer to a singular component in a Snakefile (i.e. in a Snakemake pipeline) that defines input files, output files, and the command to run to create those output files. These rules are generated dynamically as strings and appended to the Snakefile.
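To make the note above concrete, the following is a minimal sketch of rendering a rule as a string and appending it to a file. It is not EasyLink's actual implementation: `render_rule` and `append_rule` are hypothetical helpers introduced only for illustration, and the rule uses just the standard Snakemake `input`, `output`, and `shell` directives.

```python
import tempfile
from pathlib import Path


def render_rule(name: str, inputs: list[str], outputs: list[str], command: str) -> str:
    """Render one Snakemake rule as a string (hypothetical helper)."""
    # repr() quotes each path and the command so they are valid string literals.
    input_lines = ",\n".join(f"        {path!r}" for path in inputs)
    output_lines = ",\n".join(f"        {path!r}" for path in outputs)
    return (
        f"rule {name}:\n"
        f"    input:\n{input_lines}\n"
        f"    output:\n{output_lines}\n"
        f"    shell:\n        {command!r}\n"
    )


def append_rule(snakefile: Path, rule_text: str) -> None:
    """Append a rendered rule to the Snakefile, creating the file if needed."""
    with snakefile.open("a") as f:
        f.write(rule_text + "\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        snakefile = Path(tmp) / "Snakefile"
        rule = render_rule("copy", ["a.csv"], ["b.csv"], "cp {input} {output}")
        append_rule(snakefile, rule)
        print(snakefile.read_text())
```

Building rules as strings keeps the generator independent of Snakemake itself; the resulting file is only interpreted when Snakemake runs it.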
- _validate()[source]
Validates the pipeline.
- Return type:
- Raises:
SystemExit – If any errors are found, they are batch-logged into a dictionary and the program exits with a non-zero code.
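The batch-then-exit behavior described above can be sketched as follows. This is an illustration under stated assumptions rather than the method's real code: `collect_errors` and `validate_or_exit` are hypothetical names, and each check is assumed to be a callable that returns a list of error messages.

```python
import logging
import sys

logger = logging.getLogger(__name__)


def collect_errors(checks):
    """Run every check and gather all failures instead of stopping at the first.

    ``checks`` maps a category name to a callable returning a list of
    error messages (a hypothetical structure, assumed for this sketch).
    """
    errors = {}
    for category, check in checks.items():
        messages = check()
        if messages:
            errors[category] = messages
    return errors


def validate_or_exit(checks, exit_code=1):
    """Log all collected errors together, then exit with a non-zero code."""
    errors = collect_errors(checks)
    if errors:
        for category, messages in errors.items():
            for message in messages:
                logger.error("%s: %s", category, message)
        sys.exit(exit_code)  # raises SystemExit
```

Collecting everything before exiting lets a user fix all reported problems in one pass instead of rerunning after each failure.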
- _validate_implementations()[source]
Validates each individual Implementation instance.
- Return type:
- Returns:
A dictionary of Implementation validation errors.
- static _get_input_slots_to_split(input_slot_dict)[source]
Gets any input slots that have a splitter attribute.
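As a rough sketch of this kind of filtering, the function below keeps only the slots whose splitter is set. The shape of `input_slot_dict` is an assumption made for illustration (slot name mapped to a dictionary of attributes with an optional `"splitter"` entry), and `get_slots_to_split` is a hypothetical stand-in, not the real static method.

```python
def get_slots_to_split(input_slot_dict):
    """Return the names of input slots that define a splitter.

    Assumes each value is a dict of slot attributes, where a missing or
    ``None`` "splitter" entry means the slot is not split.
    """
    return [
        name
        for name, attrs in input_slot_dict.items()
        if attrs.get("splitter") is not None
    ]


# Hypothetical slot definitions, for illustration only.
slots = {
    "main_input": {"splitter": "split_by_rows"},
    "lookup_table": {"splitter": None},
    "config": {},
}
print(get_slots_to_split(slots))  # ['main_input']
```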
- _write_target_rules()[source]
Writes the rule for the final output and its validation.
The input files to the target rule (i.e. the result node) are the final outputs themselves.
- Return type:
- _write_spark_config()[source]
Writes configuration settings to the Snakefile.
- Return type:
Notes
This is currently only applicable to Spark-dependent pipelines.
- _write_spark_module()[source]
Inserts the easylink.utilities.spark.smk Snakemake module into the Snakefile.
- Return type:
- _write_implementation_rules(node_name)[source]
Writes the rules for each Implementation.
This method writes all rules required for a given Implementation, e.g. splitters and aggregators (if necessary), validations, and the actual rule to run the container itself.
- _write_checkpoint_rule(node_name, checkpoint_filepath)[source]
Writes the Snakemake checkpoint rule.
This builds the CheckpointRule, which splits the data into (unprocessed) chunks and saves them in the output directory using wildcards.
- _write_aggregation_rule(node_name, checkpoint_filepath)[source]
Writes the Snakemake aggregation rule.
This builds the AggregationRule, which aggregates the processed data from the chunks originally created by the SplitterRule.
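The split-then-aggregate pattern behind the checkpoint and aggregation rules can be illustrated in plain Python. This sketch works on in-memory lists rather than files and wildcards, and `split_into_chunks` and `aggregate` are hypothetical helpers, not part of EasyLink.

```python
def split_into_chunks(records, chunk_size):
    """Split records into consecutive chunks of at most ``chunk_size`` items."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def aggregate(chunks):
    """Concatenate processed chunks back into a single list, preserving order."""
    return [record for chunk in chunks for record in chunk]


# Split, process each chunk independently, then aggregate the results.
chunks = split_into_chunks([1, 2, 3, 4, 5], chunk_size=2)
processed = [[value * 10 for value in chunk] for chunk in chunks]
print(aggregate(processed))  # [10, 20, 30, 40, 50]
```

In the generated pipeline the number of chunks is only known after the split has run, which is why a Snakemake checkpoint is used for that step rather than an ordinary rule.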