Configuration

This module is responsible for managing an easylink run’s configuration as defined by various user-input specification files.

easylink.configuration.DEFAULT_ENVIRONMENT = {'environment': {'computing_environment': 'local', 'container_engine': 'undefined', 'implementation_resources': {'cpus': 1, 'memory': 1, 'time_limit': 1}}}

The default environment configuration settings.

easylink.configuration.SPARK_DEFAULTS = {'keep_alive': False, 'workers': {'cpus_per_node': 1, 'mem_per_node': 1, 'num_workers': 2, 'time_limit': 1}}

The default spark configuration settings.
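As a rough illustration of how defaults like these can sit beneath user-supplied settings, the sketch below merges a user dictionary over a copy of DEFAULT_ENVIRONMENT. The helper merge_over_defaults and the user_environment values are hypothetical and are not part of EasyLink; the actual layering is handled by the Config class and may differ.

```python
import copy
from typing import Any

# Copied from the documented default above.
DEFAULT_ENVIRONMENT = {
    "environment": {
        "computing_environment": "local",
        "container_engine": "undefined",
        "implementation_resources": {"cpus": 1, "memory": 1, "time_limit": 1},
    }
}


def merge_over_defaults(defaults: dict, overrides: dict) -> dict:
    """Return a new dict in which values from `overrides` take priority.

    Hypothetical helper: nested dicts are merged key by key, everything
    else is replaced wholesale. The inputs are left unmodified.
    """
    merged: dict[str, Any] = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_over_defaults(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical user settings: only two values are overridden.
user_environment = {
    "environment": {
        "computing_environment": "slurm",
        "implementation_resources": {"memory": 4},
    }
}

effective = merge_over_defaults(DEFAULT_ENVIRONMENT, user_environment)
```

Settings the user does not mention (here container_engine, cpus, and time_limit) keep their default values.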

class easylink.configuration.Config(config_params, schema_name='main', images_dir=None, command='run')

Bases: LayeredConfigTree

A container for configuration information.

The Config contains the user-provided pipeline, input data, and computing environment specifications. It is a nested dictionary-like object that supports prioritized layers of configuration settings as well as dot-notation access to its attributes.

The Config is also responsible for various validation checks on the provided specifications. If any of these are invalid, a validation error is raised with as much detail as is available.
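To make "dot-notation access" concrete, here is a minimal sketch of a read-only wrapper around a nested dictionary. The DotAccess class is hypothetical and only mimics the access style; Config itself inherits this behavior from LayeredConfigTree, whose API is not reproduced here.

```python
from typing import Any


class DotAccess:
    """Read-only, dot-notation view over a nested dict (illustrative only)."""

    def __init__(self, data: dict) -> None:
        self._data = data

    def __getattr__(self, name: str) -> Any:
        # __getattr__ runs only when normal attribute lookup fails.
        # Private names are never looked up in the data, which avoids
        # recursion if `_data` has not been set yet.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            value = self._data[name]
        except KeyError:
            raise AttributeError(name) from None
        # Wrap nested dicts so that access can be chained.
        return DotAccess(value) if isinstance(value, dict) else value


# Hypothetical settings, shaped like the documented environment defaults.
settings = DotAccess(
    {"environment": {"computing_environment": "local", "implementation_resources": {"cpus": 1}}}
)
cpus = settings.environment.implementation_resources.cpus
```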

Parameters:

config_params (dict[str, Any]) – A dictionary of all specifications required to run the pipeline. This includes the pipeline, input data, and computing environment specifications, as well as the results directory and images directory.
schema_name (str) – The name of the schema to validate the pipeline configuration against.
images_dir (str | Path | None) – The directory that contains the images, or into which they are downloaded if they don’t exist. If None, defaults to DEFAULT_IMAGES_DIR.
command (str) – The EasyLink command being run.

environment: The environment configuration, including computing environment, container engine, implementation resources, and slurm- and spark-specific requests.

pipeline: The pipeline configuration.

input_data: The input data filepaths.

schema: The PipelineSchema.

images_dir: The directory containing the images or to download the images to if they don’t exist. If None, will default to ~/.easylink_images.

command: The EasyLink command being run.

property computing_environment: str: The computing environment to run on (‘local’ or ‘slurm’).

property slurm: dict[str, Any]: A dictionary of slurm-specific configuration settings.

property spark: dict[str, Any]: A dictionary of spark-specific configuration settings.

property slurm_resources: dict[str, str]: A flat dictionary of slurm resource requests.

property spark_resources: dict[str, Any]: A flat dictionary of spark resource requests.

_get_schema(schema_name='main')

Gets the requested PipelineSchema.

The schema is only returned if it validates the pipeline configuration.

Return type: PipelineSchema
Parameters: schema_name (str) – The name of the specific PipelineSchema to validate the pipeline configuration against.
Returns: The requested PipelineSchema if it validates the requested pipeline configuration.
Raises: SystemExit – If the pipeline configuration is not valid for the requested schema, the program exits with a non-zero code and all validation errors found are logged.

Notes

This also serves as the validation method for the pipeline configuration file, since the PipelineSchema can only validate if that file is valid.

_validate()

Validates the Config.

Return type: None
Raises: SystemExit – If any errors are found, they are batch-logged into a dictionary and the program exits with a non-zero code.

Notes

Pipeline validations are handled in _get_schema().

_validate_input_data()

Validates the input data configuration.

Return type: dict[Any, Any]
Returns: A dictionary of input data configuration validation errors.

_validate_environment()

Validates the environment configuration.

Return type: dict[Any, Any]
Returns: A dictionary of environment configuration validation errors.
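Taken together, _validate() and its two helpers describe a collect-then-exit pattern: each helper returns a dictionary of errors, and the caller logs them all at once before exiting. The sketch below illustrates that pattern only. The function names, error keys, messages, and specific checks are assumptions, not EasyLink's actual validation rules.

```python
import sys
from pathlib import Path
from typing import Any

# Taken from the documented computing_environment property.
ALLOWED_COMPUTING_ENVIRONMENTS = {"local", "slurm"}


def validate_input_data(input_data: dict) -> dict:
    """Return a dict of errors; an empty dict means the section is valid."""
    # Hypothetical check: every listed file must exist on disk.
    missing = [f"{name}: {path}" for name, path in input_data.items() if not Path(path).exists()]
    return {"INPUT DATA ERRORS": missing} if missing else {}


def validate_environment(environment: dict) -> dict:
    """Return a dict of errors; an empty dict means the section is valid."""
    value: Any = environment.get("computing_environment")
    if value not in ALLOWED_COMPUTING_ENVIRONMENTS:
        return {"ENVIRONMENT ERRORS": [f"Unknown computing_environment: {value!r}"]}
    return {}


def validate(input_data: dict, environment: dict) -> None:
    """Collect every error first, then report them together and exit."""
    errors = {**validate_input_data(input_data), **validate_environment(environment)}
    if errors:
        print(errors, file=sys.stderr)
        raise SystemExit(1)  # non-zero exit code, as documented
```

Reporting all errors in one pass means a user can fix every problem in their specification files before rerunning, instead of discovering them one at a time.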

easylink.configuration.load_params_from_specification(pipeline_specification, input_data, computing_environment, results_dir)

Gathers together all specification data.

This gathers the pipeline, input data, and computing environment specifications as well as the results directory into a single dictionary for insertion into the Config object.

Return type:

dict[str, Any]

Parameters:

pipeline_specification (str | Path) – The path to the pipeline specification file.
input_data (str | Path) – The path to the input data specification file.
computing_environment (str | Path | None) – The path to the computing environment specification file.
results_dir (str | Path) – The path to the results directory.

Returns:

A dictionary of all provided specification data.

easylink.configuration._load_input_data_paths(input_data_specification_path)

Creates a dictionary of input data paths from the input data specification file.

Return type: dict[str, list[Path]]
Parameters: input_data_specification_path (str | Path)
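The return type dict[str, list[Path]] suggests that each named input maps to one or more paths. The sketch below shows one way already-parsed specification contents could be normalized into that shape. The function normalize_input_data_paths, the rule that a single string becomes a one-element list, and the example file names are all assumptions. Reading the specification file itself is omitted.

```python
from pathlib import Path
from typing import Union


def normalize_input_data_paths(
    raw: dict[str, Union[str, list[str]]]
) -> dict[str, list[Path]]:
    """Convert parsed specification contents into lists of Path objects."""
    paths: dict[str, list[Path]] = {}
    for name, value in raw.items():
        # Assumption: a single path is wrapped so every value is a list.
        entries = [value] if isinstance(value, (str, Path)) else list(value)
        paths[name] = [Path(entry) for entry in entries]
    return paths


# Hypothetical parsed contents of an input data specification file.
input_paths = normalize_input_data_paths(
    {"file1": "data/a.parquet", "file2": ["data/b.parquet", "data/c.parquet"]}
)
```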

easylink.configuration._load_computing_environment(computing_environment_specification_path)

Loads the computing environment specification file and returns the contents as a dict.

Return type: dict[Any, Any]
Parameters: computing_environment_specification_path (str | Path | None)
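Because the path parameter may be None, a loader of this kind needs a defined result for the "no file given" case. The sketch below assumes an empty dictionary is returned in that case and that a missing file raises an error. Both behaviors are assumptions, and JSON is used as a stand-in file format purely to keep the example within the standard library.

```python
import json
from pathlib import Path
from typing import Any, Optional, Union


def load_computing_environment(path: Optional[Union[str, Path]]) -> dict[Any, Any]:
    """Return the parsed specification, or an empty dict if no path is given."""
    if path is None:
        # Assumption: with no file, callers fall back to default settings.
        return {}
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"Computing environment file not found: {path}")
    # JSON is a stand-in here; the real specification format may differ.
    with path.open() as f:
        return json.load(f)
```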