Configuration
This module is responsible for managing an easylink run’s configuration as defined by various user-input specification files.
- easylink.configuration.DEFAULT_ENVIRONMENT = {'environment': {'computing_environment': 'local', 'container_engine': 'undefined', 'implementation_resources': {'cpus': 1, 'memory': 1, 'time_limit': 1}}}
The default environment configuration settings.
- easylink.configuration.SPARK_DEFAULTS = {'keep_alive': False, 'workers': {'cpus_per_node': 1, 'mem_per_node': 1, 'num_workers': 2, 'time_limit': 1}}
The default spark configuration settings.
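As an illustration of how defaults like these are typically combined with user-supplied settings, the following is a minimal sketch of a recursive merge. The helper `merge_over_defaults`, the `user_environment` dictionary, and the `"docker"` value are hypothetical and not part of EasyLink; the library's own merging is handled by `Config` and may differ.

```python
import copy
from typing import Any

# Copied from the documented DEFAULT_ENVIRONMENT value.
DEFAULT_ENVIRONMENT = {
    "environment": {
        "computing_environment": "local",
        "container_engine": "undefined",
        "implementation_resources": {"cpus": 1, "memory": 1, "time_limit": 1},
    }
}


def merge_over_defaults(defaults: dict[str, Any], overrides: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: return defaults with overrides applied recursively."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are nested mappings, so merge key by key.
            merged[key] = merge_over_defaults(merged[key], value)
        else:
            # Scalars (or mismatched types) are replaced outright.
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical user settings; only the keys given here override the defaults.
user_environment = {
    "environment": {
        "container_engine": "docker",
        "implementation_resources": {"memory": 4},
    }
}

effective = merge_over_defaults(DEFAULT_ENVIRONMENT, user_environment)
```

In this sketch, keys the user omits (such as `cpus`) keep their default values, and the original `DEFAULT_ENVIRONMENT` dictionary is left unmodified.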
- class easylink.configuration.Config(config_params, schema_name='main', images_dir=None, command='run')[source]
Bases: LayeredConfigTree
A container for configuration information.
The Config contains the user-provided pipeline, input data, and computing environment specifications. It is a nested dictionary-like object that supports prioritized layers of configuration settings as well as dot-notation access to its attributes.
The Config is also responsible for various validation checks on the provided specifications. If any of these are invalid, a validation error is raised with as much information as can possibly be provided.
- Parameters:
config_params (dict[str, Any]) – A dictionary of all specifications required to run the pipeline. This includes the pipeline, input data, and computing environment specifications, as well as the results directory and images directory.
schema_name (str) – The name of the schema to validate the pipeline configuration against.
images_dir (str | Path | None) – The directory containing the images, or the directory to download them to if they don’t exist. If None, defaults to DEFAULT_IMAGES_DIR.
command (str) – The EasyLink command being run.
- environment
The environment configuration, including computing environment, container engine, implementation resources, and slurm- and spark-specific requests.
- pipeline
The pipeline configuration.
- input_data
The input data filepaths.
- schema
The PipelineSchema.
- images_dir
The directory containing the images, or the directory to download them to if they don’t exist. If None, defaults to ~/.easylink_images.
- command
The EasyLink command being run.
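The idea of prioritized layers combined with dot-notation access can be sketched with a small stand-in class. This is not the `LayeredConfigTree` implementation or its API; `LayeredSketch`, its constructor, and the `"slurm"` value are assumptions made purely for illustration.

```python
from typing import Any


class LayeredSketch:
    """Hypothetical stand-in: resolves each key from the highest-priority layer defining it."""

    def __init__(self, layers: list[dict[str, Any]]):
        # Layers are ordered from lowest to highest priority.
        self._layers = layers

    def __getattr__(self, name: str) -> Any:
        # Only called when normal attribute lookup fails, i.e. for config keys.
        found = [layer[name] for layer in self._layers if name in layer]
        if not found:
            raise AttributeError(name)
        top = found[-1]
        if isinstance(top, dict):
            # Descend into nested mappings, keeping the priority order.
            return LayeredSketch([value for value in found if isinstance(value, dict)])
        return top


# Lowest priority: defaults. Highest priority: (hypothetical) user input.
defaults = {"environment": {"computing_environment": "local", "container_engine": "undefined"}}
user = {"environment": {"computing_environment": "slurm"}}

config = LayeredSketch([defaults, user])
computing_environment = config.environment.computing_environment  # taken from the user layer
container_engine = config.environment.container_engine  # falls back to the default layer
```

The design point this illustrates is that a user only needs to specify the settings that differ from the defaults; every other key resolves to the lower-priority layer.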
- _get_schema(schema_name='main')[source]
Gets the requested PipelineSchema.
The schema is only returned if it validates the pipeline configuration.
- Return type:
- Parameters:
schema_name (str) – The name of the specific PipelineSchema to validate the pipeline configuration against.
- Returns:
The requested PipelineSchema if it validates the requested pipeline configuration.
- Raises:
SystemExit – If the pipeline configuration is not valid for the requested schema, the program exits with a non-zero code and all validation errors found are logged.
Notes
This acts as the pipeline configuration file’s validation method since we can only validate the PipelineSchema if that file is valid.
- _validate()[source]
Validates the Config.
- Return type:
- Raises:
SystemExit – If any errors are found, they are batch-logged into a dictionary and the program exits with a non-zero code.
Notes
Pipeline validations are handled in _get_schema().
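The batch-then-exit pattern described for validation can be sketched as follows. The function names and the specific rule checked (positive integer resource requests) are hypothetical; the actual checks performed by `_validate()` are not documented here.

```python
import sys
from typing import Any


def collect_environment_errors(environment: dict[str, Any]) -> dict[str, list[str]]:
    """Hypothetical check: gather every problem instead of stopping at the first one."""
    errors: dict[str, list[str]] = {}
    resources = environment.get("implementation_resources", {})
    for key in ("cpus", "memory", "time_limit"):
        value = resources.get(key)
        if not isinstance(value, int) or value < 1:
            errors.setdefault("implementation_resources", []).append(
                f"'{key}' must be a positive integer, got {value!r}"
            )
    return errors


def exit_on_errors(errors: dict[str, list[str]]) -> None:
    """Log all collected errors together, then exit with a non-zero code."""
    if not errors:
        return
    for section, messages in errors.items():
        for message in messages:
            print(f"[{section}] {message}", file=sys.stderr)
    sys.exit(1)  # raises SystemExit with code 1


bad_environment = {"implementation_resources": {"cpus": 0, "memory": 1, "time_limit": "soon"}}
found_errors = collect_environment_errors(bad_environment)
```

Collecting errors into a dictionary before exiting lets a user fix every reported problem in one pass rather than rerunning after each individual failure.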
- easylink.configuration.load_params_from_specification(pipeline_specification, input_data, computing_environment, results_dir)[source]
Gathers together all specification data.
This gathers the pipeline, input data, and computing environment specifications, as well as the results directory, into a single dictionary for insertion into the Config object.
- Return type:
- Parameters:
pipeline_specification (str | Path) – The path to the pipeline specification file.
input_data (str | Path) – The path to the input data specification file.
computing_environment (str | Path | None) – The path to the computing environment specification file.
results_dir (str | Path) – The path to the results directory.
- Returns:
A dictionary of all provided specification data.
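A rough sketch of this gathering step is shown below. It is not the EasyLink implementation: the function `gather_params_sketch`, the dictionary keys, the injected `load` callable, and the use of JSON files are all assumptions chosen to keep the example self-contained. Treating a missing computing environment file as an empty dictionary is likewise an assumption.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Callable, Optional, Union

PathLike = Union[str, Path]


def gather_params_sketch(
    pipeline_specification: PathLike,
    input_data: PathLike,
    computing_environment: Optional[PathLike],
    results_dir: PathLike,
    load: Callable[[Path], dict[str, Any]],
) -> dict[str, Any]:
    """Hypothetical helper: read each specification file into one dictionary."""
    return {
        "pipeline": load(Path(pipeline_specification)),
        "input_data": load(Path(input_data)),
        # Assumption: no environment file means no user overrides.
        "environment": load(Path(computing_environment)) if computing_environment is not None else {},
        "results_dir": Path(results_dir),
    }


def load_json(path: Path) -> dict[str, Any]:
    # Stand-in loader; the real specification file format is not stated here.
    return json.loads(path.read_text())


with tempfile.TemporaryDirectory() as tmp:
    tmp_path = Path(tmp)
    pipeline_file = tmp_path / "pipeline.json"
    pipeline_file.write_text(json.dumps({"steps": {}}))
    input_file = tmp_path / "input_data.json"
    input_file.write_text(json.dumps({"file1": "/path/to/file1.parquet"}))

    params = gather_params_sketch(pipeline_file, input_file, None, tmp_path / "results", load_json)
```

The resulting dictionary plays the role of the `config_params` argument accepted by `Config`, although the real key names may differ from those used in this sketch.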