Data Utilities

This module contains utility functions for handling data files and directories.

easylink.utilities.data_utils.modify_umask(func)

Decorates a function to modify a process’s umask temporarily before calling the function.

This decorator sets the umask to 0o002, which masks only the write bit for others and leaves owner and group permissions unmasked. As a result, files and directories created by the decorated function are group-writable. After the function executes, the decorator restores the original umask.

Return type:

Callable

Parameters:

func (Callable) – The function to be decorated. It can be any callable that might create files or directories during its execution.

Returns:

A wrapper function that, when called, modifies the umask, calls the original function with the provided arguments, and finally restores the umask to its original value.
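A decorator with this behavior can be sketched with the standard library's `os.umask`, which sets a new mask and returns the previous one. This is a minimal sketch assuming a POSIX system; the names `group_writable_umask` and `make_dir` are hypothetical, and this is not necessarily how easylink implements `modify_umask`. The `try`/`finally` restores the original umask even if the wrapped function raises.

```python
import functools
import os
from typing import Any, Callable


def group_writable_umask(func: Callable) -> Callable:
    """Hypothetical sketch: run ``func`` with the umask temporarily set to 0o002."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        # os.umask installs the new mask and returns the previous one.
        old_umask = os.umask(0o002)
        try:
            return func(*args, **kwargs)
        finally:
            # Restore the caller's umask even if func raised an exception.
            os.umask(old_umask)

    return wrapper


@group_writable_umask
def make_dir(path: str) -> None:
    # os.mkdir requests mode 0o777; a 0o002 umask removes only "other" write,
    # so on Linux the new directory ends up 0o775 (rwxrwxr-x).
    os.mkdir(path)
```

`functools.wraps` keeps the decorated function's `__name__` and docstring, so `make_dir` still reports its own name after decoration.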

easylink.utilities.data_utils.create_results_directory(*args, **kwargs)

easylink.utilities.data_utils.create_results_intermediates(*args, **kwargs)

easylink.utilities.data_utils.copy_configuration_files_to_results_directory(pipeline_specification, input_data, computing_environment, results_dir)

Copies all configuration files into the results directory.

Return type:

None

Parameters:
  • pipeline_specification (Path) – The filepath to the pipeline specification file.

  • input_data (Path) – The filepath to the input data specification file (_not_ the paths to the input data themselves).

  • computing_environment (Path | None) – The filepath to the specification file defining the computing environment to run the pipeline on.

  • results_dir (Path) – The directory to write results and incidental files (logs, etc.) to.
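A minimal sketch of the copying step, assuming each provided specification file is copied into `results_dir` under its original filename and that a `None` computing environment is simply skipped. The function name `copy_config_files` is hypothetical, and easylink's own implementation may name or handle the copies differently.

```python
import shutil
from pathlib import Path
from typing import Optional


def copy_config_files(
    pipeline_specification: Path,
    input_data: Path,
    computing_environment: Optional[Path],
    results_dir: Path,
) -> None:
    """Hypothetical sketch: copy each provided specification file into results_dir."""
    for spec in (pipeline_specification, input_data, computing_environment):
        # The computing environment file is optional and may be None.
        if spec is not None:
            # When the destination is a directory, shutil.copy keeps the basename.
            shutil.copy(spec, results_dir)
```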

easylink.utilities.data_utils.get_results_directory(output_dir, no_timestamp)

Determines the results directory path.

This function determines the filepath for storing results by (optionally) appending a timestamp to the specified output directory. If no output directory is provided, it defaults to a directory named ‘results’ in the current working directory.

Return type:

Path

Parameters:
  • output_dir (str | None) – The directory to write results and incidental files (logs, etc.) to. If no value is provided, results will be written to a ‘results/’ directory in the current working directory.

  • no_timestamp (bool) – If True, results are written directly to the output directory; if False, they are written to a timestamped sub-directory of it.

Returns:

The fully resolved path to the results directory.
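The documented behavior can be sketched as follows. This is a minimal sketch: the name `resolve_results_dir` is hypothetical, and the timestamp format used here is an assumption, since the format produced by easylink's `_get_timestamp` is not documented.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def resolve_results_dir(output_dir: Optional[str], no_timestamp: bool) -> Path:
    """Hypothetical sketch of the behavior described for get_results_directory."""
    # Default to a "results" directory relative to the current working directory.
    results_dir = Path("results" if output_dir is None else output_dir)
    if not no_timestamp:
        # Assumed timestamp format; easylink's _get_timestamp may differ.
        results_dir = results_dir / datetime.now().strftime("%Y_%m_%d_%H_%M_%S")
    # Path.resolve makes the path absolute (and resolves symlinks).
    return results_dir.resolve()
```

With `no_timestamp=True` and no output directory, this returns the absolute path of `results/` in the current working directory; with `no_timestamp=False`, every run gets its own sub-directory, so earlier results are not overwritten.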

easylink.utilities.data_utils._get_timestamp()

Return type:

str

easylink.utilities.data_utils.load_yaml(filepath)

Loads and returns the contents of a YAML file.

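This function uses yaml.safe_load to parse the YAML file. Unlike yaml.load with an unsafe loader, safe_load only constructs standard YAML types (mappings, sequences, strings, numbers, and so on) and will not instantiate arbitrary Python objects or execute code embedded in the file.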

Return type:

dict

Parameters:

filepath (str | Path) – The path to the YAML file to be loaded.

Returns:

The contents of the YAML file.

easylink.utilities.data_utils.download_image(*args, **kwargs)

easylink.utilities.data_utils.calculate_md5_checksum(output_path)

Return type:

str

Parameters:

output_path (Path)
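The entry above gives only the signature. Assuming, from its name and `str` return type, that this function returns the hexadecimal MD5 digest of the file at `output_path`, a chunked implementation might look like the sketch below. The name `md5_of_file` and the `chunk_size` parameter are hypothetical; reading in chunks keeps memory use bounded for large files such as container images.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 8192) -> str:
    """Hypothetical sketch: hex MD5 digest of a file's contents, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"" (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```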