linchemin.interfaces.workflows.process_routes

linchemin.interfaces.workflows.process_routes(input_dict: dict, output_format: str = 'json', mapping: bool = False, functionalities: List[str] | None = None, mapper: str | None = 'rxnmapper', out_data_model: str = 'bipartite', descriptors: List[str] | None = None, ged_method: str = 'nx_ged', ged_params: dict | None = None, clustering_method: str | None = None, parallelization: bool = False, n_cpu: int = 8) → WorkflowOutput

Processes routes predicted by CASP tools: based on the input arguments, only the selected functionalities are performed. The mandatory first and last actions are (i) reading a JSON file containing the routes predicted by a CASP tool and (ii) writing the routes to an output file. Possible additional actions are:

  • performing the atom mapping of the reactions involved in the routes

  • computing route descriptors

  • computing the distance matrix between the routes

  • clustering the routes

  • merging the routes

  • extracting the reaction strings from the routes
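The "mandatory read and write, optional steps in between" structure described above can be pictured with a small sketch. This is not linchemin's implementation: `STEPS`, `run_workflow`, and the placeholder step function are hypothetical, and the only functionality names used are the three that appear in the example at the end of this page. An unknown name raises KeyError, in line with the Raises section.

```python
from typing import Any, Callable, Dict, List, Optional

State = Dict[str, Any]


def _make_step(name: str) -> Callable[[State], None]:
    """Build a placeholder step that only records that it ran."""
    def step(state: State) -> None:
        state["done"].append(name)
    return step


# Hypothetical registry of optional steps (names taken from the page's example)
STEPS: Dict[str, Callable[[State], None]] = {
    name: _make_step(name)
    for name in ("compute_descriptors", "clustering_and_d_matrix", "merging")
}


def run_workflow(routes: List[str],
                 functionalities: Optional[List[str]] = None) -> State:
    """Run the mandatory read/write steps around the selected optional ones."""
    state: State = {"routes": list(routes), "done": ["read"]}  # mandatory start
    for name in functionalities or []:
        STEPS[name](state)  # an unknown name raises KeyError
    state["done"].append("write")  # mandatory stop
    return state


result = run_workflow(["route_1", "route_2"], ["compute_descriptors", "merging"])
print(result["done"])
```

With `functionalities=None` the sketch only reads and writes, which matches the documented default behaviour.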

Parameters:

input_dict: dict

The paths to the input files and the corresponding CASP names, in the form {‘file_path’: ‘casp_name’}
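As a minimal sketch, such a dictionary could be assembled from (path, CASP name) pairs as shown below. The `build_input_dict` helper is hypothetical and not part of linchemin; the file names and the CASP names 'ibmrxn' and 'az' are the ones used in the example at the end of this page.

```python
from pathlib import Path
from typing import Dict, Iterable, Tuple


def build_input_dict(entries: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map each input file path to the name of the CASP tool that produced it.

    Hypothetical helper for illustration only.
    """
    input_dict: Dict[str, str] = {}
    for file_path, casp_name in entries:
        # Keys are plain strings, matching the {'file_path': 'casp_name'} form
        input_dict[str(Path(file_path))] = casp_name
    return input_dict


input_dict = build_input_dict([
    ("ibmrxn_file.json", "ibmrxn"),
    ("az_file.json", "az"),
])
print(input_dict)
```

The resulting dictionary is what would be passed as the first argument of process_routes.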

output_format: Optional[str]

The type of file to which the routes should be written (default ‘json’)

mapping: Optional[bool]

Whether the reactions involved in the routes should go through the atom-to-atom mapping (default False)

functionalities: Optional[Union[List[str], None]]

The list of the functionalities to be performed; if it is None, the input routes are read and written to a file (default None)

mapper: Optional[str]

The name of the mapping tool to be used; if it is None, the mapping pipeline is used (default ‘rxnmapper’)

out_data_model: Optional[str]

The data model for the output routes (default ‘bipartite’)

descriptors: Optional[Union[List[str], None]]

The list of the descriptors to be computed; if it is None, all available descriptors are calculated (default None)

ged_method: Optional[str]

The method to be used for graph similarity calculations (default ‘nx_ged’)

ged_params: Optional[Union[dict, None]]

The dictionary with the parameters for specifying reaction and molecular fingerprints and similarity functions; if it is None, the default values are used (default None)

clustering_method: Optional[Union[str, None]]

The clustering algorithm to be used for clustering the routes; if it is None, hdbscan is used when there are more than 15 routes, Agglomerative Clustering otherwise (default None)
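The default rule stated above can be written out as a short sketch. The `choose_clustering_method` helper is hypothetical, and the returned strings are illustrative labels rather than the exact values process_routes accepts; only the threshold of 15 routes comes from the description.

```python
from typing import Optional

ROUTE_THRESHOLD = 15  # threshold stated in the parameter description


def choose_clustering_method(n_routes: int,
                             clustering_method: Optional[str] = None) -> str:
    """Return the requested method, or apply the documented default rule.

    Hypothetical helper; the labels below are assumptions for illustration.
    """
    if clustering_method is not None:
        return clustering_method
    # More than 15 routes: hdbscan; otherwise agglomerative clustering
    return "hdbscan" if n_routes > ROUTE_THRESHOLD else "agglomerative"


for n in (15, 16):
    print(n, choose_clustering_method(n))
```

Note that exactly 15 routes falls on the agglomerative side, since the description says "more than 15".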

parallelization: Optional[bool]

Whether parallel computing should be used where possible (default False)

n_cpu: Optional[int]

The number of CPUs to use when parallelization is enabled (default 8)

Returns:

output: WorkflowOutput

Its attributes store the results of the selected functionalities. The outcomes are also written to files.

Raises:

NoValidRoute: if the input files do not contain any valid route

KeyError: if a selected option is not available

Example:

>>> from linchemin.interfaces.workflows import process_routes
>>> output = process_routes({'ibmrxn_file.json': 'ibmrxn',  # path to json file from ibmrxn
...                         'az_file.json': 'az'},         # path to json file from az casp
...                         functionalities=[              # the functionalities to be activated
...                            'compute_descriptors',      # calculation of route descriptors
...                            'clustering_and_d_matrix',  # calculation of distance matrix and clustering
...                            'merging'])                 # merging of the routes to obtain a "tree"