linchemin.interfaces.workflows.process_routes
- linchemin.interfaces.workflows.process_routes(input_dict: dict, output_format: str = 'json', mapping: bool = False, functionalities: List[str] | None = None, mapper: str | None = 'rxnmapper', out_data_model: str = 'bipartite', descriptors: List[str] | None = None, ged_method: str = 'nx_ged', ged_params: dict | None = None, clustering_method: str | None = None, parallelization: bool = False, n_cpu: int = 8) -> WorkflowOutput
Function to process routes predicted by CASP tools: based on the input arguments, only the selected functionalities are performed. The mandatory first and last actions are (i) reading a JSON file containing the routes predicted by a CASP tool, and (ii) writing the routes to an output file. Possible additional actions are:
- performing the atom mapping of the reactions involved in the routes
- computing route descriptors
- computing the distance matrix between the routes
- clustering the routes
- merging the routes
- extracting the reaction strings from the routes
Parameters:
- input_dict: dict
The paths to the input files and the corresponding CASP names, in the form {'file_path': 'casp_name'}
- output_format: Optional[str]
The type of file to which the routes should be written (default ‘json’)
- mapping: Optional[bool]
Whether the reactions involved in the routes should go through the atom-to-atom mapping (default False)
- functionalities: Optional[Union[List[str], None]]
The list of the functionalities to be performed; if it is None, the input routes are read and written to a file (default None)
- mapper: Optional[str]
The name of the mapping tool to be used; if it is None, the mapping pipeline is used (default ‘rxnmapper’)
- out_data_model: Optional[str]
The data model for the output routes (default ‘bipartite’)
- descriptors: Optional[Union[List[str], None]]
The list of the descriptors to be computed; if it is None, all the available descriptors are calculated (default None)
- ged_method: Optional[str]
The method to be used for graph similarity calculations (default ‘nx_ged’)
- ged_params: Optional[Union[dict, None]]
The dictionary with the parameters for specifying reaction and molecular fingerprints and similarity functions; if it is None, the default values are used (default None)
- clustering_method: Optional[Union[str, None]]
The clustering algorithm to be used for clustering the routes; if it is None, hdbscan is used when there are more than 15 routes, Agglomerative Clustering otherwise (default None)
- parallelization: Optional[bool]
Whether parallel computing should be used where possible (default False)
- n_cpu: Optional[int]
The number of CPUs to be used if parallelization is enabled (default 8)
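The default behaviour described for clustering_method can be read as a simple threshold rule on the number of routes. The sketch below only illustrates that rule: pick_clustering_method and the labels it returns are hypothetical names chosen for illustration, not part of the LinChemIn API, and the actual option strings accepted by process_routes may differ.

```python
from typing import Optional

# Threshold taken from the clustering_method description:
# more than 15 routes -> hdbscan, otherwise Agglomerative Clustering.
ROUTE_THRESHOLD = 15


def pick_clustering_method(n_routes: int, clustering_method: Optional[str] = None) -> str:
    """Return the clustering label that applies under the documented default rule.

    Hypothetical helper: the returned labels are illustrative strings,
    not necessarily the option names accepted by process_routes.
    """
    if n_routes < 0:
        raise ValueError("n_routes must be non-negative")
    if clustering_method is not None:
        # An explicit choice always takes precedence over the default rule.
        return clustering_method
    return "hdbscan" if n_routes > ROUTE_THRESHOLD else "agglomerative"


if __name__ == "__main__":
    for n in (5, 15, 16, 40):
        print(n, pick_clustering_method(n))
```

Note that exactly 15 routes falls on the Agglomerative Clustering side, since the description says "more than 15".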
Returns:
- output: WorkflowOutput
Its attributes store the results of the selected functionalities. The outcomes are also written to files.
Raises:
NoValidRoute: if the input file(s) do not contain any valid route
KeyError: if a selected option is not available
Example:
>>> from linchemin.interfaces.workflows import process_routes
>>> output = process_routes(
...     {'ibmrxn_file.json': 'ibmrxn',   # path to json file from ibmrxn
...      'az_file.json': 'az'},          # path to json file from az casp
...     functionalities=[                # the functionalities to be activated
...         'compute_descriptors',       # calculation of routes descriptors
...         'clustering_and_d_matrix',   # calculation of distance matrix and clustering
...         'merging'])                  # merging of the routes to obtain a "tree"
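Because input_dict maps file paths to CASP names, it can be convenient to check the paths before calling process_routes. The following is a minimal sketch, assuming the inputs are JSON files on the local filesystem; build_input_dict is a hypothetical helper written for illustration and is not a LinChemIn function.

```python
from pathlib import Path
from typing import Dict, Iterable, Tuple


def build_input_dict(pairs: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Build a {'file_path': 'casp_name'} dictionary, failing early on bad paths.

    Hypothetical helper for illustration: it performs only local file checks
    and does not validate the CASP names themselves.
    """
    input_dict: Dict[str, str] = {}
    for file_path, casp_name in pairs:
        path = Path(file_path)
        if path.suffix.lower() != ".json":
            raise ValueError(f"expected a .json file, got: {file_path}")
        if not path.is_file():
            raise FileNotFoundError(f"input file not found: {file_path}")
        input_dict[str(path)] = casp_name
    return input_dict
```

The returned dictionary could then be passed as the first argument of process_routes, so that a missing or misnamed file is reported before any route processing starts.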