linchemin.rem.graph_distance.compute_distance_matrix

linchemin.rem.graph_distance.compute_distance_matrix(syngraphs: List[MonopartiteReacSynGraph | BipartiteSynGraph], ged_method: str, ged_params: dict | None = None, parallelization: bool = False, n_cpu=8) -> DataFrame

Computes the distance matrix for a set of routes.

Parameters:

syngraphs: List[Union[MonopartiteReacSynGraph, BipartiteSynGraph]]

The routes for which the distance matrix must be computed

ged_method: str

The graph edit distance method to be used

ged_params: Optional[dict]

A dictionary of parameters for the fingerprint and similarity calculations; if it is not provided, the default values are used (default None)

parallelization: Optional[bool]

Whether parallelization should be used (default False)

n_cpu: Optional[int]

The number of CPUs to use when parallelization is enabled (default 8)

Returns:

matrix: a pandas DataFrame

The distance matrix, of shape (n routes x n routes), containing the pairwise graph distances between the routes

Example:

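>>> import json
>>> from linchemin.cgu.translate import translator
>>> from linchemin.rem.graph_distance import compute_distance_matrix
>>> graph = json.loads(open('az_file.json').read())
>>> mp_syngraphs = [translator('az_retro', g, 'syngraph', out_data_model='monopartite_reactions') for g in graph]
>>> m = compute_distance_matrix(mp_syngraphs, ged_method='nx_ged')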
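To picture the (n routes x n routes) layout of the returned matrix, here is a minimal, library-independent sketch. It is not the LinChemIn implementation: `pairwise_distance_matrix`, `toy_distance`, and the `route_*` labels are hypothetical names introduced for illustration, routes are stood in for by plain sets of edges, and the sketch assumes the distance is symmetric and that a route is at distance zero from itself.

```python
from itertools import combinations
from typing import Callable, Dict, Sequence


def pairwise_distance_matrix(
    items: Sequence, labels: Sequence[str], distance: Callable
) -> Dict[str, Dict[str, float]]:
    """Build a symmetric n x n table of pairwise distances (hypothetical helper)."""
    # Start from an all-zero table, so the diagonal is zero by assumption.
    matrix = {row: {col: 0.0 for col in labels} for row in labels}
    # Evaluate each unordered pair once and mirror the value across the diagonal.
    for (i, a), (j, b) in combinations(enumerate(items), 2):
        d = float(distance(a, b))
        matrix[labels[i]][labels[j]] = d
        matrix[labels[j]][labels[i]] = d
    return matrix


def toy_distance(a: set, b: set) -> int:
    """Toy stand-in for a graph distance: number of edges not shared by both routes."""
    return len(a ^ b)


# Three toy "routes", each represented as a set of (parent, child) edges.
routes = [
    {("A", "B"), ("B", "C")},
    {("A", "B"), ("B", "D")},
    {("A", "B"), ("B", "C")},
]
labels = ["route_0", "route_1", "route_2"]

toy_matrix = pairwise_distance_matrix(routes, labels, toy_distance)
for label in labels:
    print(label, [toy_matrix[label][other] for other in labels])
```

Because only the upper triangle is evaluated, a set of n routes needs n(n-1)/2 distance calls. Passing a nested dictionary like this to `pandas.DataFrame` gives a table with one row and one column per route.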