linchemin.rem.clustering.clusterer

linchemin.rem.clustering.clusterer(syngraphs: List[MonopartiteReacSynGraph | BipartiteSynGraph], ged_method: str, clustering_method: str, ged_params: dict | None = None, save_dist_matrix: bool = False, parallelization: bool = False, n_cpu: int | None = None, **kwargs) → tuple

Clusters a list of SynGraph objects based on their pairwise graph edit distance (GED).

Parameters:

syngraphs: List[Union[MonopartiteReacSynGraph, BipartiteSynGraph]]

The routes to be clustered

ged_method: str

The algorithm to be used for GED calculations

clustering_method: str

The clustering algorithm to be used

save_dist_matrix: Optional[bool]

Whether the distance matrix should be saved and returned as output (default False)

ged_params: Union[dict, None]

Optional parameters for the GED calculations; if not provided, the default parameters are used (default None)

parallelization: Optional[bool]

Whether parallelization should be used for computing the distance matrix (default False)

n_cpu: Union[int, None]

The number of CPUs to be used when parallelization is activated (default ‘mp.cpu_count()’)

**kwargs:

Optional parameters specific to the selected clustering algorithm

Returns:

clustering, score, (dist_matrix): tuple

The clustering algorithm output, the silhouette score and, only if save_dist_matrix=True, the distance matrix
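
For readers unfamiliar with the silhouette score mentioned above, the following is a minimal, library-independent sketch of how such a score can be derived from a precomputed distance matrix and a list of cluster labels. It is not LinChemIn's implementation: the function name `silhouette_from_distances` and the toy matrix are illustrative assumptions, and the distances stand in for GED values.

```python
from statistics import mean


def silhouette_from_distances(dist, labels):
    """Mean silhouette coefficient for a square, symmetric distance matrix.

    dist   -- list of lists, dist[i][j] is the distance between items i and j
    labels -- cluster label of each item, in the same order as the rows of dist
    """
    # Group item indices by cluster label.
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette score needs at least two clusters")

    scores = []
    for i, label in enumerate(labels):
        same = [j for j in clusters[label] if j != i]
        if not same:
            # Convention: an item that is alone in its cluster scores 0.
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the item's own cluster
        a = mean(dist[i][j] for j in same)
        # b: mean distance to the members of the nearest other cluster
        b = min(
            mean(dist[i][j] for j in members)
            for other, members in clusters.items()
            if other != label
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return mean(scores)


# Toy distance matrix: items 0-1 and items 2-3 form two tight groups.
toy_dist = [
    [0, 1, 4, 4],
    [1, 0, 4, 4],
    [4, 4, 0, 1],
    [4, 4, 1, 0],
]
toy_score = silhouette_from_distances(toy_dist, [0, 0, 1, 1])
```

Scores close to 1 indicate compact, well-separated clusters, while values near 0 or below suggest overlapping or poorly assigned clusters.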

Raises:

SingleRouteClustering: if the input list contains fewer than 2 routes

UnavailableClusteringAlgorithm: if the selected clustering algorithm is not available

Example:

>>> import json
>>> from linchemin.cgu.translate import translator
>>> from linchemin.rem.clustering import clusterer
>>> with open('az_file.json') as f:
...     graph = json.load(f)
>>> syngraphs = [translator('az_retro', g, 'syngraph', out_data_model='monopartite_reactions') for g in graph]
>>> cluster1, score1 = clusterer(syngraphs,
...                              ged_method='nx_optimized_ged',
...                              clustering_method='agglomerative_cluster')
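
As a further sketch, the optional arguments described in the parameter list can be combined to request the distance matrix and to compute it in parallel. The wrapper name `cluster_routes_with_matrix` and the choice of `n_cpu=2` are assumptions made for illustration; the three-element return value follows the Returns description for save_dist_matrix=True.

```python
def cluster_routes_with_matrix(syngraphs, n_cpu=2):
    """Cluster routes and also return the distance matrix.

    Hypothetical helper: only the call to `clusterer` comes from LinChemIn.
    """
    # Imported lazily so that this sketch can be loaded without LinChemIn.
    from linchemin.rem.clustering import clusterer

    clustering, score, dist_matrix = clusterer(
        syngraphs,
        ged_method="nx_optimized_ged",
        clustering_method="agglomerative_cluster",
        save_dist_matrix=True,  # return the distance matrix as a third output
        parallelization=True,   # compute the distance matrix in parallel
        n_cpu=n_cpu,            # number of CPUs used when parallelizing
    )
    return clustering, score, dist_matrix
```

With save_dist_matrix left at its default of False, the same call returns only the clustering output and the score, as in the example above.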