linchemin.rem.clustering.clusterer
- linchemin.rem.clustering.clusterer(syngraphs: List[MonopartiteReacSynGraph | BipartiteSynGraph], ged_method: str, clustering_method: str, ged_params: dict | None = None, save_dist_matrix: bool = False, parallelization: bool = False, n_cpu: int | None = None, **kwargs) -> tuple
Clusters a list of SynGraph objects based on their pairwise graph edit distance (GED).
Parameters:
- syngraphs: List[Union[MonopartiteReacSynGraph, BipartiteSynGraph]]
The routes to be clustered
- ged_method: str
The algorithm to be used for GED calculations
- clustering_method: str
The clustering algorithm to be used
- save_dist_matrix: Optional[bool]
Whether the distance matrix should be saved and returned as output (default False)
- ged_params: Union[dict, None]
Optional parameters for the GED calculations; if not provided, the default parameters are used (default None)
- parallelization: Optional[bool]
Whether parallelization should be used for computing the distance matrix (default False)
- n_cpu: Union[int, None]
If parallelization is activated, the number of CPUs to be used; if None, mp.cpu_count() is used (default None)
- **kwargs:
Optional parameters specific to the selected clustering algorithm
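The role of the distance matrix mentioned in the parameters above can be illustrated with a minimal, library-independent sketch. It assumes nothing about how LinChemIn computes GED: the helper names `pairwise_distance_matrix` and `jaccard_distance` are hypothetical, the toy routes are plain sets of made-up reaction labels, and the Jaccard distance is only a placeholder for a real graph edit distance.

```python
from itertools import combinations
from typing import Callable, List, Sequence


def pairwise_distance_matrix(
    items: Sequence[frozenset],
    distance: Callable[[frozenset, frozenset], float],
) -> List[List[float]]:
    """Build a symmetric distance matrix with a zero diagonal."""
    n = len(items)
    matrix = [[0.0] * n for _ in range(n)]
    # Each unordered pair is evaluated once and mirrored across the diagonal.
    for i, j in combinations(range(n), 2):
        d = distance(items[i], items[j])
        matrix[i][j] = d
        matrix[j][i] = d
    return matrix


def jaccard_distance(a: frozenset, b: frozenset) -> float:
    """Placeholder distance (NOT a graph edit distance)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


# Toy "routes", each represented as a set of hypothetical reaction labels.
toy_routes = [
    frozenset({"r1", "r2"}),
    frozenset({"r1", "r3"}),
    frozenset({"r4"}),
]
dist_matrix = pairwise_distance_matrix(toy_routes, jaccard_distance)
```

For n routes the loop performs n(n-1)/2 independent distance evaluations. Because the pairs do not depend on each other, this kind of computation can be split across several processes, which is presumably what the parallelization and n_cpu options control.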
Returns:
- clustering, score, (dist_matrix): tuple
The clustering algorithm output, the silhouette score and, only if save_dist_matrix=True, the distance matrix
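To make the silhouette score concrete, here is a minimal pure-Python sketch that computes the mean silhouette coefficient from a precomputed distance matrix and a list of cluster labels. It follows the standard definition of the coefficient and is not taken from LinChemIn's source; the function name `silhouette_from_distances` and the toy data are assumptions made for illustration.

```python
from typing import Dict, List, Sequence


def silhouette_from_distances(
    dist: Sequence[Sequence[float]], labels: Sequence[int]
) -> float:
    """Mean silhouette coefficient from a precomputed distance matrix."""
    n = len(labels)
    clusters: Dict[int, List[int]] = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    # The score is only defined for 2 <= number of clusters <= n - 1.
    if len(clusters) < 2 or len(clusters) >= n:
        raise ValueError("need between 2 and n - 1 clusters")

    scores: List[float] = []
    for i in range(n):
        own = clusters[labels[i]]
        if len(own) == 1:
            # By convention, a singleton cluster contributes 0.
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the same cluster.
        a = sum(dist[i][j] for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(dist[i][j] for j in members) / len(members)
            for label, members in clusters.items()
            if label != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / n


# Toy data: items 0-1 are close, items 2-3 are close, the two groups are far apart.
toy_dist = [
    [0.0, 0.1, 0.9, 0.9],
    [0.1, 0.0, 0.9, 0.9],
    [0.9, 0.9, 0.0, 0.2],
    [0.9, 0.9, 0.2, 0.0],
]
toy_score = silhouette_from_distances(toy_dist, [0, 0, 1, 1])
```

The score lies between -1 and 1; values close to 1 indicate compact, well-separated clusters. Since the coefficient compares distances within a cluster to distances across clusters, it is undefined when all items fall into a single cluster.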
Raises:
SingleRouteClustering: if the input list contains fewer than 2 routes
UnavailableClusteringAlgorithm: if the selected clustering algorithm is not available
Example:
>>> import json
>>> from linchemin.cgu.translate import translator
>>> from linchemin.rem.clustering import clusterer
>>> graph = json.loads(open('az_file.json').read())
>>> syngraphs = [translator('az_retro', g, 'syngraph', out_data_model='monopartite_reactions') for g in graph]
>>> cluster1, score1 = clusterer(syngraphs,
...                              ged_method='nx_optimized_ged',
...                              clustering_method='agglomerative_cluster')
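The example above unpacks two values, which matches the default save_dist_matrix=False. Because the returned tuple gains a third element when save_dist_matrix=True, calling code that supports both modes may want to normalize the result. The helper below is a hypothetical convenience, not part of LinChemIn, and is exercised here with stand-in tuples rather than real clusterer output.

```python
from typing import Any, Optional, Tuple


def unpack_clusterer_output(result: tuple) -> Tuple[Any, float, Optional[Any]]:
    """Normalize a 2- or 3-element result to (clustering, score, dist_matrix)."""
    if len(result) == 2:
        clustering, score = result
        return clustering, score, None
    if len(result) == 3:
        clustering, score, dist_matrix = result
        return clustering, score, dist_matrix
    raise ValueError(f"expected 2 or 3 elements, got {len(result)}")


# Stand-in tuples mimicking the two documented return shapes.
without_matrix = unpack_clusterer_output(("labels", 0.42))
with_matrix = unpack_clusterer_output(("labels", 0.42, [[0.0]]))
```

With this pattern the distance matrix is simply None when it was not requested, so downstream code can check for it explicitly instead of relying on the tuple length.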