Adding clustering algorithms

The clustering module stores all the classes and functions to perform routes clustering, so if you would like to add a new algorithm, this is the module you will need to modify. Below, we firstly give a brief description of the module architecture and then we show a practical example for including a new algorithm.

clustering overview

The module is composed by a factory structure. The subclasses of the abstract class ClusterCalculator implement the concrete cluster calculators for applying the clustering algorithms. For each subclass the concrete implementation of the abstract method get_clustering() is developed. The ClusterFactory class handles the calls to the correct ClusterCalculator subclass based on the user’s input.

The factory is wrapped by the facade function clusterer(). It takes a list of graph objects, the ‘name’ of the clustering algorithm that should be used and a series of parameters related to the molecular and reaction fingerprints and to the chemical similarity calculation to be used.

Implementing a new clustering algorithm

In order to include a new clustering algorithm among those available in LinChemIn, you firstly need to create a new subclass of the abstract class ClusterCalculator in the clustering module and implement its concrete get_clustering() method.

class CustomClusteringAlgorithm(ClusterCalculator)
""" Subclass of ClusterCalculator applying the CustomClusteringAlgorithm. """
    def get_clustering(self, dist_matrix, save_dist_matrix, **kwargs):
        # some super cool code
        return (clustering, dist_matrix) if save_dist_matrix == True else clustering

The last step is to add the ‘name’ of your algorithm to the available_clustering_algorithms dictionary, to make it available to the factory.

available_clustering_algorithms = {
    'hdbscan': {'value': HdbscanClusterCalculator(),
                'info': 'HDBscan algorithm. Not working with less than 15 routes'},
    'agglomerative_cluster': {'value': AgglomerativeClusterCalculator(),
                              'info': 'Agglomerative Clustering algorithm. '
                                      'The number of clusters is optimized '
                                      'computing the silhouette score'},
    'new_cluster': {'value': CustomClusteringAlgorithm(),
                    'info': 'Brief description that will appear in the helper function'},
}

You can now use your newly developed clustering algorithm by calling the clusterer() function:

from linchemin.rem.clustering import clusterer

cluster1, matrix = clusterer(syngraphs, ged_method='nx_ged',
                             clustering_method='new_cluster')

Your new clustering algorithm can also be used through the facade() function:

from linchemin.interfaces.facade import facade

cluster, metadata = facade('clustering', routes_list, clustering_method='new_cluster')