Adding graph similarity algorithms¶
The graph_distance module stores all the classes and functions to calculate the
‘similarity’ between routes, so if you would like to add a new algorithm, this is the module you will need to modify.
Below, we firstly give a brief description of the module architecture and then we show a practical example
for including a new algorithm.
graph_distance overview¶
The module is composed by a factory structure in which the subclasses of the abstract class
Ged implement the concrete Graph Edit Distance (GED)
calculators. For each subclass the concrete implementation the abstract method
compute_ged() is developed.
The GedFactory handles the calls to the correct Ged subclasses based on the user’s input.
The factory is wrapped by the facade function graph_distance_factory().
It takes two graph objects, the ‘name’ of the GED algorithm to be used and a dictionary containing
the optional parameters for the chemical similarity calculations. The latter are used in the ‘cost
functions’ for substituting two nodes of the GED algorithms; in this way, the chemical
information is introduced in the GED algorithm.
In the graph_distance are also implemented two ‘cost functions’
that return the ‘distance’ between two nodes of the same type as (1 - chemical similarity)
Implementing a new GED algorithm¶
In order to include a new GED algorithm among those available in LinChemIn, you
firstly need to create a new subclass of the abstract class
Ged in the graph_distance module and
implement its concrete compute_ged() method.
class CustomGEDAlgorithm(Ged)
""" Subclass of Ged implementing the CustomGEDAlgorithm. """
def compute_ged(self, syngraph1, syngraph2, reaction_fp, reaction_fp_params,
reaction_similarity_name, molecular_fingerprint, molecular_fp_params,
molecular_fp_count_vect, molecular_similarity_name):
# some super cool code
return ged
The last step is to add the ‘name’ of your algorithm to the available_ged dictionary of the
GedFactory, to make it available to the factory.
available_ged = {
'nx_ged': {'value': GedNx,
'info': 'Standard NetworkX GED algorithm. The "root" argument is used'},
'nx_ged_matrix': {'value': GedNxPrecomputedMatrix,
'info': 'Standard NetworkX GED algorithm. The distance matrix is computed in advance'
'and the "root" algorithm is used'},
'nx_optimized_ged': {'value': GedOptNx,
'info': 'Optimized NetworkX GED algorithm'},
'new_ged': {'value': CustomGEDAlgorithm,
'info': 'Brief description that will appear in the helper function'},
}
You can now compute your newly developed descriptor by calling the
graph_distance_factory() function:
from linchemin.rem.graph_distance import graph_distance_factory
ged = graph_distance_factory(syngraphs1, syngraphs2, ged_method='new_ged')
The newly developed algorithm for computing the GED (or another ‘similarity’ measure) is also
automatically available to the facade() function, from which
you will be able to work with a list of routes:
from linchemin.interfaces.facade import facade
dist_matrix, metadata = facade('distance_matrix', routes_list, ged_method='new_ged')