Graph Data Structures
LinChemIn provides two internal data structures, with related methods, to store and manipulate graphs:
Iron and SynGraph.
Iron
Iron is mainly a “service” class used as an abstraction layer
between data structures. This is especially useful for the translator() function,
in which the input graph is first converted into an Iron instance and then transformed into the
selected output format.
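A plausible reading of this design is the hub-and-spoke pattern: if every format can be converted to and from one intermediate representation, n formats need about 2n converter functions instead of one for every pair of formats. The sketch below illustrates that pattern in plain Python with two toy formats. `TO_INTERMEDIATE`, `FROM_INTERMEDIATE`, `translate()` and the format names are hypothetical; this is not LinChemIn's translator() implementation.

```python
from typing import Any, Callable, Dict

# Hypothetical stand-in for an Iron-like intermediate: a plain dict of nodes and edges
Intermediate = Dict[str, Any]

# Every input format needs one converter *to* the intermediate ...
TO_INTERMEDIATE: Dict[str, Callable[[Any], Intermediate]] = {}
# ... and every output format needs one converter *from* it
FROM_INTERMEDIATE: Dict[str, Callable[[Intermediate], Any]] = {}


def edge_list_to_intermediate(edges):
    """Toy input format: a list of (parent, child) pairs."""
    nodes = sorted({node for edge in edges for node in edge})
    return {"nodes": nodes, "edges": list(edges)}


def intermediate_to_adjacency(graph):
    """Toy output format: {parent: set(children)}."""
    adjacency = {node: set() for node in graph["nodes"]}
    for parent, child in graph["edges"]:
        adjacency[parent].add(child)
    return adjacency


TO_INTERMEDIATE["edge_list"] = edge_list_to_intermediate
FROM_INTERMEDIATE["adjacency"] = intermediate_to_adjacency


def translate(graph, input_format, output_format):
    """Hypothetical helper: input -> intermediate -> output, never input -> output directly."""
    intermediate = TO_INTERMEDIATE[input_format](graph)
    return FROM_INTERMEDIATE[output_format](intermediate)


print(translate([("CCN", "CCC(=O)NCC")], "edge_list", "adjacency"))
```

In this sketch, supporting a new format means registering one function in each dictionary; every existing format then becomes reachable through the intermediate.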
The nodes of Iron are instances of the class Node, characterized by an id,
a dictionary of properties and a list of labels. Similarly, the edges are instances of the class
Edge, also characterized by an id, a dictionary of properties and a list of labels;
in addition, the Edge class has attributes storing the ids of the two nodes it
connects and its direction. The direction of the edge is
itself an instance of another class, Direction, which stores
the ids of the “parent” and “child” nodes.
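The relationships between these three classes can be sketched with Python dataclasses. The classes below (`ToyNode`, `ToyEdge`, `ToyDirection`) are hypothetical, simplified stand-ins that only mirror the attributes described above. In particular, the `parent_id`/`child_id` attribute names and the parsing of a `'parent>child'` string are assumptions modeled on the `Direction('{}>{}'.format(...))` call in the example further down; they are not the actual LinChemIn definitions.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDirection:
    """Hypothetical stand-in for Direction: parses a 'parent_id>child_id' string."""
    dir_string: str
    parent_id: str = field(init=False)  # set in __post_init__
    child_id: str = field(init=False)   # set in __post_init__

    def __post_init__(self):
        self.parent_id, self.child_id = self.dir_string.split(">")


@dataclass
class ToyNode:
    """Hypothetical stand-in for Node: id, properties dictionary, list of labels."""
    iid: str
    properties: dict = field(default_factory=dict)
    labels: list = field(default_factory=list)


@dataclass
class ToyEdge:
    """Hypothetical stand-in for Edge: ids of the connected nodes plus a direction."""
    iid: str
    a_iid: str
    b_iid: str
    direction: ToyDirection
    properties: dict = field(default_factory=dict)
    labels: list = field(default_factory=list)


edge = ToyEdge(iid="0", a_iid="0", b_iid="1", direction=ToyDirection("0>1"))
print(edge.direction.parent_id, edge.direction.child_id)  # 0 1
```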
An Iron instance is composed of two dictionaries.
The first one stores the nodes, with their properties and labels,
in the form {id: Node}; the second one stores the edges, with their related information, also in the form
{id: Edge}.
A newly created Iron instance is an empty graph;
it can then be populated by adding nodes and edges with the methods add_node() and
add_edge().
The code snippet below shows how to create and populate an Iron instance.
from linchemin.cgu.iron import Iron, Node, Edge, Direction
# The new Iron instance is initialized
iron_graph = Iron()
# Two nodes, instances of the Node class, are defined and added to the Iron instance
parent_node = Node(properties={'node_smiles': 'CCN',  # "parent" node properties
                               'prop1': 'some_value'},
                   iid='0',  # "parent" node id
                   labels=['parent_node'])  # "parent" node labels
iron_graph.add_node(parent_node.iid, parent_node)  # the "parent" node is added
child_node = Node(properties={'node_smiles': 'CCC(=O)NCC',  # "child" node properties
                              'prop2': 'some_other_value'},
                  iid='1',  # "child" node id
                  labels=['child_node'])  # "child" node labels
iron_graph.add_node(child_node.iid, child_node)  # the "child" node is added
# An edge, instance of the Edge class, connecting the two previously defined nodes, is added
d = Direction('{}>{}'.format(parent_node.iid, child_node.iid))  # edge direction is instantiated
edge = Edge(iid='0',  # edge id
            a_iid=parent_node.iid,  # id of the "parent" node
            b_iid=child_node.iid,  # id of the "child" node
            direction=d,  # edge direction
            properties={'some property': 'some value'},  # edge properties
            labels=['some label'])  # edge labels
iron_graph.add_edge(edge.iid, edge) # the edge is added
We recommend adding a ‘node_smiles’
key to the properties dictionary of the Iron nodes, so that the Iron object can
be translated into a SynGraph instance.
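Before translating, it can help to check which nodes lack that key. The helper below is a hypothetical sketch, not part of LinChemIn: it works on any {id: node} mapping whose values expose a `properties` dictionary, and `SimpleNamespace` objects stand in for Node instances so that the example runs without the library.

```python
from types import SimpleNamespace


def nodes_missing_smiles(nodes):
    """Hypothetical check: ids of nodes whose properties lack a 'node_smiles' key."""
    return [iid for iid, node in nodes.items() if "node_smiles" not in node.properties]


# SimpleNamespace objects stand in for Node instances in an {id: Node} mapping
nodes = {
    "0": SimpleNamespace(properties={"node_smiles": "CCN", "prop1": "some_value"}),
    "1": SimpleNamespace(properties={"prop2": "some_other_value"}),
}
print(nodes_missing_smiles(nodes))  # ['1']
```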
SynGraph
The abstract class SynGraph implements
the data model of the same name and is
the backbone of LinChemIn: it is the underlying data structure used by most of the package's functions.
It contains a graph-like structure implemented as a dictionary of sets: each key is a “parent” node
with one or more outgoing edges, and the corresponding value is a Python set containing all its “children” nodes.
Using a set ensures that there are no duplicates among the “children” nodes.
While the nodes are explicit, the edges are implicit, and their direction is always assumed to go from
the “parent” node to its “children” nodes.
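A minimal sketch of this representation, using a plain Python dictionary of sets with SMILES strings as nodes (SynGraph itself stores Molecule or ChemicalEquation objects; the `add_edge` helper is hypothetical). It shows that an edge exists only as a (parent, child) pair implied by the dictionary, and that recording the same pair twice leaves a single copy.

```python
def add_edge(graph, parent, child):
    """Record the implicit edge parent -> child in a dict of sets."""
    graph.setdefault(parent, set()).add(child)


graph = {}
add_edge(graph, "CCN", "CCNC(=O)CC")
add_edge(graph, "CCOC(=O)CC", "CCNC(=O)CC")
add_edge(graph, "CCN", "CCNC(=O)CC")  # duplicate: the set keeps a single copy

# Edges are never stored as objects; they are recovered from (parent, child) pairs
edges = sorted((parent, child) for parent, children in graph.items() for child in children)
print(edges)  # [('CCN', 'CCNC(=O)CC'), ('CCOC(=O)CC', 'CCNC(=O)CC')]
```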
The subclasses of SynGraph are BipartiteSynGraph,
MonopartiteReacSynGraph and MonopartiteMolSynGraph,
each of which represents a specific data model.
An instance of any SynGraph subclass can be initialized by passing an Iron instance
whose nodes have at least the property node_smiles. This allows the builder to construct
instances of the Molecule or
ChemicalEquation classes, which become the nodes
of the SynGraph object.
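Conceptually, the builder walks the Iron edges and replaces each node id with an object built from that node's node_smiles value. The sketch below shows only that idea, with plain strings instead of Molecule or ChemicalEquation objects and with nodes and edges given as a simple dict and a list of tuples. The function name is hypothetical and this is not LinChemIn's builder, which also performs the chemistry-specific work skipped here.

```python
def build_graph_from_iron_like(nodes, edges):
    """Hypothetical builder.

    nodes: {id: properties dict}; edges: list of (parent_id, child_id) pairs.
    Each node is replaced by its 'node_smiles' string; a missing key raises KeyError.
    """
    graph = {}
    for parent_id, child_id in edges:
        parent = nodes[parent_id]["node_smiles"]
        child = nodes[child_id]["node_smiles"]
        graph.setdefault(parent, set()).add(child)
    return graph


nodes = {"0": {"node_smiles": "CCN"}, "1": {"node_smiles": "CCC(=O)NCC"}}
print(build_graph_from_iron_like(nodes, [("0", "1")]))  # {'CCN': {'CCC(=O)NCC'}}
```

The KeyError on a node without node_smiles is the toy counterpart of the requirement stated above.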
As an alternative, it is possible to pass a list of dictionaries containing reaction strings, such as SMILES,
in the form [{'query_id': reaction_id, 'output_string': reaction_string}].
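To see how such a list can define a reaction-only graph, the sketch below splits each reaction SMILES into reactant and product molecules and links reaction a to reaction b when a product of a appears among the reactants of b. It is a simplified, hypothetical illustration: it compares raw SMILES strings (no canonicalization), assumes the `reactants>>products` form without an agents section, and is not the code LinChemIn uses.

```python
def parse_reaction(smiles):
    """Split 'reactants>>products' into two sets of molecule SMILES (no canonicalization)."""
    reactants, products = smiles.split(">>")
    return set(reactants.split(".")), set(products.split("."))


def build_reaction_graph(reactions_list):
    """Hypothetical builder: reaction a -> reaction b if a product of a is a reactant of b."""
    parsed = {r["query_id"]: parse_reaction(r["output_string"]) for r in reactions_list}
    graph = {}
    for id_a, (_, products_a) in parsed.items():
        for id_b, (reactants_b, _) in parsed.items():
            if id_a != id_b and products_a & reactants_b:
                graph.setdefault(id_a, set()).add(id_b)
    return graph


reactions_list = [
    {"query_id": 0, "output_string": r"CC(=O)CC.CCN>>CC/N=C(C)\CC"},
    {"query_id": 1, "output_string": r"CC/N=C(C)\CC.N#CC[NaB]>>CCNC(C)CC"},
]
print(build_reaction_graph(reactions_list))  # {0: {1}}
```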
The last option is to create an empty SynGraph instance
and add nodes using the method
add_node().
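The add_node() calls in the example that follows pass a (parent, [children]) tuple. The toy class below is a hypothetical sketch of how such a tuple could be merged into a dictionary of sets, so that repeated calls for the same parent accumulate its children; it is not LinChemIn's SynGraph and uses SMILES strings instead of Molecule objects.

```python
class ToySynGraph:
    """Hypothetical dict-of-sets container; not LinChemIn's SynGraph."""

    def __init__(self):
        self.graph = {}  # parent -> set of children

    def add_node(self, nodes_tup):
        parent, children = nodes_tup
        # Merge into the existing set: repeated calls accumulate children, duplicates vanish
        self.graph.setdefault(parent, set()).update(children)


g = ToySynGraph()
g.add_node(("CCN", ["CCNC(=O)CC"]))
g.add_node(("CCOC(=O)CC", ["CCNC(=O)CC"]))
g.add_node(("CCN", ["CCNC(C)=O"]))  # same parent: its set of children grows
print(g.graph)
```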
from linchemin.cgu.syngraph import BipartiteSynGraph, MonopartiteReacSynGraph, MonopartiteMolSynGraph
# A BipartiteSynGraph is initialized by passing a route in Iron format
# (iron_route is an Iron instance, built for example as shown in the Iron section)
bp_syngraph = BipartiteSynGraph(iron_route)
# A MonopartiteReacSynGraph is initialized by passing a list of dictionaries of reaction SMILES
# (raw strings keep the backslash in the SMILES without an invalid-escape warning)
reactions_list = [{'query_id': 0, 'output_string': r'CC(=O)CC.CCN>>CC/N=C(C)\CC'},
                  {'query_id': 1, 'output_string': r'CC/N=C(C)\CC.N#CC[NaB]>>CCNC(C)CC'}]
mpr_syngraph = MonopartiteReacSynGraph(reactions_list)
# A MonopartiteMolSynGraph is initialized as an empty instance and then nodes are added
mpm_syngraph = MonopartiteMolSynGraph()
mpm_syngraph.add_node(('CCN', ['CCNC(=O)CC']))
mpm_syngraph.add_node(('CCOC(=O)CC', ['CCNC(=O)CC']))
To convert one type of SynGraph into another,
the converter() function can be used.
from linchemin.cgu.convert import converter
# A BipartiteSynGraph is converted into a MonopartiteReacSynGraph
mpr_syngraph = converter(bp_syngraph, 'monopartite_reactions')
# A MonopartiteReacSynGraph is converted into a BipartiteSynGraph
bp_syngraph = converter(mpr_syngraph, 'bipartite')
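As a toy illustration of what a conversion between data models involves (not how converter() is implemented), the sketch below derives both monopartite views from a bipartite dictionary of sets for a symbolic two-step route A + B → I (reaction R1), I + C → P (reaction R2). The node names and helper functions are hypothetical; reaction nodes are identified by passing their names explicitly.

```python
def to_monopartite_reactions(bipartite, reactions):
    """Keep reaction nodes only: r -> s when a product of r is a reactant of s."""
    mono = {}
    for r in reactions:
        for product in bipartite.get(r, set()):
            for next_r in bipartite.get(product, set()):
                mono.setdefault(r, set()).add(next_r)
    return mono


def to_monopartite_molecules(bipartite, reactions):
    """Keep molecule nodes only: reactant -> product through each reaction node."""
    mono = {}
    for node, children in bipartite.items():
        if node in reactions:
            continue
        for r in children:  # in a bipartite graph, a molecule's children are reactions
            for product in bipartite.get(r, set()):
                mono.setdefault(node, set()).add(product)
    return mono


# Symbolic route: A + B -> I (R1); I + C -> P (R2)
bipartite = {"A": {"R1"}, "B": {"R1"}, "R1": {"I"}, "I": {"R2"}, "C": {"R2"}, "R2": {"P"}}
reactions = {"R1", "R2"}
print(to_monopartite_reactions(bipartite, reactions))  # {'R1': {'R2'}}
print(to_monopartite_molecules(bipartite, reactions))
```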