Atom-to-Atom Mapping

Atom-to-atom mapping of chemical reactions is fundamental to correctly assigning a role (“reactant”, “reagent”, “product”) to each of the chemical compounds involved. This, in turn, allows us to correctly identify identical routes and to compute sophisticated chemistry-aware route metrics. However, many existing atom-mapping tools either have dependencies that can lead to conflicts or are proprietary, requiring authentication and a licence incompatible with the MIT licence.

To minimize potential conflicts, we chose to wrap these tools in containers and to expose their functionality via REST APIs. A simple SDK that operates the endpoints of each service API is also provided as an installable package, to simplify the use of the services from Python code.

The containerized atom-to-atom mapping tools are part of our linchemin_services repository, freely available at https://github.com/syngenta/linchemin_services, where you can also find the documentation for their installation and usage. Here we only describe how these tools are used within LinChemIn.

atom_mapping overview

The atom_mapping module stores all the classes and functions to interact with the REST APIs of the containerized atom mapping tools.

The module is built around a factory structure in which the subclasses of the abstract class Mapper implement the concrete mappers. Each subclass provides a concrete implementation of the abstract method map_chemical_equations(), which sets up the connection with the URL of the selected mapper, prepares the input in the suitable format, submits the request and retrieves the output. The returned object is an instance of the MappingOutput class. The MapperFactory class handles the call to the correct Mapper subclass based on the user’s input.
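The factory structure described above can be sketched in a few lines. This is an illustrative outline only, not the LinChemIn implementation: the class names SketchMapper, EchoMapper and SketchMapperFactory, the method call_mapper(), the 'echo' mapper name and the 'input_string' key are all assumptions made for the example, and the toy mapper makes no REST call and returns a plain dictionary instead of a MappingOutput instance.

```python
from abc import ABC, abstractmethod


class SketchMapper(ABC):
    """Stand-in for the abstract Mapper class (hypothetical name)."""

    @abstractmethod
    def map_chemical_equations(self, reaction_list):
        """Map the reactions and return the results."""


class EchoMapper(SketchMapper):
    """Toy concrete mapper: makes no REST call and returns each input string unchanged."""

    def map_chemical_equations(self, reaction_list):
        # A real mapper would connect to its service URL, format the input,
        # submit the request and wrap the response in a MappingOutput instance.
        # The 'input_string' key is an assumption of this sketch.
        mapped = [
            {"query_id": q["query_id"], "output_string": q["input_string"]}
            for q in reaction_list
        ]
        return {"mapped_reactions": mapped, "unmapped_reactions": []}


class SketchMapperFactory:
    """Stand-in for MapperFactory: selects the mapper subclass by name."""

    _mappers = {"echo": EchoMapper}

    def call_mapper(self, mapper_name, reaction_list):
        if mapper_name not in self._mappers:
            raise KeyError(f"'{mapper_name}' is not an available mapper")
        return self._mappers[mapper_name]().map_chemical_equations(reaction_list)


def sketch_perform_atom_mapping(reaction_list, mapper_name):
    """Facade over the factory, analogous in spirit to perform_atom_mapping()."""
    return SketchMapperFactory().call_mapper(mapper_name, reaction_list)


result = sketch_perform_atom_mapping(
    [{"query_id": 0, "input_string": "CCO>>CC=O"}], "echo"
)
print(result["mapped_reactions"])
```

With this layout, supporting a new mapping service only requires a new subclass and one more entry in the factory's name-to-class dictionary; the calling code does not change.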

The factory is wrapped by the facade function perform_atom_mapping(), which takes as input a list of dictionaries containing the reaction strings to be mapped and the name of the selected mapper. An example of its usage is shown below:

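from linchemin.cheminfo.atom_mapping import perform_atom_mapping
# reaction_list: a list of dictionaries containing the reaction strings to be mapped
# The RXNMapper tool is selected by name
output = perform_atom_mapping(reaction_list, 'rxnmapper')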

Here output is an instance of the MappingOutput class. Its mapped_reactions attribute contains a list of dictionaries, one for each successfully mapped reaction, in the form [{'query_id': n, 'output_string': mapped_reaction}]; the unmapped_reactions attribute contains the list of input queries that could not be mapped (if any). The success_rate property is a float between 0 and 1 indicating the fraction of input queries that were mapped.
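To make the shape of the output concrete, the following sketch mimics the three members described above with a small dataclass. It is not the MappingOutput class itself: the name MappingOutputSketch, the format of the unmapped entries, the example reaction strings and the behaviour on an empty batch are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class MappingOutputSketch:
    """Hypothetical stand-in mirroring the attributes described for MappingOutput."""

    mapped_reactions: list = field(default_factory=list)
    unmapped_reactions: list = field(default_factory=list)

    @property
    def success_rate(self) -> float:
        total = len(self.mapped_reactions) + len(self.unmapped_reactions)
        # Assumption of this sketch: an empty batch reports 0.0 instead of raising
        return len(self.mapped_reactions) / total if total else 0.0


output = MappingOutputSketch(
    mapped_reactions=[
        {"query_id": 0, "output_string": "[CH3:1][CH2:2][OH:3]>>[CH3:1][CH:2]=[O:3]"}
    ],
    # The format of the unmapped entries is assumed here
    unmapped_reactions=[{"query_id": 1, "input_string": "not_a_reaction"}],
)

# Collect the mapped strings by query id and report how many queries succeeded
mapped_by_id = {d["query_id"]: d["output_string"] for d in output.mapped_reactions}
print(mapped_by_id)
print(f"{output.success_rate:.0%} of the queries were mapped")
```

Checking success_rate and unmapped_reactions before using the mapped strings is a simple way to notice queries that the service could not handle.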

Atom mapping in ChemicalEquation instances

When a mapped SMILES string or a mapped RDKit ChemicalReaction object is used to instantiate a new ChemicalEquation object, an instance of the Ratam class is generated. This instance contains all the information related to the atom-to-atom mapping of the ChemicalEquation.

The full_map_info attribute of Ratam is a dictionary whose keys are the identifiers of the Molecule objects involved in the reaction and whose values are lists of “mapping dictionaries” in the form {atom_id: atom_map_number}. In this way we can also keep track of molecules that appear more than once in the reaction with different atom mappings. While this attribute is built, a sanity check of the mapping is performed by making sure that each map number connects only two atoms; if this is not the case, the mapping is considered invalid and an error is raised.
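The structure of full_map_info and the sanity check can be illustrated with plain dictionaries. The sketch below is not the Ratam code: the molecule identifiers, the function name check_mapping, the ValueError and the convention that a map number of 0 marks an unmapped atom are assumptions of the example. The molecule "mol_A" appears twice in the toy reaction, each time with a different mapping, which is why its value is a list with two mapping dictionaries.

```python
from collections import Counter

# Toy reaction 2 A -> C with hypothetical molecule identifiers.
# Each value is a list of {atom_id: atom_map_number} dictionaries.
full_map_info = {
    "mol_A": [{0: 1}, {0: 2}],  # same molecule, two occurrences
    "mol_C": [{0: 1, 1: 2}],
}


def check_mapping(map_info):
    """Raise ValueError if any map number appears on more than two atoms."""
    counts = Counter(
        map_number
        for mapping_dicts in map_info.values()
        for mapping in mapping_dicts
        for map_number in mapping.values()
        if map_number != 0  # assumption: 0 marks an unmapped atom
    )
    invalid = sorted(n for n, c in counts.items() if c > 2)
    if invalid:
        raise ValueError(f"Map numbers {invalid} appear on more than two atoms")


check_mapping(full_map_info)  # the toy mapping above is valid

# Reusing map number 1 on a third atom makes the mapping invalid
broken = {"mol_A": [{0: 1}, {0: 1}], "mol_C": [{0: 1, 1: 2}]}
try:
    check_mapping(broken)
except ValueError as err:
    print(err)
```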

The second attribute of the Ratam object is atom_transformation, a list of AtomTransformation namedtuples, each of which contains a map number, the ids of the atoms connected by that map number and the unique identifiers of the Molecule objects to which those atoms belong.
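A possible way to derive such tuples from mapping dictionaries is sketched below. The field names of the namedtuple, the split of the mapping information into reactant and product dictionaries, the helper names and the convention that 0 marks an unmapped atom are assumptions of this example and may differ from the actual Ratam implementation.

```python
from collections import namedtuple

# Field names are assumed for illustration
AtomTransformation = namedtuple(
    "AtomTransformation",
    ["product_uid", "reactant_uid", "prod_atom_id", "react_atom_id", "map_num"],
)

# Hypothetical mapping information, split by role for simplicity
reactant_maps = {"mol_A": [{0: 1, 1: 2}]}
product_maps = {"mol_C": [{0: 2, 1: 1}]}


def iter_mapped_atoms(maps):
    """Yield (molecule_uid, atom_id, map_number) for every mapped atom."""
    for uid, mapping_dicts in maps.items():
        for mapping in mapping_dicts:
            for atom_id, map_num in mapping.items():
                if map_num != 0:  # assumption: 0 marks an unmapped atom
                    yield uid, atom_id, map_num


def build_atom_transformations(reactants, products):
    """Pair each mapped product atom with the reactant atom sharing its map number."""
    by_map_num = {n: (uid, atom) for uid, atom, n in iter_mapped_atoms(reactants)}
    transformations = []
    for p_uid, p_atom, n in iter_mapped_atoms(products):
        if n in by_map_num:
            r_uid, r_atom = by_map_num[n]
            transformations.append(
                AtomTransformation(p_uid, r_uid, p_atom, r_atom, n)
            )
    return transformations


atom_transformations = build_atom_transformations(reactant_maps, product_maps)
for t in atom_transformations:
    print(t)
```

Each tuple answers the question "which reactant atom became this product atom?", which is the information needed to follow individual atoms through a reaction.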

The Ratam instance is then assigned to the mapping attribute of the ChemicalEquation object.

You can find more information and examples about the usage of the atom mapping machinery in the tutorial.