graphein.molecule

Config

Base Config object for use with Molecule Graph Construction.

graphein.molecule.config.GraphAtoms

Allowable atom types for nodes in the graph.

alias of Literal['C', 'H', 'O', 'N', 'F', 'P', 'S', 'Cl', 'Br', 'I', 'B']

class graphein.molecule.config.MoleculeGraphConfig(*, verbose: bool = False, add_hs: bool = False, edge_construction_functions: typing.List[typing.Union[typing.Callable, str]] = [<function add_fully_connected_edges>, <function add_k_nn_edges>, <function add_distance_threshold>, <function add_atom_bonds>], node_metadata_functions: typing.List[typing.Union[typing.Callable, str]] = [<function atom_type_one_hot>], edge_metadata_functions: typing.List[typing.Union[typing.Callable, str]] = None, graph_metadata_functions: typing.List[typing.Callable] = None)

Config Object for Molecule Structure Graph Construction.

Parameters
  • verbose (bool) – Specifies verbosity of graph creation process.

  • add_hs (bool) – Specifies whether hydrogens should be added to the graph.

  • edge_construction_functions (List[Callable]) – List of functions that take an nx.Graph and return an nx.Graph with desired edges added. Prepared edge constructions can be found in graphein.molecule.edges.

  • node_metadata_functions (List[Callable], optional) – List of functions that take a node name and its node data dictionary (n, d) and add node-level features and metadata.

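  • edge_metadata_functions (List[Callable], optional) – List of functions that take the two nodes of an edge and its edge data dictionary (u, v, d) and add edge-level features and metadata.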

  • graph_metadata_functions (List[Callable], optional) – List of functions that take an nx.Graph and return an nx.Graph with added graph-level features and metadata.
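The callables in the config are invoked with different signatures: edge construction functions map a graph to a graph, node metadata functions receive (n, d), and edge metadata functions receive (u, v, d), matching the featurisation functions documented in this module. The following is a minimal, dependency-free sketch of that contract, not Graphein's own construction code. ToyGraph stands in for nx.Graph; add_chain_edges, is_carbon, n_kinds and build are hypothetical; the "element" node key is an assumption; and the order in which the three steps run is also an assumption.

```python
from typing import Any, Callable, Dict, List, Tuple


class ToyGraph:
    """Hypothetical stand-in for nx.Graph: node data and edge data as plain dicts."""

    def __init__(self, nodes: Dict[str, Dict[str, Any]]):
        self.nodes = nodes
        self.edges: Dict[Tuple[str, str], Dict[str, Any]] = {}


def add_chain_edges(G: ToyGraph) -> ToyGraph:
    """Hypothetical edge construction function: links consecutive nodes."""
    ids = list(G.nodes)
    for u, v in zip(ids, ids[1:]):
        G.edges[(u, v)] = {"kind": {"chain"}}
    return G


def is_carbon(n: str, d: Dict[str, Any]) -> bool:
    """Hypothetical node metadata function with the (n, d) signature."""
    d["is_carbon"] = d["element"] == "C"
    return d["is_carbon"]


def n_kinds(u: str, v: str, d: Dict[str, Any]) -> int:
    """Hypothetical edge metadata function with the (u, v, d) signature."""
    d["n_kinds"] = len(d["kind"])
    return d["n_kinds"]


def build(G: ToyGraph, node_fns: List[Callable], edge_fns: List[Callable],
          edge_meta_fns: List[Callable]) -> ToyGraph:
    """Apply each list of callables with its own calling convention."""
    for n, d in G.nodes.items():  # node metadata: fn(n, d)
        for fn in node_fns:
            fn(n, d)
    for fn in edge_fns:  # edge construction: G = fn(G)
        G = fn(G)
    for (u, v), d in G.edges.items():  # edge metadata: fn(u, v, d)
        for fn in edge_meta_fns:
            fn(u, v, d)
    return G
```

The node and edge functions write their result into the data dictionary and also return it, which is the behaviour the "Adds ... to the node data" descriptions suggest.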

Graphs

Functions for working with Small Molecule Graphs.

graphein.molecule.graphs.add_nodes_to_graph(G: networkx.classes.graph.Graph, verbose: bool = False) networkx.classes.graph.Graph

Adds nodes to the molecule graph.

Parameters
  • G (nx.Graph) – nx.Graph with metadata to populate with nodes.

  • verbose (bool) – Controls verbosity of this step.

Returns

nx.Graph with nodes added.

Return type

nx.Graph

graphein.molecule.graphs.construct_graph(config: Optional[graphein.molecule.config.MoleculeGraphConfig] = None, sdf_path: Optional[str] = None, smiles: Optional[str] = None, mol2_path: Optional[str] = None, pdb_path: Optional[str] = None, edge_construction_funcs: Optional[str] = None, edge_annotation_funcs: Optional[List[Callable]] = None, node_annotation_funcs: Optional[List[Callable]] = None, graph_annotation_funcs: Optional[List[Callable]] = None) networkx.classes.graph.Graph

Constructs a molecule structure graph from an sdf_path, mol2_path, pdb_path or SMILES string.

Users can provide a MoleculeGraphConfig object to specify construction parameters.

However, config parameters can be overridden by passing arguments directly to the function.

Parameters
  • config (graphein.molecule.config.MoleculeGraphConfig, optional) – MoleculeGraphConfig object. If None, defaults to config in graphein.molecule.config.

  • sdf_path (str, optional) – Path to an SDF file to build the graph from. Default is None.

  • smiles (str, optional) – SMILES string to build the graph from. Default is None.

  • mol2_path (str, optional) – Path to a MOL2 file to build the graph from. Default is None.

  • pdb_path (str, optional) – Path to a PDB file to build the graph from. Default is None.

  • edge_construction_funcs (List[Callable], optional) – List of edge construction functions. Default is None.

  • edge_annotation_funcs (List[Callable], optional) – List of edge annotation functions. Default is None.

  • node_annotation_funcs (List[Callable], optional) – List of node annotation functions. Default is None.

  • graph_annotation_funcs (List[Callable], optional) – List of graph annotation functions. Default is None.

Returns

Molecule structure graph.

Return type

nx.Graph
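The override behaviour described above can be sketched as follows. ConfigSketch is a hypothetical two-field stand-in for MoleculeGraphConfig and resolve_config is a hypothetical helper. The sketch returns a modified copy; whether construct_graph itself mutates the supplied config is not stated here.

```python
from dataclasses import dataclass, field, replace
from typing import Callable, List, Optional


@dataclass
class ConfigSketch:
    """Hypothetical two-field stand-in for MoleculeGraphConfig."""
    edge_construction_functions: List[Callable] = field(default_factory=list)
    node_metadata_functions: Optional[List[Callable]] = None


def resolve_config(
    config: Optional[ConfigSketch] = None,
    edge_construction_funcs: Optional[List[Callable]] = None,
    node_annotation_funcs: Optional[List[Callable]] = None,
) -> ConfigSketch:
    """Return the effective settings: explicit arguments win over the config."""
    # Fall back to a default config when none is supplied.
    config = config if config is not None else ConfigSketch()
    # Only arguments that were actually passed (not None) override the config.
    if edge_construction_funcs is not None:
        config = replace(config, edge_construction_functions=edge_construction_funcs)
    if node_annotation_funcs is not None:
        config = replace(config, node_metadata_functions=node_annotation_funcs)
    return config
```

Treating None as "not supplied" is what lets a caller override one list while inheriting the rest from the config.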

graphein.molecule.graphs.initialise_graph_with_metadata(name: str, rdmol: rdkit.Mol, coords: np.ndarray) nx.Graph

Initializes the nx.Graph object with initial metadata.

Parameters
  • name (str) – Name of the molecule. Either the SMILES string or the filename, depending on how the graph was created.

  • rdmol (rdkit.Mol) – RDKit molecule object.

  • coords (np.ndarray) – Atomic coordinates of the molecule.

Returns

Returns initial molecule structure graph with metadata.

Return type

nx.Graph

Edges

Distance

Functions for computing biochemical edges of graphs.

graphein.molecule.edges.distance.add_distance_threshold(G: networkx.classes.graph.Graph, threshold: float = 5.0)

Adds edges to any nodes within a given distance of each other.

Parameters
  • G (nx.Graph) – Molecule structure graph to add distance edges to.

  • threshold (float) – Distance in angstroms, below which two nodes are connected.

Returns

Graph with distance-based edges added
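A minimal sketch of distance-threshold edge construction on raw coordinates, assuming atoms are indexed 0..n-1 and reading "below" as a strict < comparison. distance_threshold_edges is a hypothetical helper, not part of Graphein; it returns index pairs rather than modifying an nx.Graph.

```python
import math
from itertools import combinations
from typing import Sequence, Set, Tuple

Coord = Tuple[float, float, float]


def distance_threshold_edges(coords: Sequence[Coord], threshold: float = 5.0) -> Set[Tuple[int, int]]:
    """Return index pairs (i, j), i < j, whose Euclidean distance is below threshold (in angstroms)."""
    return {
        (i, j)
        for i, j in combinations(range(len(coords)), 2)
        if math.dist(coords[i], coords[j]) < threshold
    }
```

Every pair is checked, so the cost grows quadratically with the number of atoms, which is usually acceptable for small molecules.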

graphein.molecule.edges.distance.add_fully_connected_edges(G: networkx.classes.graph.Graph)

Adds an edge between every pair of nodes, producing a fully connected graph.

Parameters

G (nx.Graph) – Molecule structure graph to add fully connected edges to.
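A sketch of the fully connected construction, assuming no self-loops are added; fully_connected_edges and the "C:0"-style node IDs are hypothetical.

```python
from itertools import combinations
from typing import List, Set, Tuple


def fully_connected_edges(nodes: List[str]) -> Set[Tuple[str, str]]:
    """Every unordered pair of distinct nodes becomes an edge."""
    return set(combinations(nodes, 2))
```

For n atoms this yields n * (n - 1) / 2 undirected edges, so the graph grows quickly with molecule size.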

graphein.molecule.edges.distance.add_k_nn_edges(G: networkx.classes.graph.Graph, k: int = 1, mode: str = 'connectivity', metric: str = 'minkowski', p: int = 2, include_self: Union[bool, str] = False)

Adds edges to nodes based on K nearest neighbours.

Parameters
  • G (nx.Graph) – Molecule structure graph to add distance edges to.

  • k (int) – Number of neighbors for each sample.

  • mode (str) – Type of returned matrix: "connectivity" will return the connectivity matrix with ones and zeros, and "distance" will return the distances between neighbors according to the given metric.

  • metric (str) – The distance metric used to calculate the k-Neighbors for each sample point. The DistanceMetric class gives a list of available metrics. The default distance is "euclidean" ("minkowski" metric with the p param equal to 2).

  • p (int) – Power parameter for the Minkowski metric. When p = 1, this is equivalent to using manhattan_distance (l1), and euclidean_distance (l2) for p = 2. For arbitrary p, minkowski_distance (l_p) is used. Default is 2 (euclidean).

  • include_self (Union[bool, str]) – Whether or not to mark each sample as the first nearest neighbor to itself. If "auto", then True is used for mode="connectivity" and False for mode="distance". Default is False.

Returns

Graph with knn-based edges added.

Return type

nx.Graph
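The k-nearest-neighbour construction can be sketched in pure Python. minkowski and knn_edges are hypothetical helpers standing in for a library neighbour search (the parameter descriptions above follow scikit-learn's conventions). The sketch corresponds to mode="connectivity" with include_self=False and stores each edge as a sorted pair, since an undirected graph cannot hold (i, j) and (j, i) separately.

```python
from typing import List, Sequence, Set, Tuple


def minkowski(a: Sequence[float], b: Sequence[float], p: int = 2) -> float:
    """Minkowski distance: p=1 is Manhattan (l1), p=2 is Euclidean (l2)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def knn_edges(coords: List[Sequence[float]], k: int = 1, p: int = 2) -> Set[Tuple[int, int]]:
    """Link each point to its k nearest other points (no self-neighbours).

    Pairs are stored as (min, max) so the edge set is undirected.
    """
    edges: Set[Tuple[int, int]] = set()
    for i, ci in enumerate(coords):
        # Rank every other point by distance to point i; ties fall back to index order.
        ranked = sorted((minkowski(ci, cj, p), j) for j, cj in enumerate(coords) if j != i)
        for _, j in ranked[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges
```

Because the neighbour relation is not symmetric, merging both directions into one undirected graph can leave some nodes with more than k neighbours.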

graphein.molecule.edges.distance.compute_distmat(coords: numpy.ndarray) numpy.ndarray

Computes pairwise Euclidean distances between all atoms.

Design choice: a plain coordinate array is passed in to enable easier testing on dummy data.

Parameters

coords (np.ndarray) – Array of atomic coordinates with shape (n_atoms, 3).

Returns

Euclidean distance matrix.

Return type

np.ndarray

graphein.molecule.edges.distance.get_interacting_atoms(angstroms: float, distmat: numpy.ndarray) numpy.ndarray

Find the atoms that are within a particular radius of one another.

Parameters
  • angstroms (float) – Radius in angstroms.

  • distmat (np.ndarray) – Distance matrix.

Returns

Array of interacting atoms

Return type

np.ndarray
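compute_distmat and get_interacting_atoms can be sketched together without NumPy: build the symmetric distance matrix, then keep the index pairs within the radius. Both helpers below are hypothetical stand-ins. The inclusive <= comparison and returning each unordered pair once (skipping the diagonal) are assumptions of the sketch, not documented behaviour of the library functions.

```python
import math
from typing import List, Sequence, Tuple


def compute_distmat_sketch(coords: Sequence[Sequence[float]]) -> List[List[float]]:
    """Symmetric matrix of pairwise Euclidean distances; the diagonal is 0."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def interacting_pairs(angstroms: float, distmat: List[List[float]]) -> List[Tuple[int, int]]:
    """Index pairs (i, j), i < j, whose distance is within the radius."""
    n = len(distmat)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if distmat[i][j] <= angstroms]
```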

Atomic

Functions for computing atomic structure of molecules.

graphein.molecule.edges.atomic.add_atom_bonds(G: networkx.classes.graph.Graph) networkx.classes.graph.Graph

Adds atomic bonds to a molecular graph.

Parameters

G (nx.Graph) – Molecular graph to add atomic bond edges to.

Returns

Molecular graph with atomic bonds added.

Return type

nx.Graph

Features

Node

Functions for featurising Small Molecule Graphs.

graphein.molecule.features.nodes.atom_type.atom_type_one_hot(n, d: Dict[str, Any], return_array: bool = True, allowable_set: Optional[List[str]] = None) numpy.ndarray

Adds a one-hot encoding of atom types as a node attribute.

Parameters
  • n (str) – Node name; this is unused and only included for compatibility with the other functions.

  • d (Dict[str, Any]) – Node data.

  • return_array (bool) – If True, returns an np.ndarray of the one-hot encoding, otherwise returns a pd.Series. Default is True.

  • allowable_set (List[str], optional) – Specifies the vocabulary of atom types. Default is None (which uses graphein.molecule.atoms.BASE_ATOMS).

Returns

One-hot encoding of atom types.

Return type

Union[pd.Series, np.ndarray]
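A sketch of the (n, d) featurisation pattern using a one-hot atom-type encoding. It assumes the element symbol is stored under a hypothetical "element" key and that symbols outside the vocabulary encode as all zeros. atom_type_one_hot_sketch is illustrative only and returns a list rather than an np.ndarray or pd.Series.

```python
from typing import Any, Dict, List, Optional

# Mirrors graphein.molecule.atoms.BASE_ATOMS as listed in the Constants section.
BASE_ATOMS = ["C", "H", "O", "N", "F", "P", "S", "Cl", "Br", "I", "B"]


def atom_type_one_hot_sketch(
    n: str, d: Dict[str, Any], allowable_set: Optional[List[str]] = None
) -> List[int]:
    """One-hot encode d["element"] against a vocabulary and store it on the node.

    n is unused, as in the documented feature functions. Assumption: unknown
    symbols produce an all-zero vector.
    """
    vocab = BASE_ATOMS if allowable_set is None else allowable_set
    encoding = [int(d["element"] == symbol) for symbol in vocab]
    d["atom_type_one_hot"] = encoding
    return encoding
```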

graphein.molecule.features.nodes.atom_type.atomic_mass(n: str, d: Dict[str, Any]) float

Adds the mass of the atom to the node data.

Parameters
  • n (str) – Node name; this is unused and only included for compatibility with the other functions.

  • d (Dict[str, Any]) – Node data.

Returns

Mass of the atom.

Return type

float

graphein.molecule.features.nodes.atom_type.chiral_tag(n: str, d: Dict[str, Any]) rdkit.Chem.rdchem.ChiralType

Adds an indicator of atom chirality to the node data.

Parameters
  • n (str) – Node name; this is unused and only included for compatibility with the other functions.

  • d (Dict[str, Any]) – Node data.

Returns

Indicator of atom chirality.

Return type

rdkit.Chem.rdchem.ChiralType

graphein.molecule.features.nodes.atom_type.degree(n: str, d: Dict[str, Any]) int

Adds the degree of the node to the node data.

N.B. this is the degree as defined by RDKit (the number of directly bonded neighbours of the atom, which depends on whether hydrogens are explicit) rather than the ‘true’ degree of the node in the graph. For the latter, use nx.degree().

Parameters
  • n (str) – Node name; this is unused and only included for compatibility with the other functions.

  • d (Dict[str, Any]) – Node data.

Returns

Degree of the atom.

Return type

int

graphein.molecule.features.nodes.atom_type.explicit_valence(n: str, d: Dict[str, Any]) int

Adds the explicit valence of the atom to the node data.

Parameters
  • n (str) – Node name; this is unused and only included for compatibility with the other functions.

  • d (Dict[str, Any]) – Node data.

Returns

Explicit valence of the atom.

Return type

int

graphein.molecule.features.nodes.atom_type.formal_charge(n: str, d: Dict[str, Any]) int

Adds the formal charge of the atom to the node data.

Parameters
  • n (str) – Node name; this is unused and only included for compatibility with the other functions.

  • d (Dict[str, Any]) – Node data.

Returns

Formal charge of the atom.

Return type

int

graphein.molecule.features.nodes.atom_type.hybridization(n: str, d: Dict[str, Any]) rdkit.Chem.rdchem.HybridizationType

Adds the hybridization of the atom to the node data.

Parameters
  • n (str) – Node name; this is unused and only included for compatibility with the other functions.

  • d (Dict[str, Any]) – Node data.

Returns

Hybridization of the atom.

Return type

rdkit.Chem.rdchem.HybridizationType

graphein.molecule.features.nodes.atom_type.implicit_valence(n: str, d: Dict[str, Any]) int

Adds the implicit valence of the atom to the node data.

Parameters
  • n (str) – Node name; this is unused and only included for compatibility with the other functions.

  • d (Dict[str, Any]) – Node data.

Returns

Implicit valence of the atom.

Return type

int

graphein.molecule.features.nodes.atom_type.is_aromatic(n: str, d: Dict[str, Any]) bool

Adds an indicator of aromaticity of the atom to the node data.

Parameters
  • n (str) – Node name; this is unused and only included for compatibility with the other functions.

  • d (Dict[str, Any]) – Node data.

Returns

Indicator of aromaticity of the atom.

Return type

bool

graphein.molecule.features.nodes.atom_type.is_isotope(n: str, d: Dict[str, Any]) int

Adds an indicator of whether or not the atom is an isotope to the node data.

Parameters
  • n (str) – Node name; this is unused and only included for compatibility with the other functions.

  • d (Dict[str, Any]) – Node data.

Returns

Indicator of whether or not the atom is an isotope.

Return type

int

graphein.molecule.features.nodes.atom_type.is_ring(n: str, d: Dict[str, Any]) bool

Adds an indicator of ring membership of the atom to the node data.

Parameters
  • n (str) – Node name; this is unused and only included for compatibility with the other functions.

  • d (Dict[str, Any]) – Node data.

Returns

Indicator of ring membership of the atom.

Return type

bool

graphein.molecule.features.nodes.atom_type.is_ring_size(n: str, d: Dict[str, Any], ring_size: int) bool

Adds an indicator of the atom's membership in a ring of size ring_size to the node data.

Parameters
  • n (str) – Node name; this is unused and only included for compatibility with the other functions.

  • d (Dict[str, Any]) – Node data.

  • ring_size (int) – The size of the ring to look for.

Returns

Indicator of ring membership of size ring_size of the atom.

Return type

bool

graphein.molecule.features.nodes.atom_type.num_explicit_h(n: str, d: Dict[str, Any]) int

Adds the number of explicit Hydrogens of the atom to the node data.

Parameters
  • n (str) – Node name; this is unused and only included for compatibility with the other functions.

  • d (Dict[str, Any]) – Node data.

Returns

Number of explicit Hydrogens of the atom.

Return type

int

graphein.molecule.features.nodes.atom_type.num_implicit_h(n: str, d: Dict[str, Any]) int

Adds the number of implicit Hydrogens of the atom to the node data.

Parameters
  • n (str) – Node name; this is unused and only included for compatibility with the other functions.

  • d (Dict[str, Any]) – Node data.

Returns

Number of implicit Hydrogens of the atom.

Return type

int

graphein.molecule.features.nodes.atom_type.num_radical_electrons(n: str, d: Dict[str, Any]) int

Adds the number of radical electrons of the atom to the node data.

Parameters
  • n (str) – Node name; this is unused and only included for compatibility with the other functions.

  • d (Dict[str, Any]) – Node data.

Returns

Number of radical electrons of the atom.

Return type

int

graphein.molecule.features.nodes.atom_type.total_degree(n: str, d: Dict[str, Any]) int

Adds the total degree of the atom to the node data.

Parameters
  • n (str) – Node name; this is unused and only included for compatibility with the other functions.

  • d (Dict[str, Any]) – Node data.

Returns

Total degree of the atom.

Return type

int

graphein.molecule.features.nodes.atom_type.total_num_h(n: str, d: Dict[str, Any]) int

Adds the total number of Hydrogens of the atom to the node data.

Parameters
  • n (str) – Node name; this is unused and only included for compatibility with the other functions.

  • d (Dict[str, Any]) – Node data.

Returns

Total number of Hydrogens of the atom.

Return type

int

graphein.molecule.features.nodes.atom_type.total_valence(n: str, d: Dict[str, Any]) int

Adds the total valence of the atom to the node data.

Parameters
  • n (str) – Node name; this is unused and only included for compatibility with the other functions.

  • d (Dict[str, Any]) – Node data.

Returns

Total valence of the atom.

Return type

int

Edge

Functions for computing bond features for molecules.

graphein.molecule.features.edges.bonds.add_bond_type(u: str, v: str, d: Dict[str, Any]) rdkit.Chem.rdchem.BondType

Adds bond type as an edge feature to the graph.

Parameters
  • u (str) – First node in the edge.

  • v (str) – Second node in the edge.

  • d (Dict[str, Any]) – Dictionary of edge metadata.

Returns

Returns the bond type.

Return type

rdkit.Chem.rdchem.BondType

graphein.molecule.features.edges.bonds.bond_is_aromatic(u: str, v: str, d: Dict[str, Any]) bool

Adds indicator of aromaticity of a bond to the graph as an edge feature.

Parameters
  • u (str) – First node in the edge.

  • v (str) – Second node in the edge.

  • d (Dict[str, Any]) – Dictionary of edge metadata.

Returns

Returns indicator of aromaticity of bond.

Return type

bool

graphein.molecule.features.edges.bonds.bond_is_conjugated(u: str, v: str, d: Dict[str, Any]) bool

Adds indicator of conjugated bond to the graph as an edge feature.

Parameters
  • u (str) – First node in the edge.

  • v (str) – Second node in the edge.

  • d (Dict[str, Any]) – Dictionary of edge metadata.

Returns

Returns indicator of conjugated bond.

Return type

bool

graphein.molecule.features.edges.bonds.bond_is_in_ring(u: str, v: str, d: Dict[str, Any]) bool

Adds indicator of ring membership to the graph as an edge feature.

Parameters
  • u (str) – First node in the edge.

  • v (str) – Second node in the edge.

  • d (Dict[str, Any]) – Dictionary of edge metadata.

Returns

Returns indicator of ring membership of bond.

Return type

bool

graphein.molecule.features.edges.bonds.bond_is_in_ring_size(u: str, v: str, d: Dict[str, Any], ring_size: int) int

Adds an indicator of the bond's membership in a ring of size ring_size to the graph as an edge feature.

Parameters
  • u (str) – First node in the edge.

  • v (str) – Second node in the edge.

  • d (Dict[str, Any]) – Dictionary of edge metadata.

  • ring_size (int) – Size of the ring to look for.

Returns

Returns an indicator of whether the bond is in a ring of size ring_size.

Return type

int

graphein.molecule.features.edges.bonds.bond_stereo(u: str, v: str, d: Dict[str, Any]) rdkit.Chem.rdchem.BondStereo

Adds bond stereo configuration as an edge feature to the graph.

Parameters
  • u (str) – First node in the edge.

  • v (str) – Second node in the edge.

  • d (Dict[str, Any]) – Dictionary of edge metadata.

Returns

Returns the bond stereo.

Return type

rdkit.Chem.rdchem.BondStereo

Graph

Functions for featurising Small Molecule Graphs.

graphein.molecule.features.graph.molecule.mol_descriptors(g: networkx.classes.graph.Graph, descriptor_list: Optional[List[str]] = None, return_array: bool = False, return_series: bool = False) Union[numpy.ndarray, pandas.core.series.Series, Dict[str, Union[float, int]]]

Adds global molecular descriptors to the graph.

Parameters
  • g (nx.Graph) – The graph to add the descriptors to.

  • descriptor_list (Optional[List[str]]) – The list of descriptors to add. If None, all descriptors are added.

  • return_array (bool) – If True, the descriptors are returned as a np.ndarray.

  • return_series (bool) – If True, the descriptors are returned as a pd.Series.

Returns

The descriptors as a dictionary (default), np.ndarray or pd.Series.

Return type

Union[np.ndarray, pd.Series, Dict[str, Union[float, int]]]

Visualisation

Functions for visualising Small Molecule Graphs.

Plotting functions for molecules wrap the methods defined on protein graphs and provide sane defaults.

graphein.molecule.visualisation.plot_molecular_graph(G: nx.Graph, angle: int = 30, plot_title: Optional[str] = None, figsize: Tuple[int, int] = (10, 7), node_alpha: float = 0.7, node_size_min: float = 20.0, node_size_multiplier: float = 1, label_node_ids: bool = True, node_colour_map: plt.cm = <matplotlib.colors.ListedColormap object>, edge_color_map: plt.cm = <matplotlib.colors.ListedColormap object>, colour_nodes_by: str = 'element', colour_edges_by: str = 'kind', edge_alpha: float = 0.5, plot_style: str = 'ggplot', out_path: Optional[str] = None, out_format: str = '.png') Axes3D

Plots molecular graph in Axes3D.

Parameters
  • G (nx.Graph) – Molecular graph to plot.

  • angle (int) – View angle. Defaults to 30.

  • plot_title (str, optional) – Title of plot. Defaults to None.

  • figsize (Tuple[int, int]) – Size of figure, defaults to (10, 7).

  • node_alpha (float) – Controls node transparency, defaults to 0.7.

  • node_size_min (float) – Specifies node minimum size, defaults to 20.

  • node_size_multiplier (float) – Scales node size by a constant. Node sizes reflect degree. Defaults to 1.

  • label_node_ids (bool) – bool indicating whether or not to plot node_id labels. Defaults to True.

  • node_colour_map (plt.cm) – colour map to use for nodes. Defaults to plt.cm.plasma.

  • edge_color_map (plt.cm) – colour map to use for edges. Defaults to plt.cm.plasma.

  • colour_nodes_by (str) – Specifies how to colour nodes. "degree", "seq_position" or a node feature. Defaults to "element".

  • colour_edges_by (str) – Specifies how to colour edges. Currently only "kind" is supported.

  • edge_alpha (float) – Controls edge transparency. Defaults to 0.5.

  • plot_style (str) – matplotlib style sheet to use. Defaults to "ggplot".

  • out_path (str, optional) – If not None, writes the plot to this location. Defaults to None (does not save).

  • out_format (str) – File format to use for the plot. Defaults to ".png".

Returns

matplotlib Axes3D object.

Return type

Axes3D

graphein.molecule.visualisation.plotly_molecular_graph(g: nx.Graph, plot_title: Optional[str] = None, figsize: Tuple[int, int] = (620, 650), node_alpha: float = 0.7, node_size_min: float = 20, node_size_multiplier: float = 1.0, label_node_ids: bool = True, node_color_map: plt.cm = <matplotlib.colors.ListedColormap object>, edge_color_map: plt.cm = <matplotlib.colors.ListedColormap object>, colour_nodes_by: str = 'element', colour_edges_by: str = 'kind') go.Figure

Plots molecular graph using plotly.

Parameters
  • g (nx.Graph) – Molecular graph to plot.

  • plot_title (str, optional) – Title of plot, defaults to None.

  • figsize (Tuple[int, int]) – Size of figure, defaults to (620, 650).

  • node_alpha (float) – Controls node transparency, defaults to 0.7.

  • node_size_min (float) – Specifies node minimum size. Defaults to 20.0.

  • node_size_multiplier (float) – Scales node size by a constant. Node sizes reflect degree. Defaults to 1.0.

  • label_node_ids (bool) – bool indicating whether or not to plot node_id labels. Defaults to True.

  • node_color_map (plt.cm) – colour map to use for nodes. Defaults to plt.cm.plasma.

  • edge_color_map (plt.cm) – colour map to use for edges. Defaults to plt.cm.plasma.

  • colour_nodes_by (str) – Specifies how to colour nodes. "degree", or a node feature. Defaults to "element".

  • colour_edges_by (str) – Specifies how to colour edges. Currently only "kind" is supported.

Returns

Plotly Graph Objects plot

Return type

go.Figure

Constants

Author: Eric J. Ma, Arian Jamasb

Purpose: This is a set of utility variables and functions related to small molecules that can be used across the Graphein project.

These include various collections of standard atom types used in molecule-focussed ML.

graphein.molecule.atoms.ALLOWED_BOND_TYPES: List[rdkit.Chem.rdchem.BondType] = [rdkit.Chem.rdchem.BondType.SINGLE, rdkit.Chem.rdchem.BondType.DOUBLE, rdkit.Chem.rdchem.BondType.TRIPLE, rdkit.Chem.rdchem.BondType.AROMATIC]

Vocabulary of allowed bond types.

graphein.molecule.atoms.ALLOWED_BOND_TYPE_TO_CHANNEL: Dict[rdkit.Chem.rdchem.BondType, int] = {rdkit.Chem.rdchem.BondType.SINGLE: 0, rdkit.Chem.rdchem.BondType.DOUBLE: 1, rdkit.Chem.rdchem.BondType.TRIPLE: 2, rdkit.Chem.rdchem.BondType.AROMATIC: 3}

Mapping of bond types to integer values.
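A channel mapping like this turns a bond type into a one-hot vector. In this sketch, plain strings stand in for the rdkit.Chem.rdchem.BondType members so it runs without RDKit, and bond_one_hot is a hypothetical helper; a bond type outside the vocabulary raises KeyError here.

```python
from typing import Dict, List

# String stand-ins for rdkit.Chem.rdchem.BondType members (assumption: no RDKit import).
ALLOWED_BOND_TYPES: List[str] = ["SINGLE", "DOUBLE", "TRIPLE", "AROMATIC"]
ALLOWED_BOND_TYPE_TO_CHANNEL: Dict[str, int] = {"SINGLE": 0, "DOUBLE": 1, "TRIPLE": 2, "AROMATIC": 3}


def bond_one_hot(bond_type: str) -> List[int]:
    """Set the channel assigned to bond_type to 1 and all other channels to 0."""
    encoding = [0] * len(ALLOWED_BOND_TYPES)
    encoding[ALLOWED_BOND_TYPE_TO_CHANNEL[bond_type]] = 1
    return encoding
```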

graphein.molecule.atoms.ALLOWED_DEGREES: List[int] = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

Vocabulary of allowed atom degrees.

graphein.molecule.atoms.ALLOWED_HYBRIDIZATIONS: List[rdkit.Chem.rdchem.HybridizationType] = [rdkit.Chem.rdchem.HybridizationType.SP, rdkit.Chem.rdchem.HybridizationType.SP2, rdkit.Chem.rdchem.HybridizationType.SP3, rdkit.Chem.rdchem.HybridizationType.SP3D, rdkit.Chem.rdchem.HybridizationType.SP3D2]

Vocabulary of allowed hybridizations.

graphein.molecule.atoms.ALLOWED_NUM_H: List[int] = [0, 1, 2, 3, 4]

Vocabulary of allowed number of Hydrogens.

graphein.molecule.atoms.ALLOWED_VALENCES: List[int] = [0, 1, 2, 3, 4, 5, 6]

Vocabulary of allowed atom valences.

graphein.molecule.atoms.ALL_BOND_TYPES: List[rdkit.Chem.rdchem.BondType] = [rdkit.Chem.rdchem.BondType.AROMATIC, rdkit.Chem.rdchem.BondType.DATIVE, rdkit.Chem.rdchem.BondType.DATIVEL, rdkit.Chem.rdchem.BondType.DATIVER, rdkit.Chem.rdchem.BondType.DOUBLE, rdkit.Chem.rdchem.BondType.FIVEANDAHALF, rdkit.Chem.rdchem.BondType.FOURANDAHALF, rdkit.Chem.rdchem.BondType.HEXTUPLE, rdkit.Chem.rdchem.BondType.HYDROGEN, rdkit.Chem.rdchem.BondType.IONIC, rdkit.Chem.rdchem.BondType.ONEANDAHALF, rdkit.Chem.rdchem.BondType.OTHER, rdkit.Chem.rdchem.BondType.QUADRUPLE, rdkit.Chem.rdchem.BondType.QUINTUPLE, rdkit.Chem.rdchem.BondType.SINGLE, rdkit.Chem.rdchem.BondType.THREEANDAHALF, rdkit.Chem.rdchem.BondType.THREECENTER, rdkit.Chem.rdchem.BondType.TRIPLE, rdkit.Chem.rdchem.BondType.TWOANDAHALF, rdkit.Chem.rdchem.BondType.UNSPECIFIED, rdkit.Chem.rdchem.BondType.ZERO]

Vocabulary of all RDKit BondTypes.

graphein.molecule.atoms.ALL_BOND_TYPES_TO_CHANNEL: Dict[rdkit.Chem.rdchem.BondType, int] = {rdkit.Chem.rdchem.BondType.UNSPECIFIED: 19, rdkit.Chem.rdchem.BondType.SINGLE: 14, rdkit.Chem.rdchem.BondType.DOUBLE: 4, rdkit.Chem.rdchem.BondType.TRIPLE: 17, rdkit.Chem.rdchem.BondType.QUADRUPLE: 12, rdkit.Chem.rdchem.BondType.QUINTUPLE: 13, rdkit.Chem.rdchem.BondType.HEXTUPLE: 7, rdkit.Chem.rdchem.BondType.ONEANDAHALF: 10, rdkit.Chem.rdchem.BondType.TWOANDAHALF: 18, rdkit.Chem.rdchem.BondType.THREEANDAHALF: 15, rdkit.Chem.rdchem.BondType.FOURANDAHALF: 6, rdkit.Chem.rdchem.BondType.FIVEANDAHALF: 5, rdkit.Chem.rdchem.BondType.AROMATIC: 0, rdkit.Chem.rdchem.BondType.IONIC: 9, rdkit.Chem.rdchem.BondType.HYDROGEN: 8, rdkit.Chem.rdchem.BondType.THREECENTER: 16, rdkit.Chem.rdchem.BondType.DATIVE: 1, rdkit.Chem.rdchem.BondType.DATIVEL: 2, rdkit.Chem.rdchem.BondType.DATIVER: 3, rdkit.Chem.rdchem.BondType.OTHER: 11, rdkit.Chem.rdchem.BondType.ZERO: 20}

Vocabulary of all RDKit BondTypes mapped to integer values.

graphein.molecule.atoms.ALL_STEREO_TO_CHANNEL: Dict[rdkit.Chem.rdchem.BondStereo, int] = {rdkit.Chem.rdchem.BondStereo.STEREONONE: 3, rdkit.Chem.rdchem.BondStereo.STEREOANY: 0, rdkit.Chem.rdchem.BondStereo.STEREOZ: 5, rdkit.Chem.rdchem.BondStereo.STEREOE: 2, rdkit.Chem.rdchem.BondStereo.STEREOCIS: 1, rdkit.Chem.rdchem.BondStereo.STEREOTRANS: 4}

Vocabulary of all RDKit bond stereo types mapped to integer values.

graphein.molecule.atoms.ALL_STEREO_TYPES: List[rdkit.Chem.rdchem.BondStereo] = [rdkit.Chem.rdchem.BondStereo.STEREOANY, rdkit.Chem.rdchem.BondStereo.STEREOCIS, rdkit.Chem.rdchem.BondStereo.STEREOE, rdkit.Chem.rdchem.BondStereo.STEREONONE, rdkit.Chem.rdchem.BondStereo.STEREOTRANS, rdkit.Chem.rdchem.BondStereo.STEREOZ]

Vocabulary of all RDKit bond stereo types.

graphein.molecule.atoms.BASE_ATOMS: List[str] = ['C', 'H', 'O', 'N', 'F', 'P', 'S', 'Cl', 'Br', 'I', 'B']

Vocabulary of 11 standard atom types.

graphein.molecule.atoms.CHIRAL_TYPE: List[rdkit.Chem.rdchem.ChiralType] = [rdkit.Chem.rdchem.ChiralType.CHI_OTHER, rdkit.Chem.rdchem.ChiralType.CHI_TETRAHEDRAL_CCW, rdkit.Chem.rdchem.ChiralType.CHI_TETRAHEDRAL_CW, rdkit.Chem.rdchem.ChiralType.CHI_UNSPECIFIED]

Vocabulary of all RDKit chiral types.

graphein.molecule.atoms.CHIRAL_TYPE_TO_CHANNEL: Dict[rdkit.Chem.rdchem.ChiralType, int] = {rdkit.Chem.rdchem.ChiralType.CHI_UNSPECIFIED: 3, rdkit.Chem.rdchem.ChiralType.CHI_TETRAHEDRAL_CW: 2, rdkit.Chem.rdchem.ChiralType.CHI_TETRAHEDRAL_CCW: 1, rdkit.Chem.rdchem.ChiralType.CHI_OTHER: 0}

Vocabulary of all RDKit chiral types mapped to integer values.
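In the constants above, each *_TO_CHANNEL integer equals the member's position in the matching list (ALL_BOND_TYPES, ALL_STEREO_TYPES, CHIRAL_TYPE), and those lists are sorted alphabetically by member name. The sketch below reproduces the mappings from the lists, again with plain strings standing in for the RDKit enum members; to_channel is a hypothetical helper.

```python
from typing import Dict, List

# String stand-ins for the RDKit enum members, in the order the lists above use.
ALL_BOND_TYPES: List[str] = [
    "AROMATIC", "DATIVE", "DATIVEL", "DATIVER", "DOUBLE", "FIVEANDAHALF",
    "FOURANDAHALF", "HEXTUPLE", "HYDROGEN", "IONIC", "ONEANDAHALF", "OTHER",
    "QUADRUPLE", "QUINTUPLE", "SINGLE", "THREEANDAHALF", "THREECENTER",
    "TRIPLE", "TWOANDAHALF", "UNSPECIFIED", "ZERO",
]
ALL_STEREO_TYPES: List[str] = ["STEREOANY", "STEREOCIS", "STEREOE", "STEREONONE", "STEREOTRANS", "STEREOZ"]
CHIRAL_TYPE: List[str] = ["CHI_OTHER", "CHI_TETRAHEDRAL_CCW", "CHI_TETRAHEDRAL_CW", "CHI_UNSPECIFIED"]


def to_channel(vocab: List[str]) -> Dict[str, int]:
    """Map each vocabulary entry to its position in the list."""
    return {member: idx for idx, member in enumerate(vocab)}
```

Deriving the channel dictionary from the list this way keeps the two constants from drifting apart if the vocabulary is extended.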

graphein.molecule.atoms.EXTENDED_ATOMS = ['C', 'N', 'O', 'S', 'F', 'Si', 'P', 'Cl', 'Br', 'Mg', 'Na', 'Ca', 'Fe', 'As', 'Al', 'I', 'B', 'V', 'K', 'Tl', 'Yb', 'Sb', 'Sn', 'Ag', 'Pd', 'Co', 'Se', 'Ti', 'Zn', 'H', 'Li', 'Ge', 'Cu', 'Au', 'Ni', 'Cd', 'In', 'Mn', 'Zr', 'Cr', 'Pt', 'Hg', 'Pb', 'Unknown']

Vocabulary of additional atom types.

graphein.molecule.atoms.RDKIT_MOL_DESCRIPTORS: List[str] = ['MaxEStateIndex', 'MinEStateIndex', 'MaxAbsEStateIndex', 'MinAbsEStateIndex', 'qed', 'MolWt', 'HeavyAtomMolWt', 'ExactMolWt', 'NumValenceElectrons', 'NumRadicalElectrons', 'MaxPartialCharge', 'MinPartialCharge', 'MaxAbsPartialCharge', 'MinAbsPartialCharge', 'FpDensityMorgan1', 'FpDensityMorgan2', 'FpDensityMorgan3', 'BCUT2D_MWHI', 'BCUT2D_MWLOW', 'BCUT2D_CHGHI', 'BCUT2D_CHGLO', 'BCUT2D_LOGPHI', 'BCUT2D_LOGPLOW', 'BCUT2D_MRHI', 'BCUT2D_MRLOW', 'BalabanJ', 'BertzCT', 'Chi0', 'Chi0n', 'Chi0v', 'Chi1', 'Chi1n', 'Chi1v', 'Chi2n', 'Chi2v', 'Chi3n', 'Chi3v', 'Chi4n', 'Chi4v', 'HallKierAlpha', 'Ipc', 'Kappa1', 'Kappa2', 'Kappa3', 'LabuteASA', 'PEOE_VSA1', 'PEOE_VSA10', 'PEOE_VSA11', 'PEOE_VSA12', 'PEOE_VSA13', 'PEOE_VSA14', 'PEOE_VSA2', 'PEOE_VSA3', 'PEOE_VSA4', 'PEOE_VSA5', 'PEOE_VSA6', 'PEOE_VSA7', 'PEOE_VSA8', 'PEOE_VSA9', 'SMR_VSA1', 'SMR_VSA10', 'SMR_VSA2', 'SMR_VSA3', 'SMR_VSA4', 'SMR_VSA5', 'SMR_VSA6', 'SMR_VSA7', 'SMR_VSA8', 'SMR_VSA9', 'SlogP_VSA1', 'SlogP_VSA10', 'SlogP_VSA11', 'SlogP_VSA12', 'SlogP_VSA2', 'SlogP_VSA3', 'SlogP_VSA4', 'SlogP_VSA5', 'SlogP_VSA6', 'SlogP_VSA7', 'SlogP_VSA8', 'SlogP_VSA9', 'TPSA', 'EState_VSA1', 'EState_VSA10', 'EState_VSA11', 'EState_VSA2', 'EState_VSA3', 'EState_VSA4', 'EState_VSA5', 'EState_VSA6', 'EState_VSA7', 'EState_VSA8', 'EState_VSA9', 'VSA_EState1', 'VSA_EState10', 'VSA_EState2', 'VSA_EState3', 'VSA_EState4', 'VSA_EState5', 'VSA_EState6', 'VSA_EState7', 'VSA_EState8', 'VSA_EState9', 'FractionCSP3', 'HeavyAtomCount', 'NHOHCount', 'NOCount', 'NumAliphaticCarbocycles', 'NumAliphaticHeterocycles', 'NumAliphaticRings', 'NumAromaticCarbocycles', 'NumAromaticHeterocycles', 'NumAromaticRings', 'NumHAcceptors', 'NumHDonors', 'NumHeteroatoms', 'NumRotatableBonds', 'NumSaturatedCarbocycles', 'NumSaturatedHeterocycles', 'NumSaturatedRings', 'RingCount', 'MolLogP', 'MolMR', 'fr_Al_COO', 'fr_Al_OH', 'fr_Al_OH_noTert', 'fr_ArN', 'fr_Ar_COO', 'fr_Ar_N', 'fr_Ar_NH', 'fr_Ar_OH', 
'fr_COO', 'fr_COO2', 'fr_C_O', 'fr_C_O_noCOO', 'fr_C_S', 'fr_HOCCN', 'fr_Imine', 'fr_NH0', 'fr_NH1', 'fr_NH2', 'fr_N_O', 'fr_Ndealkylation1', 'fr_Ndealkylation2', 'fr_Nhpyrrole', 'fr_SH', 'fr_aldehyde', 'fr_alkyl_carbamate', 'fr_alkyl_halide', 'fr_allylic_oxid', 'fr_amide', 'fr_amidine', 'fr_aniline', 'fr_aryl_methyl', 'fr_azide', 'fr_azo', 'fr_barbitur', 'fr_benzene', 'fr_benzodiazepine', 'fr_bicyclic', 'fr_diazo', 'fr_dihydropyridine', 'fr_epoxide', 'fr_ester', 'fr_ether', 'fr_furan', 'fr_guanido', 'fr_halogen', 'fr_hdrzine', 'fr_hdrzone', 'fr_imidazole', 'fr_imide', 'fr_isocyan', 'fr_isothiocyan', 'fr_ketone', 'fr_ketone_Topliss', 'fr_lactam', 'fr_lactone', 'fr_methoxy', 'fr_morpholine', 'fr_nitrile', 'fr_nitro', 'fr_nitro_arom', 'fr_nitro_arom_nonortho', 'fr_nitroso', 'fr_oxazole', 'fr_oxime', 'fr_para_hydroxylation', 'fr_phenol', 'fr_phenol_noOrthoHbond', 'fr_phos_acid', 'fr_phos_ester', 'fr_piperdine', 'fr_piperzine', 'fr_priamide', 'fr_prisulfonamd', 'fr_pyridine', 'fr_quatN', 'fr_sulfide', 'fr_sulfonamd', 'fr_sulfone', 'fr_term_acetylene', 'fr_tetrazole', 'fr_thiazole', 'fr_thiocyan', 'fr_thiophene', 'fr_unbrch_alkane', 'fr_urea']

Vocabulary of easy-to-compute RDKit molecule descriptors.