graphein.molecule#
Config#
Base Config object for use with Molecule Graph Construction.
- graphein.molecule.config.GraphAtoms#
Allowable atom types for nodes in the graph.
alias of Literal['C', 'H', 'O', 'N', 'F', 'P', 'S', 'Cl', 'Br', 'I', 'B']
- class graphein.molecule.config.MoleculeGraphConfig(*, verbose: bool = False, add_hs: bool = False, edge_construction_functions: typing.List[typing.Union[typing.Callable, str]] = [<function add_fully_connected_edges>, <function add_k_nn_edges>, <function add_distance_threshold>, <function add_atom_bonds>], node_metadata_functions: typing.List[typing.Union[typing.Callable, str]] = [<function atom_type_one_hot>], edge_metadata_functions: typing.List[typing.Union[typing.Callable, str]] = None, graph_metadata_functions: typing.List[typing.Callable] = None)[source]#
Config Object for Molecule Structure Graph Construction.
- Parameters
verbose (bool) – Specifies verbosity of graph creation process.
add_hs (bool) – Specifies whether hydrogens should be added to the graph.
edge_construction_functions (List[Callable]) – List of functions that take an nx.Graph and return an nx.Graph with desired edges added. Prepared edge constructions can be found in graphein.molecule.edges.
node_metadata_functions (List[Callable], optional) – List of functions that take an nx.Graph
edge_metadata_functions (List[Callable], optional) – List of functions that take an
graph_metadata_functions (List[Callable], optional) – List of functions that take an nx.Graph and return an nx.Graph with added graph-level features and metadata.
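The config holds lists of callables that are applied to the graph one after another. The pattern can be sketched as follows, assuming a plain dict of neighbour sets as a stand-in for nx.Graph. The names `apply_edge_functions`, `add_chain_edges` and `add_closing_edge` are hypothetical and are not part of Graphein.

```python
from typing import Callable, Dict, List, Set

# Stand-in for nx.Graph: node -> set of neighbouring nodes (assumption for illustration).
Graph = Dict[str, Set[str]]


def add_chain_edges(g: Graph) -> Graph:
    """Hypothetical edge function: connect consecutive nodes in insertion order."""
    nodes = list(g)
    for u, v in zip(nodes, nodes[1:]):
        g[u].add(v)
        g[v].add(u)
    return g


def add_closing_edge(g: Graph) -> Graph:
    """Hypothetical edge function: connect the last node back to the first."""
    nodes = list(g)
    if len(nodes) > 2:
        g[nodes[0]].add(nodes[-1])
        g[nodes[-1]].add(nodes[0])
    return g


def apply_edge_functions(g: Graph, funcs: List[Callable[[Graph], Graph]]) -> Graph:
    """Apply each function in order; each takes a graph and returns it with edges added."""
    for func in funcs:
        g = func(g)
    return g


graph: Graph = {"C:0": set(), "C:1": set(), "O:2": set()}
graph = apply_edge_functions(graph, [add_chain_edges, add_closing_edge])
```

Because every function shares the same graph-in, graph-out shape, the order of the list is the order in which edges are added.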
Graphs#
Functions for working with Small Molecule Graphs.
- graphein.molecule.graphs.add_nodes_to_graph(G: networkx.classes.graph.Graph, verbose: bool = False) networkx.classes.graph.Graph [source]#
Add nodes into molecule graph.
- Parameters
G (nx.Graph) – nx.Graph with metadata to populate with nodes.
verbose (bool) – Controls verbosity of this step.
- Returns
nx.Graph with nodes added.
- Return type
nx.Graph
- graphein.molecule.graphs.construct_graph(config: Optional[graphein.molecule.config.MoleculeGraphConfig] = None, sdf_path: Optional[str] = None, smiles: Optional[str] = None, mol2_path: Optional[str] = None, pdb_path: Optional[str] = None, edge_construction_funcs: Optional[str] = None, edge_annotation_funcs: Optional[List[Callable]] = None, node_annotation_funcs: Optional[List[Callable]] = None, graph_annotation_funcs: Optional[List[Callable]] = None) networkx.classes.graph.Graph [source]#
Constructs a molecule structure graph from an sdf_path, mol2_path, pdb_path or smiles.
Users can provide a MoleculeGraphConfig object to specify construction parameters. However, config parameters can be overridden by passing arguments directly to the function.
- Parameters
config (graphein.molecule.config.MoleculeGraphConfig, optional) – MoleculeGraphConfig object. If None, defaults to config in graphein.molecule.config.
sdf_path (str, optional) – Path to sdf_file to build graph from. Default is None.
smiles (str, optional) – SMILES string to build graph from. Default is None.
mol2_path (str, optional) – Path to mol2_file to build graph from. Default is None.
pdb_path (str, optional) – Path to pdb_file to build graph from. Default is None.
edge_construction_funcs (List[Callable], optional) – List of edge construction functions. Default is None.
edge_annotation_funcs (List[Callable], optional) – List of edge annotation functions. Default is None.
node_annotation_funcs (List[Callable], optional) – List of node annotation functions. Default is None.
graph_annotation_funcs (List[Callable]) – List of graph annotation functions. Default is None.
- Returns
Molecule Structure Graph
- Return type
nx.Graph
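The override behaviour described for construct_graph (a config supplies defaults, explicit arguments win) can be illustrated with a small sketch. `SketchConfig` and `resolve_config` are hypothetical stand-ins, and whether Graphein copies or mutates the config is an assumption here, not something this reference states.

```python
from dataclasses import dataclass, field, replace
from typing import Callable, List, Optional


@dataclass
class SketchConfig:
    """Hypothetical stand-in for MoleculeGraphConfig with two of its fields."""

    verbose: bool = False
    edge_construction_functions: List[Callable] = field(default_factory=list)


def resolve_config(
    config: Optional[SketchConfig] = None,
    edge_construction_funcs: Optional[List[Callable]] = None,
) -> SketchConfig:
    """Fall back to a default config, then let explicit arguments win."""
    if config is None:
        config = SketchConfig()
    if edge_construction_funcs is not None:
        # Assumption: copy rather than mutate, so the caller's config is untouched.
        config = replace(config, edge_construction_functions=edge_construction_funcs)
    return config


def dummy_edges(g):
    """Placeholder edge function that returns the graph unchanged."""
    return g


base = SketchConfig(verbose=True)
resolved = resolve_config(base, edge_construction_funcs=[dummy_edges])
```

Fields that are not overridden (here `verbose`) keep the value from the supplied config.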
- graphein.molecule.graphs.initialise_graph_with_metadata(name: str, rdmol: rdkit.Mol, coords: np.ndarray) nx.Graph [source]#
Initializes the nx Graph object with initial metadata.
- Parameters
name (str) – Name of the molecule. Either the smiles or filename depending on how the graph was created.
rdmol (rdkit.Mol) – RDKit molecule object for the structure.
coords (np.ndarray) – Atomic coordinates of the molecule.
- Returns
Returns initial molecule structure graph with metadata.
- Return type
nx.Graph
Edges#
Distance#
Functions for computing biochemical edges of graphs.
- graphein.molecule.edges.distance.add_distance_threshold(G: networkx.classes.graph.Graph, threshold: float = 5.0)[source]#
Adds edges to any nodes within a given distance of each other.
- Parameters
G (nx.Graph) – molecule structure graph to add distance edges to
threshold (float) – Distance in angstroms, below which two nodes are connected.
- Returns
Graph with distance-based edges added
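As an illustration of distance-threshold edges, the sketch below connects every pair of nodes closer than `threshold`. It works on a plain dict of coordinates rather than an nx.Graph, and `distance_threshold_edges` is a hypothetical helper. Using a strict `<` follows the wording "below which"; the comparison Graphein actually uses is not stated here.

```python
from itertools import combinations
from math import dist
from typing import Dict, List, Tuple

Coord = Tuple[float, float, float]


def distance_threshold_edges(
    coords: Dict[str, Coord], threshold: float = 5.0
) -> List[Tuple[str, str]]:
    """Return node pairs whose Euclidean distance is below `threshold` (angstroms)."""
    return [
        (u, v)
        for (u, pu), (v, pv) in combinations(coords.items(), 2)
        if dist(pu, pv) < threshold
    ]


atoms = {"C:0": (0.0, 0.0, 0.0), "O:1": (1.2, 0.0, 0.0), "N:2": (9.0, 0.0, 0.0)}
edges = distance_threshold_edges(atoms, threshold=5.0)
```

Only the C:0 and O:1 pair is within 5 angstroms, so it is the only edge produced.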
- graphein.molecule.edges.distance.add_fully_connected_edges(G: networkx.classes.graph.Graph)[source]#
Adds fully connected edges to nodes.
- Parameters
G (nx.Graph) – Molecule structure graph to add distance edges to.
- graphein.molecule.edges.distance.add_k_nn_edges(G: networkx.classes.graph.Graph, k: int = 1, mode: str = 'connectivity', metric: str = 'minkowski', p: int = 2, include_self: Union[bool, str] = False)[source]#
Adds edges to nodes based on K nearest neighbours.
- Parameters
G (nx.Graph) – Molecule structure graph to add distance edges to.
k (int) – Number of neighbors for each sample.
mode (str) – Type of returned matrix: "connectivity" will return the connectivity matrix with ones and zeros, and "distance" will return the distances between neighbors according to the given metric.
metric (str) – The distance metric used to calculate the k-Neighbors for each sample point. The DistanceMetric class gives a list of available metrics. The default distance is "euclidean" ("minkowski" metric with the p param equal to 2).
p (int) – Power parameter for the Minkowski metric. When p = 1, this is equivalent to using manhattan_distance (l1), and euclidean_distance (l2) for p = 2. For arbitrary p, minkowski_distance (l_p) is used. Default is 2 (euclidean).
include_self (Union[bool, str]) – Whether or not to mark each sample as the first nearest neighbor to itself. If "auto", then True is used for mode="connectivity" and False for mode="distance". Default is False.
- Returns
Graph with knn-based edges added.
- Return type
nx.Graph
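The roles of k, p and include_self can be seen in a small pure-Python sketch of k-nearest-neighbour edges. This is not Graphein's implementation; `minkowski` and `knn_edges` are hypothetical helpers written for illustration.

```python
from typing import Dict, List, Tuple

Coord = Tuple[float, ...]


def minkowski(a: Coord, b: Coord, p: int = 2) -> float:
    """Minkowski distance: p=1 is Manhattan (l1), p=2 is Euclidean (l2)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def knn_edges(
    coords: Dict[str, Coord], k: int = 1, p: int = 2, include_self: bool = False
) -> List[Tuple[str, str]]:
    """Return directed (node, neighbour) pairs for the k nearest neighbours of each node."""
    edges = []
    for u, pu in coords.items():
        candidates = [v for v in coords if include_self or v != u]
        # sorted() is stable, so ties keep insertion order.
        nearest = sorted(candidates, key=lambda v: minkowski(pu, coords[v], p))[:k]
        edges.extend((u, v) for v in nearest)
    return edges


atoms = {"C:0": (0.0, 0.0), "O:1": (1.0, 0.0), "N:2": (3.0, 0.0)}
edges = knn_edges(atoms, k=1)
```

The neighbour relation is not symmetric: N:2 picks O:1 as its nearest neighbour, but O:1 picks C:0.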
- graphein.molecule.edges.distance.compute_distmat(coords: numpy.ndarray) numpy.ndarray [source]#
Compute pairwise euclidean distances between every atom.
Design choice: the coordinates are passed in directly to enable easier testing on dummy data.
- Parameters
coords (np.ndarray) – Array of atomic coordinates for the molecule (x, y and z per atom).
- Returns
np.ndarray of euclidean distance matrix.
- Return type
np.ndarray
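A pairwise Euclidean distance matrix can be sketched without any dependencies. The helper below, `pairwise_euclidean`, is hypothetical and returns nested lists rather than an np.ndarray.

```python
from math import dist
from typing import List, Sequence


def pairwise_euclidean(coords: Sequence[Sequence[float]]) -> List[List[float]]:
    """Return the symmetric matrix D where D[i][j] is the distance between atoms i and j."""
    return [[dist(a, b) for b in coords] for a in coords]


coords = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 2.0)]
distmat = pairwise_euclidean(coords)
```

The diagonal is zero and the matrix is symmetric, so only the upper triangle needs to be inspected when searching for atoms within a given radius.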
- graphein.molecule.edges.distance.get_interacting_atoms(angstroms: float, distmat: numpy.ndarray) numpy.ndarray [source]#
Find the atoms that are within a particular radius of one another.
- Parameters
angstroms (float) – Radius in angstroms.
distmat (np.ndarray) – Distance matrix.
- Returns
Array of interacting atoms
- Return type
np.ndarray
Atomic#
Functions for computing atomic structure of molecules.
Features#
Node#
Functions for featurising Small Molecule Graphs.
- graphein.molecule.features.nodes.atom_type.atom_type_one_hot(n, d: Dict[str, Any], return_array: bool = True, allowable_set: Optional[List[str]] = None) numpy.ndarray [source]#
Adds a one-hot encoding of atom types as a node attribute.
- Parameters
n (str) – Node name, this is unused and only included for compatibility with the other functions.
d (Dict[str, Any]) – Node data.
return_array (bool) – If True, returns a numpy np.ndarray of one-hot encoding, otherwise returns a pd.Series. Default is True.
allowable_set (Optional[List[str]]) – Specifies vocabulary of atom types. Default is None (which uses graphein.molecule.atoms.BASE_ATOMS).
- Returns
One-hot encoding of atom types.
- Return type
Union[pd.Series, np.ndarray]
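The node feature functions in this module share the `(n, d)` shape: a node name and the node's data dict. A minimal sketch of a one-hot atom type encoder in that shape follows. The `"element"` and `"atom_type_one_hot"` keys and the error on unknown elements are assumptions for illustration, not confirmed Graphein behaviour.

```python
from typing import Any, Dict, List, Optional

# The 11 standard atom types listed for BASE_ATOMS in this reference.
BASE_ATOMS = ["C", "H", "O", "N", "F", "P", "S", "Cl", "Br", "I", "B"]


def one_hot_atom_type(
    n: str, d: Dict[str, Any], allowable_set: Optional[List[str]] = None
) -> List[int]:
    """Hypothetical sketch: encode d["element"] against a vocabulary and store it on the node."""
    vocab = BASE_ATOMS if allowable_set is None else allowable_set
    # Assumption: unknown elements raise, rather than mapping to a catch-all slot.
    if d["element"] not in vocab:
        raise ValueError(f"{d['element']} is not in the allowable set")
    encoding = [int(d["element"] == symbol) for symbol in vocab]
    d["atom_type_one_hot"] = encoding
    return encoding


node_data = {"element": "O"}
encoding = one_hot_atom_type("O:2", node_data)
```

The node name `n` is unused in the sketch, mirroring the note above that it is kept only for compatibility with the other feature functions.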
- graphein.molecule.features.nodes.atom_type.atomic_mass(n: str, d: Dict[str, Any]) float [source]#
Adds mass of the atom to the node data.
- graphein.molecule.features.nodes.atom_type.chiral_tag(n: str, d: Dict[str, Any]) rdkit.Chem.rdchem.ChiralType [source]#
Adds indicator of atom chirality to the node data.
- graphein.molecule.features.nodes.atom_type.degree(n: str, d: Dict[str, Any]) int [source]#
Adds the degree of the node to the node data.
N.B. this is the degree as defined by RDKit rather than the ‘true’ degree of the node in the graph. For the latter, use nx.degree()
- graphein.molecule.features.nodes.atom_type.explicit_valence(n: str, d: Dict[str, Any]) int [source]#
Adds explicit valence of the atom to the node data.
- graphein.molecule.features.nodes.atom_type.formal_charge(n: str, d: Dict[str, Any]) int [source]#
Adds the formal charge of the atom to the node data.
- graphein.molecule.features.nodes.atom_type.hybridization(n: str, d: Dict[str, Any]) rdkit.Chem.rdchem.HybridizationType [source]#
Adds the hybridization of the atom to the node data.
- graphein.molecule.features.nodes.atom_type.implicit_valence(n: str, d: Dict[str, Any]) int [source]#
Adds implicit valence of the atom to the node data.
- graphein.molecule.features.nodes.atom_type.is_aromatic(n: str, d: Dict[str, Any]) bool [source]#
Adds indicator of aromaticity of the atom to the node data.
- graphein.molecule.features.nodes.atom_type.is_isotope(n: str, d: Dict[str, Any]) int [source]#
Adds indicator of whether or not the atom is an isotope to the node data.
- graphein.molecule.features.nodes.atom_type.is_ring(n: str, d: Dict[str, Any]) bool [source]#
Adds indicator of ring membership of the atom to the node data.
- graphein.molecule.features.nodes.atom_type.is_ring_size(n: str, d: Dict[str, Any], ring_size: int) bool [source]#
Adds indicator of ring membership of size ring_size of the atom to the node data.
- graphein.molecule.features.nodes.atom_type.num_explicit_h(n: str, d: Dict[str, Any]) int [source]#
Adds the number of explicit Hydrogens of the atom to the node data.
- graphein.molecule.features.nodes.atom_type.num_implicit_h(n: str, d: Dict[str, Any]) int [source]#
Adds the number of implicit Hydrogens of the atom to the node data.
- graphein.molecule.features.nodes.atom_type.num_radical_electrons(n: str, d: Dict[str, Any]) int [source]#
Adds the number of radical electrons of the atom to the node data.
- graphein.molecule.features.nodes.atom_type.total_degree(n: str, d: Dict[str, Any]) int [source]#
Adds the total degree of the atom to the node data.
- graphein.molecule.features.nodes.atom_type.total_num_h(n: str, d: Dict[str, Any]) int [source]#
Adds the total number of Hydrogens of the atom to the node data.
Edge#
Functions for computing atomic features for molecules.
- graphein.molecule.features.edges.bonds.add_bond_type(u: str, v: str, d: Dict[str, Any]) rdkit.Chem.rdchem.BondType [source]#
Adds bond type as an edge feature to the graph.
- graphein.molecule.features.edges.bonds.bond_is_aromatic(u: str, v: str, d: Dict[str, Any]) bool [source]#
Adds indicator of aromaticity of a bond to the graph as an edge feature.
- graphein.molecule.features.edges.bonds.bond_is_conjugated(u: str, v: str, d: Dict[str, Any]) bool [source]#
Adds indicator of conjugated bond to the graph as an edge feature.
- graphein.molecule.features.edges.bonds.bond_is_in_ring(u: str, v: str, d: Dict[str, Any]) bool [source]#
Adds indicator of ring membership to the graph as an edge feature.
- graphein.molecule.features.edges.bonds.bond_is_in_ring_size(u: str, v: str, d: Dict[str, Any], ring_size: int) int [source]#
Adds indicator of ring membership of size ring_size to the graph as an edge feature.
Graph#
Functions for featurising Small Molecule Graphs.
- graphein.molecule.features.graph.molecule.mol_descriptors(g: networkx.classes.graph.Graph, descriptor_list: Optional[List[str]] = None, return_array: bool = False, return_series: bool = False) Union[numpy.ndarray, pandas.core.series.Series, Dict[str, Union[float, int]]] [source]#
Adds global molecular descriptors to the graph.
- Parameters
g (nx.Graph) – The graph to add the descriptors to.
descriptor_list (Optional[List[str]]) – The list of descriptors to add. If None, all descriptors are added.
return_array (bool) – If True, the descriptors are returned as an np.ndarray.
return_series (bool) – If True, the descriptors are returned as a pd.Series.
- Returns
The descriptors as a dictionary (default), np.ndarray or pd.Series.
- Return type
Union[np.ndarray, pd.Series, Dict[str, Union[float, int]]]
Visualisation#
Plotting functions for molecules. These wrap the methods defined on protein graphs and provide sane defaults.
- graphein.molecule.visualisation.plot_molecular_graph(G: nx.Graph, angle: int = 30, plot_title: Optional[str] = None, figsize: Tuple[int, int] = (10, 7), node_alpha: float = 0.7, node_size_min: float = 20.0, node_size_multiplier: float = 1, label_node_ids: bool = True, node_colour_map: plt.cm = <matplotlib.colors.ListedColormap object>, edge_color_map: plt.cm = <matplotlib.colors.ListedColormap object>, colour_nodes_by: str = 'element', colour_edges_by: str = 'kind', edge_alpha: float = 0.5, plot_style: str = 'ggplot', out_path: Optional[str] = None, out_format: str = '.png') Axes3D [source]#
Plots molecular graph in Axes3D.
- Parameters
G (nx.Graph) – nx.Graph molecular graph to plot.
angle (int) – View angle. Defaults to 30.
plot_title (str, optional) – Title of plot. Defaults to None.
figsize (Tuple[int, int]) – Size of figure. Defaults to (10, 7).
node_alpha (float) – Controls node transparency. Defaults to 0.7.
node_size_min (float) – Specifies node minimum size. Defaults to 20.
node_size_multiplier (float) – Scales node size by a constant. Node sizes reflect degree. Defaults to 1.
label_node_ids (bool) – Whether or not to plot node_id labels. Defaults to True.
node_colour_map (plt.cm) – Colour map to use for nodes. Defaults to plt.cm.plasma.
edge_color_map (plt.cm) – Colour map to use for edges. Defaults to plt.cm.plasma.
colour_nodes_by (str) – Specifies how to colour nodes: "degree", "seq_position" or a node feature. Defaults to "element".
colour_edges_by (str) – Specifies how to colour edges. Currently only "kind" is supported.
edge_alpha (float) – Controls edge transparency. Defaults to 0.5.
plot_style (str) – matplotlib style sheet to use. Defaults to "ggplot".
out_path (str, optional) – If not None, writes plot to this location. Defaults to None (does not save).
out_format (str) – File format to use for plot. Defaults to ".png".
- Returns
matplotlib Axes3D object.
- Return type
Axes3D
- graphein.molecule.visualisation.plotly_molecular_graph(g: nx.Graph, plot_title: Optional[str] = None, figsize: Tuple[int, int] = (620, 650), node_alpha: float = 0.7, node_size_min: float = 20, node_size_multiplier: float = 1.0, label_node_ids: bool = True, node_color_map: plt.cm = <matplotlib.colors.ListedColormap object>, edge_color_map: plt.cm = <matplotlib.colors.ListedColormap object>, colour_nodes_by: str = 'element', colour_edges_by: str = 'kind') go.Figure [source]#
Plots molecular graph using plotly.
- Parameters
g (nx.Graph) – nx.Graph molecular graph to plot.
plot_title (str, optional) – Title of plot. Defaults to None.
figsize (Tuple[int, int]) – Size of figure. Defaults to (620, 650).
node_alpha (float) – Controls node transparency. Defaults to 0.7.
node_size_min (float) – Specifies node minimum size. Defaults to 20.0.
node_size_multiplier (float) – Scales node size by a constant. Node sizes reflect degree. Defaults to 1.0.
label_node_ids (bool) – Whether or not to plot node_id labels. Defaults to True.
node_color_map (plt.cm) – Colour map to use for nodes. Defaults to plt.cm.plasma.
edge_color_map (plt.cm) – Colour map to use for edges. Defaults to plt.cm.plasma.
colour_nodes_by (str) – Specifies how to colour nodes: "degree" or a node feature. Defaults to "element".
colour_edges_by (str) – Specifies how to colour edges. Currently only "kind" is supported.
- Returns
Plotly Graph Objects plot
- Return type
go.Figure
Constants#
Author: Eric J. Ma, Arian Jamasb
Purpose: This is a set of utility variables and functions related to small molecules that can be used across the Graphein project.
These include various collections of standard atom types used in molecule-focussed ML.
- graphein.molecule.atoms.ALLOWED_BOND_TYPES: List[rdkit.Chem.rdchem.BondType] = [rdkit.Chem.rdchem.BondType.SINGLE, rdkit.Chem.rdchem.BondType.DOUBLE, rdkit.Chem.rdchem.BondType.TRIPLE, rdkit.Chem.rdchem.BondType.AROMATIC]#
Vocabulary of allowed bond types.
- graphein.molecule.atoms.ALLOWED_BOND_TYPE_TO_CHANNEL: Dict[rdkit.Chem.rdchem.BondType, int] = {rdkit.Chem.rdchem.BondType.SINGLE: 0, rdkit.Chem.rdchem.BondType.DOUBLE: 1, rdkit.Chem.rdchem.BondType.TRIPLE: 2, rdkit.Chem.rdchem.BondType.AROMATIC: 3}#
Mapping of bond types to integer values.
- graphein.molecule.atoms.ALLOWED_DEGREES: List[int] = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]#
Vocabulary of allowed atom degrees.
- graphein.molecule.atoms.ALLOWED_HYBRIDIZATIONS: List[rdkit.Chem.rdchem.HybridizationType] = [rdkit.Chem.rdchem.HybridizationType.SP, rdkit.Chem.rdchem.HybridizationType.SP2, rdkit.Chem.rdchem.HybridizationType.SP3, rdkit.Chem.rdchem.HybridizationType.SP3D, rdkit.Chem.rdchem.HybridizationType.SP3D2]#
Vocabulary of allowed hybridizations.
- graphein.molecule.atoms.ALLOWED_NUM_H: List[int] = [0, 1, 2, 3, 4]#
Vocabulary of allowed number of Hydrogens.
- graphein.molecule.atoms.ALLOWED_VALENCES: List[int] = [0, 1, 2, 3, 4, 5, 6]#
Vocabulary of allowed atom valences.
- graphein.molecule.atoms.ALL_BOND_TYPES: List[rdkit.Chem.rdchem.BondType] = [rdkit.Chem.rdchem.BondType.AROMATIC, rdkit.Chem.rdchem.BondType.DATIVE, rdkit.Chem.rdchem.BondType.DATIVEL, rdkit.Chem.rdchem.BondType.DATIVER, rdkit.Chem.rdchem.BondType.DOUBLE, rdkit.Chem.rdchem.BondType.FIVEANDAHALF, rdkit.Chem.rdchem.BondType.FOURANDAHALF, rdkit.Chem.rdchem.BondType.HEXTUPLE, rdkit.Chem.rdchem.BondType.HYDROGEN, rdkit.Chem.rdchem.BondType.IONIC, rdkit.Chem.rdchem.BondType.ONEANDAHALF, rdkit.Chem.rdchem.BondType.OTHER, rdkit.Chem.rdchem.BondType.QUADRUPLE, rdkit.Chem.rdchem.BondType.QUINTUPLE, rdkit.Chem.rdchem.BondType.SINGLE, rdkit.Chem.rdchem.BondType.THREEANDAHALF, rdkit.Chem.rdchem.BondType.THREECENTER, rdkit.Chem.rdchem.BondType.TRIPLE, rdkit.Chem.rdchem.BondType.TWOANDAHALF, rdkit.Chem.rdchem.BondType.UNSPECIFIED, rdkit.Chem.rdchem.BondType.ZERO]#
Vocabulary of all RDKit BondTypes.
- graphein.molecule.atoms.ALL_BOND_TYPES_TO_CHANNEL: Dict[rdkit.Chem.rdchem.BondType, int] = {rdkit.Chem.rdchem.BondType.UNSPECIFIED: 19, rdkit.Chem.rdchem.BondType.SINGLE: 14, rdkit.Chem.rdchem.BondType.DOUBLE: 4, rdkit.Chem.rdchem.BondType.TRIPLE: 17, rdkit.Chem.rdchem.BondType.QUADRUPLE: 12, rdkit.Chem.rdchem.BondType.QUINTUPLE: 13, rdkit.Chem.rdchem.BondType.HEXTUPLE: 7, rdkit.Chem.rdchem.BondType.ONEANDAHALF: 10, rdkit.Chem.rdchem.BondType.TWOANDAHALF: 18, rdkit.Chem.rdchem.BondType.THREEANDAHALF: 15, rdkit.Chem.rdchem.BondType.FOURANDAHALF: 6, rdkit.Chem.rdchem.BondType.FIVEANDAHALF: 5, rdkit.Chem.rdchem.BondType.AROMATIC: 0, rdkit.Chem.rdchem.BondType.IONIC: 9, rdkit.Chem.rdchem.BondType.HYDROGEN: 8, rdkit.Chem.rdchem.BondType.THREECENTER: 16, rdkit.Chem.rdchem.BondType.DATIVE: 1, rdkit.Chem.rdchem.BondType.DATIVEL: 2, rdkit.Chem.rdchem.BondType.DATIVER: 3, rdkit.Chem.rdchem.BondType.OTHER: 11, rdkit.Chem.rdchem.BondType.ZERO: 20}#
Vocabulary of all RDKit BondTypes mapped to integer values.
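The integer values listed in ALL_BOND_TYPES_TO_CHANNEL coincide with each type's position in ALL_BOND_TYPES. How Graphein builds the mapping is not stated here; the sketch below reproduces the same numbers with `enumerate`, using plain strings as stand-ins for the rdkit BondType members. `name_to_channel` and `bond_one_hot` are hypothetical names.

```python
from typing import Dict, List

# Names copied from ALL_BOND_TYPES above; plain strings stand in for rdkit BondType members.
ALL_BOND_TYPE_NAMES: List[str] = [
    "AROMATIC", "DATIVE", "DATIVEL", "DATIVER", "DOUBLE", "FIVEANDAHALF",
    "FOURANDAHALF", "HEXTUPLE", "HYDROGEN", "IONIC", "ONEANDAHALF", "OTHER",
    "QUADRUPLE", "QUINTUPLE", "SINGLE", "THREEANDAHALF", "THREECENTER",
    "TRIPLE", "TWOANDAHALF", "UNSPECIFIED", "ZERO",
]

# Each type's channel is its position in the vocabulary.
name_to_channel: Dict[str, int] = {
    name: i for i, name in enumerate(ALL_BOND_TYPE_NAMES)
}


def bond_one_hot(name: str) -> List[int]:
    """One-hot vector with a 1 in the channel assigned to `name`."""
    vec = [0] * len(name_to_channel)
    vec[name_to_channel[name]] = 1
    return vec
```

A channel mapping like this lets a categorical bond attribute be turned into a fixed-length feature vector, for example `bond_one_hot("DOUBLE")` sets index 4.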
- graphein.molecule.atoms.ALL_STEREO_TO_CHANNEL: Dict[rdkit.Chem.rdchem.BondStereo, int] = {rdkit.Chem.rdchem.BondStereo.STEREONONE: 3, rdkit.Chem.rdchem.BondStereo.STEREOANY: 0, rdkit.Chem.rdchem.BondStereo.STEREOZ: 5, rdkit.Chem.rdchem.BondStereo.STEREOE: 2, rdkit.Chem.rdchem.BondStereo.STEREOCIS: 1, rdkit.Chem.rdchem.BondStereo.STEREOTRANS: 4}#
Vocabulary of all RDKit bond stereo types mapped to integer values.
- graphein.molecule.atoms.ALL_STEREO_TYPES: List[rdkit.Chem.rdchem.BondStereo] = [rdkit.Chem.rdchem.BondStereo.STEREOANY, rdkit.Chem.rdchem.BondStereo.STEREOCIS, rdkit.Chem.rdchem.BondStereo.STEREOE, rdkit.Chem.rdchem.BondStereo.STEREONONE, rdkit.Chem.rdchem.BondStereo.STEREOTRANS, rdkit.Chem.rdchem.BondStereo.STEREOZ]#
Vocabulary of all RDKit bond stereo types.
- graphein.molecule.atoms.BASE_ATOMS: List[str] = ['C', 'H', 'O', 'N', 'F', 'P', 'S', 'Cl', 'Br', 'I', 'B']#
Vocabulary of 11 standard atom types.
- graphein.molecule.atoms.CHIRAL_TYPE: List[rdkit.Chem.rdchem.ChiralType] = [rdkit.Chem.rdchem.ChiralType.CHI_OTHER, rdkit.Chem.rdchem.ChiralType.CHI_TETRAHEDRAL_CCW, rdkit.Chem.rdchem.ChiralType.CHI_TETRAHEDRAL_CW, rdkit.Chem.rdchem.ChiralType.CHI_UNSPECIFIED]#
Vocabulary of all RDKit chiral types.
- graphein.molecule.atoms.CHIRAL_TYPE_TO_CHANNEL: Dict[rdkit.Chem.rdchem.ChiralType, int] = {rdkit.Chem.rdchem.ChiralType.CHI_UNSPECIFIED: 3, rdkit.Chem.rdchem.ChiralType.CHI_TETRAHEDRAL_CW: 2, rdkit.Chem.rdchem.ChiralType.CHI_TETRAHEDRAL_CCW: 1, rdkit.Chem.rdchem.ChiralType.CHI_OTHER: 0}#
Vocabulary of all RDKit chiral types mapped to integer values.
- graphein.molecule.atoms.EXTENDED_ATOMS = ['C', 'N', 'O', 'S', 'F', 'Si', 'P', 'Cl', 'Br', 'Mg', 'Na', 'Ca', 'Fe', 'As', 'Al', 'I', 'B', 'V', 'K', 'Tl', 'Yb', 'Sb', 'Sn', 'Ag', 'Pd', 'Co', 'Se', 'Ti', 'Zn', 'H', 'Li', 'Ge', 'Cu', 'Au', 'Ni', 'Cd', 'In', 'Mn', 'Zr', 'Cr', 'Pt', 'Hg', 'Pb', 'Unknown']#
Vocabulary of additional atom types.
- graphein.molecule.atoms.RDKIT_MOL_DESCRIPTORS: List[str] = ['MaxEStateIndex', 'MinEStateIndex', 'MaxAbsEStateIndex', 'MinAbsEStateIndex', 'qed', 'MolWt', 'HeavyAtomMolWt', 'ExactMolWt', 'NumValenceElectrons', 'NumRadicalElectrons', 'MaxPartialCharge', 'MinPartialCharge', 'MaxAbsPartialCharge', 'MinAbsPartialCharge', 'FpDensityMorgan1', 'FpDensityMorgan2', 'FpDensityMorgan3', 'BCUT2D_MWHI', 'BCUT2D_MWLOW', 'BCUT2D_CHGHI', 'BCUT2D_CHGLO', 'BCUT2D_LOGPHI', 'BCUT2D_LOGPLOW', 'BCUT2D_MRHI', 'BCUT2D_MRLOW', 'BalabanJ', 'BertzCT', 'Chi0', 'Chi0n', 'Chi0v', 'Chi1', 'Chi1n', 'Chi1v', 'Chi2n', 'Chi2v', 'Chi3n', 'Chi3v', 'Chi4n', 'Chi4v', 'HallKierAlpha', 'Ipc', 'Kappa1', 'Kappa2', 'Kappa3', 'LabuteASA', 'PEOE_VSA1', 'PEOE_VSA10', 'PEOE_VSA11', 'PEOE_VSA12', 'PEOE_VSA13', 'PEOE_VSA14', 'PEOE_VSA2', 'PEOE_VSA3', 'PEOE_VSA4', 'PEOE_VSA5', 'PEOE_VSA6', 'PEOE_VSA7', 'PEOE_VSA8', 'PEOE_VSA9', 'SMR_VSA1', 'SMR_VSA10', 'SMR_VSA2', 'SMR_VSA3', 'SMR_VSA4', 'SMR_VSA5', 'SMR_VSA6', 'SMR_VSA7', 'SMR_VSA8', 'SMR_VSA9', 'SlogP_VSA1', 'SlogP_VSA10', 'SlogP_VSA11', 'SlogP_VSA12', 'SlogP_VSA2', 'SlogP_VSA3', 'SlogP_VSA4', 'SlogP_VSA5', 'SlogP_VSA6', 'SlogP_VSA7', 'SlogP_VSA8', 'SlogP_VSA9', 'TPSA', 'EState_VSA1', 'EState_VSA10', 'EState_VSA11', 'EState_VSA2', 'EState_VSA3', 'EState_VSA4', 'EState_VSA5', 'EState_VSA6', 'EState_VSA7', 'EState_VSA8', 'EState_VSA9', 'VSA_EState1', 'VSA_EState10', 'VSA_EState2', 'VSA_EState3', 'VSA_EState4', 'VSA_EState5', 'VSA_EState6', 'VSA_EState7', 'VSA_EState8', 'VSA_EState9', 'FractionCSP3', 'HeavyAtomCount', 'NHOHCount', 'NOCount', 'NumAliphaticCarbocycles', 'NumAliphaticHeterocycles', 'NumAliphaticRings', 'NumAromaticCarbocycles', 'NumAromaticHeterocycles', 'NumAromaticRings', 'NumHAcceptors', 'NumHDonors', 'NumHeteroatoms', 'NumRotatableBonds', 'NumSaturatedCarbocycles', 'NumSaturatedHeterocycles', 'NumSaturatedRings', 'RingCount', 'MolLogP', 'MolMR', 'fr_Al_COO', 'fr_Al_OH', 'fr_Al_OH_noTert', 'fr_ArN', 'fr_Ar_COO', 'fr_Ar_N', 'fr_Ar_NH', 'fr_Ar_OH', 
'fr_COO', 'fr_COO2', 'fr_C_O', 'fr_C_O_noCOO', 'fr_C_S', 'fr_HOCCN', 'fr_Imine', 'fr_NH0', 'fr_NH1', 'fr_NH2', 'fr_N_O', 'fr_Ndealkylation1', 'fr_Ndealkylation2', 'fr_Nhpyrrole', 'fr_SH', 'fr_aldehyde', 'fr_alkyl_carbamate', 'fr_alkyl_halide', 'fr_allylic_oxid', 'fr_amide', 'fr_amidine', 'fr_aniline', 'fr_aryl_methyl', 'fr_azide', 'fr_azo', 'fr_barbitur', 'fr_benzene', 'fr_benzodiazepine', 'fr_bicyclic', 'fr_diazo', 'fr_dihydropyridine', 'fr_epoxide', 'fr_ester', 'fr_ether', 'fr_furan', 'fr_guanido', 'fr_halogen', 'fr_hdrzine', 'fr_hdrzone', 'fr_imidazole', 'fr_imide', 'fr_isocyan', 'fr_isothiocyan', 'fr_ketone', 'fr_ketone_Topliss', 'fr_lactam', 'fr_lactone', 'fr_methoxy', 'fr_morpholine', 'fr_nitrile', 'fr_nitro', 'fr_nitro_arom', 'fr_nitro_arom_nonortho', 'fr_nitroso', 'fr_oxazole', 'fr_oxime', 'fr_para_hydroxylation', 'fr_phenol', 'fr_phenol_noOrthoHbond', 'fr_phos_acid', 'fr_phos_ester', 'fr_piperdine', 'fr_piperzine', 'fr_priamide', 'fr_prisulfonamd', 'fr_pyridine', 'fr_quatN', 'fr_sulfide', 'fr_sulfonamd', 'fr_sulfone', 'fr_term_acetylene', 'fr_tetrazole', 'fr_thiazole', 'fr_thiocyan', 'fr_thiophene', 'fr_unbrch_alkane', 'fr_urea']#
Vocabulary of easy-to-compute RDKit molecule descriptors