Graphein - Atom Graph Tutorial#

In this notebook, we’ll run through residue-level graph construction in Graphein. We start by discsussing the config, the high-level API and spend the bulk of the tutorial running through the low-level API. First, let’s take a look at our protein of interest: here’s one I prepared earlier ;)

Open In Colab

[1]:
# Install Graphein if necessary
# !pip install graphein

# Install pymol if necessary - in this tutorial PyMol is only used for the initial plot. Feel free to skip!
# sudo apt-get install pymol (recommended for colab) OR conda install -c schrodinger pymol
[2]:
# NBVAL_SKIP
from graphein.utils.pymol import MolViewer
pymol = MolViewer()
pymol.delete("all") # delete all objects from other sessions if necessary.
pymol.fetch('3eiy')
pymol.display()
[2]:
../_images/notebooks_atom_graph_tutorial_2_0.png

Config#

As per the residue graph tutorial, we first define a config object which parameterises the graph construction.

[3]:
from graphein.protein.config import ProteinGraphConfig

config = ProteinGraphConfig()
config.dict()
[3]:
{'granularity': 'CA',
 'keep_hets': False,
 'insertions': False,
 'pdb_dir': PosixPath('../examples/pdbs'),
 'verbose': False,
 'exclude_waters': True,
 'deprotonate': False,
 'protein_df_processing_functions': None,
 'edge_construction_functions': [<function graphein.protein.edges.distance.add_peptide_bonds(G: 'nx.Graph') -> 'nx.Graph'>],
 'node_metadata_functions': [<function graphein.protein.features.nodes.amino_acid.meiler_embedding(n, d, return_array: bool = False) -> Union[pandas.core.series.Series, <built-in function array>]>],
 'edge_metadata_functions': None,
 'graph_metadata_functions': None,
 'get_contacts_config': None,
 'dssp_config': None}

Let’s run through the config:

  • granularity: specifies the granularity of the graph (i.e. what should the nodes be). Possible values are: atom identifiers (e.g. "CA" for \(\alpha\) carbon, "CB" for \(\beta\) carbon), "centroid" to use residue centroids (under the hood, this is the same as “CA”, but we use the average x,y,z coordinates for the atoms in the residue) or "atom" for atom-level construction. This is discussed here.

  • keep_hets: this is a boolean specifying whether or not to keep heteroatoms present in the .pdb file. Heteroatoms are typically non-protein atoms (waters, metal ions, ligands) but can sometimes contain non-standard or modified residues.

  • insertions: boolean specifying whether or not to keep insertions in the PDB file

  • pdb_dir optional path to a folder in which to save pdb files. Otherwise, /tmp/ will be used

  • verbose: bool controlling amount of info printed

  • exclude_waters: not implemented

  • deprotonate: bool indicating whether or not to remove Hydrogen atoms

  • protein_df_processing_functions: list of functions with which to process the PDB dataframe. Discussed in the low-level API.

  • edge_construction_functions: list of functions to compute edges with

  • node_metadata_functions: list of functions to annotate nodes with

  • edge_metadata_functions: list of functions to annotate edges with

  • graph_meta_functions: list of functions to annotate graph with

  • get_contacts_config: A separate config object if using GetContacts edge construction functions

Config for Atom-level Graphs#

We’ll now make the necessary changes to compute an atomic graph.

[4]:
from graphein.protein.edges.atomic import add_atomic_edges
params_to_change = {"granularity": "atom", "edge_construction_functions": [add_atomic_edges]}

config = ProteinGraphConfig(**params_to_change)
config.dict()
[4]:
{'granularity': 'atom',
 'keep_hets': False,
 'insertions': False,
 'pdb_dir': PosixPath('../examples/pdbs'),
 'verbose': False,
 'exclude_waters': True,
 'deprotonate': False,
 'protein_df_processing_functions': None,
 'edge_construction_functions': [<function graphein.protein.edges.atomic.add_atomic_edges(G: networkx.classes.graph.Graph, tolerance: float = 0.56) -> networkx.classes.graph.Graph>],
 'node_metadata_functions': [<function graphein.protein.features.nodes.amino_acid.meiler_embedding(n, d, return_array: bool = False) -> Union[pandas.core.series.Series, <built-in function array>]>],
 'edge_metadata_functions': None,
 'graph_metadata_functions': None,
 'get_contacts_config': None,
 'dssp_config': None}

High-level API#

Graphein features a high-level API which should be applicable for most simple graph constructions. This can be used on either a .pdb file (so you can run whatever pre-processing you wish), or we can provide a PDB accession code and retrieve a structure from the PDB itself. If a path is provided, it takes precedence over the PDB code.

To use it we do as follows:

[5]:
from graphein.protein.graphs import construct_graph

g = construct_graph(config=config, pdb_code="3eiy")
DEBUG:graphein.protein.graphs:Deprotonating protein. This removes H atoms from the pdb_df dataframe
DEBUG:graphein.protein.graphs:Detected 1330 total nodes
DEBUG:graphein.protein.features.nodes.amino_acid:Reading meiler embeddings from: /Users/arianjamasb/github/graphein/graphein/protein/features/nodes/meiler_embeddings.csv

To use a local file, you can run:

g = construct_graph(config=config, pdb_path="../examples/pdbs/3eiy.pdb")
[6]:
from graphein.protein.visualisation import plotly_protein_structure_graph

p = plotly_protein_structure_graph(
    g,
    colour_edges_by="kind",
    colour_nodes_by="element_symbol",
    label_node_ids=False,
    node_size_min=5,
    node_alpha=0.85,
    node_size_multiplier=1,
    plot_title="Atom-level graph (PDB: 3eiy). Nodes coloured by their Element"
    )
p.show()

As you can see, all the bond types are the same. In order to add bond order assignment, we need to pass the add_bond_order function to the list of edge functions. We do the same for assigning ring status with add_ring_status

[7]:
from graphein.protein.edges.atomic import add_atomic_edges, add_bond_order, add_ring_status
params_to_change = {"granularity": "atom", "edge_construction_functions": [add_atomic_edges, add_bond_order, add_ring_status]}

config = ProteinGraphConfig(**params_to_change)
g = construct_graph(config=config, pdb_code="3eiy")

p = plotly_protein_structure_graph(
    g,
    colour_edges_by="kind",
    colour_nodes_by="element_symbol",
    label_node_ids=False,
    node_size_min=5,
    node_alpha=0.85,
    node_size_multiplier=1,
    plot_title="Atom-level graph (PDB: 3eiy). Nodes coloured by their Element"
    )
p.show()

# Print some edges to verify
for i, (u, v, a) in enumerate(g.edges(data=True)):
    if i % 30 == 0:
        print(u, v, a)
DEBUG:graphein.protein.graphs:Deprotonating protein. This removes H atoms from the pdb_df dataframe
DEBUG:graphein.protein.graphs:Detected 1330 total nodes
A:SER:2:N A:SER:2:CA {'kind': {'SINGLE', 'covalent'}, 'bond_length': 1.4964471256947236}
A:ASN:5:CG A:ASN:5:OD1 {'kind': {'DOUBLE', 'covalent'}, 'bond_length': 1.2476606109034614}
A:LYS:10:C A:ASP:11:N {'kind': {'DOUBLE', 'covalent'}, 'bond_length': 1.3200488627319846}
A:GLN:14:CA A:GLN:14:C {'kind': {'SINGLE', 'covalent'}, 'bond_length': 1.5587077981456363}
A:ASN:17:CA A:ASN:17:CB {'kind': {'SINGLE', 'covalent'}, 'bond_length': 1.5510625390357422}
A:GLU:21:CA A:GLU:21:C {'kind': {'SINGLE', 'covalent'}, 'bond_length': 1.541464563329304}
A:GLN:25:CA A:GLN:25:C {'kind': {'SINGLE', 'covalent'}, 'bond_length': 1.5395742918092665}
A:PRO:28:CG A:PRO:28:CD {'kind': {'RING', 'SINGLE', 'covalent'}, 'bond_length': 1.5008284378968835}
A:GLU:32:N A:GLU:32:CA {'kind': {'SINGLE', 'covalent'}, 'bond_length': 1.4628581612719675}
A:LYS:35:CE A:LYS:35:NZ {'kind': {'SINGLE', 'covalent'}, 'bond_length': 1.5094432748533508}
A:LEU:40:C A:VAL:41:N {'kind': {'DOUBLE', 'covalent'}, 'bond_length': 1.327494256108103}
A:ARG:44:C A:PHE:45:N {'kind': {'DOUBLE', 'covalent'}, 'bond_length': 1.3319241720158084}
A:GLY:47:C A:THR:48:N {'kind': {'DOUBLE', 'covalent'}, 'bond_length': 1.3300627804731626}
A:ARG:51:CZ A:ARG:51:NH2 {'kind': {'DOUBLE', 'covalent'}, 'bond_length': 1.3355399657067553}
A:ASN:55:CA A:ASN:55:C {'kind': {'SINGLE', 'covalent'}, 'bond_length': 1.509162681754352}
A:PHE:58:CG A:PHE:58:CD1 {'kind': {'RING', 'SINGLE', 'covalent'}, 'bond_length': 1.3973292382255516}
A:GLN:61:CD A:GLN:61:NE2 {'kind': {'DOUBLE', 'covalent'}, 'bond_length': 1.3256794484338952}
A:ASP:66:C A:GLY:67:N {'kind': {'DOUBLE', 'covalent'}, 'bond_length': 1.3207834796059468}
A:VAL:70:CB A:VAL:70:CG2 {'kind': {'SINGLE', 'covalent'}, 'bond_length': 1.544997734626169}
A:VAL:74:CB A:VAL:74:CG2 {'kind': {'SINGLE', 'covalent'}, 'bond_length': 1.5372358309641367}
A:PHE:78:CG A:PHE:78:CD1 {'kind': {'RING', 'SINGLE', 'covalent'}, 'bond_length': 1.4093633314372853}
A:ALA:82:N A:ALA:82:CA {'kind': {'SINGLE', 'covalent'}, 'bond_length': 1.4588163009782962}
A:ARG:87:CA A:ARG:87:C {'kind': {'SINGLE', 'covalent'}, 'bond_length': 1.5217493223261191}
A:ALA:90:C A:LEU:91:N {'kind': {'DOUBLE', 'covalent'}, 'bond_length': 1.3322357148793165}
A:LYS:95:CA A:LYS:95:C {'kind': {'SINGLE', 'covalent'}, 'bond_length': 1.5207899263211861}
A:ASP:98:CG A:ASP:98:OD2 {'kind': {'DOUBLE', 'covalent'}, 'bond_length': 1.2490592459927585}
A:ASP:103:C A:ASP:103:O {'kind': {'DOUBLE', 'covalent'}, 'bond_length': 1.2269449865417776}
A:VAL:107:C A:VAL:107:O {'kind': {'DOUBLE', 'covalent'}, 'bond_length': 1.2373241289169141}
A:HIS:111:CG A:HIS:111:ND1 {'kind': {'RING', 'SINGLE', 'covalent'}, 'bond_length': 1.3857918314090316}
A:CYS:115:CB A:CYS:115:SG {'kind': {'SINGLE', 'covalent'}, 'bond_length': 1.8180277775655709}
A:ASN:120:CA A:ASN:120:C {'kind': {'SINGLE', 'covalent'}, 'bond_length': 1.5075738787867117}
A:ILE:124:N A:ILE:124:CA {'kind': {'SINGLE', 'covalent'}, 'bond_length': 1.449215994943473}
A:VAL:127:CB A:VAL:127:CG2 {'kind': {'SINGLE', 'covalent'}, 'bond_length': 1.5337962706956894}
A:LEU:131:C A:LEU:131:O {'kind': {'DOUBLE', 'covalent'}, 'bond_length': 1.2503483514605058}
A:GLN:134:CD A:GLN:134:NE2 {'kind': {'DOUBLE', 'covalent'}, 'bond_length': 1.315718434924434}
A:PHE:138:CA A:PHE:138:C {'kind': {'SINGLE', 'covalent'}, 'bond_length': 1.5332312284844685}
A:GLU:140:CD A:GLU:140:OE1 {'kind': {'DOUBLE', 'covalent'}, 'bond_length': 1.2552963793463292}
A:LYS:143:CG A:LYS:143:CD {'kind': {'SINGLE', 'covalent'}, 'bond_length': 1.532886819044384}
A:GLY:148:N A:GLY:148:CA {'kind': {'SINGLE', 'covalent'}, 'bond_length': 1.4515767289399488}
A:VAL:151:CA A:VAL:151:C {'kind': {'SINGLE', 'covalent'}, 'bond_length': 1.5020672421699368}
A:GLY:155:C A:TRP:156:N {'kind': {'DOUBLE', 'covalent'}, 'bond_length': 1.3299751877384798}
A:ILE:159:CA A:ILE:159:C {'kind': {'SINGLE', 'covalent'}, 'bond_length': 1.5274563823559741}
A:HIS:163:CB A:HIS:163:CG {'kind': {'SINGLE', 'covalent'}, 'bond_length': 1.497479883003441}
A:ILE:166:CB A:ILE:166:CG2 {'kind': {'SINGLE', 'covalent'}, 'bond_length': 1.5218219343931156}
A:ALA:171:CA A:ALA:171:CB {'kind': {'SINGLE', 'covalent'}, 'bond_length': 1.5277568523819485}
A:LYS:174:CD A:LYS:174:CE {'kind': {'SINGLE', 'covalent'}, 'bond_length': 1.513726857791723}

We can also include some of the distance-based functions we used in the residue graph construction. Here we add the residue-level delaunay triangulation to the graph.

[8]:
from functools import partial
from graphein.protein.edges.distance import add_delaunay_triangulation
params_to_change = {"granularity": "atom", "edge_construction_functions": [add_atomic_edges, add_bond_order, partial(add_delaunay_triangulation)]}

config = ProteinGraphConfig(**params_to_change)
g = construct_graph(config=config, pdb_code="3eiy")

p = plotly_protein_structure_graph(
    g,
    colour_edges_by="kind",
    colour_nodes_by="element_symbol",
    label_node_ids=False,
    node_size_min=5,
    node_alpha=0.85,
    node_size_multiplier=1,
    plot_title="Atom-level graph (PDB: 3eiy) with Delaunay Triangulation. Nodes coloured by their Element"
    )
p.show()
DEBUG:graphein.protein.graphs:Deprotonating protein. This removes H atoms from the pdb_df dataframe
DEBUG:graphein.protein.graphs:Detected 1330 total nodes
DEBUG:graphein.protein.edges.distance:Detected 8582 simplices in the Delaunay Triangulation.

It’s also possible to apply the triangulation to certain atomtypes, such as \(C\alpha\) only partialling the add_delaunay_triangulation and specifying a list of allowable_nodes.

[9]:
from functools import partial
from graphein.protein.edges.distance import add_delaunay_triangulation
params_to_change = {
    "granularity": "atom",
    "edge_construction_functions": [add_atomic_edges,
                                    add_bond_order,
                                    partial(add_delaunay_triangulation, allowable_nodes=["CA"])
                                    ]
    }

config = ProteinGraphConfig(**params_to_change)
g = construct_graph(config=config, pdb_code="3eiy")

p = plotly_protein_structure_graph(
    g,
    colour_edges_by="kind",
    colour_nodes_by="element_symbol",
    label_node_ids=False,
    node_size_min=5,
    node_alpha=0.85,
    node_size_multiplier=1,
    plot_title="Atom-level graph (PDB: 3eiy) with CA triangulation. Nodes coloured by their Element"
    )
p.show()
DEBUG:graphein.protein.graphs:Deprotonating protein. This removes H atoms from the pdb_df dataframe
DEBUG:graphein.protein.graphs:Detected 1330 total nodes
DEBUG:graphein.protein.edges.distance:Detected 954 simplices in the Delaunay Triangulation.