Constructing Graphs from the AlphaFold Protein Structure Database#
In this quick tutorial we show how to construct graphs from structures in the AlphaFold Protein Structure Database. It is a fantastic resource, and we hope to make it more accessible to the Geometric Deep Learning community!
We provide a utility, graphein.protein.utils.download_alphafold_structure(),
to download PDB files and their associated metadata (aligned error scores in an accompanying JSON
file). After downloading the structure, use Graphein to compute the graph as you normally would. To go deeper into the protein graph construction utilities in Graphein, see the `residue_graphs.ipynb <https://graphein.ai/notebooks/residue_graphs.html>`__ notebook tutorial.
Downloading Structures#
First, we need to download the structures from AlphaFoldDB. One can download either the structure alone or the structure together with its aligned error scores.
The structures are accessed via their UniProt accession codes.
Here, we’ll be looking at `Q8W3K0 <https://alphafold.ebi.ac.uk/entry/Q8W3K0>`__, a rather beautiful model of a protein with a not-so-catchy name: Probable disease resistance protein At1g58602.
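Because structures are looked up by UniProt accession, it can help to check that a string looks like an accession before requesting a download. The sketch below uses the accession pattern published in the UniProt help pages; the helper name `is_uniprot_accession` is hypothetical and is not part of Graphein.

```python
import re

# Accession pattern as published in the UniProt help pages.
UNIPROT_ACCESSION = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def is_uniprot_accession(code: str) -> bool:
    """Return True if `code` has the shape of a UniProt accession.

    Hypothetical helper: it checks the format only, not whether the
    entry exists in UniProt or AlphaFoldDB.
    """
    return UNIPROT_ACCESSION.fullmatch(code) is not None


# The accession used in this tutorial passes the check;
# the gene locus name from the protein description does not.
print(is_uniprot_accession("Q8W3K0"))
print(is_uniprot_accession("At1g58602"))
```

A format check like this only catches typos and mixed-up identifiers; a well-formed accession may still have no AlphaFoldDB entry.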
[1]:
from graphein.protein.utils import download_alphafold_structure
# Download the PDB file for an example protein (UniProt: Q8W3K0) together with the aligned error scores
protein_path = download_alphafold_structure("Q8W3K0", out_dir="/tmp", aligned_score=True)
protein_path
INFO:graphein.protein.utils:Downloaded AlphaFold PDB file for: Q8W3K0
[1]:
('/private/tmp/Q8W3K0.pdb', '/private/tmp/Q8W3K0.json')
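The second path in the tuple points to the aligned error JSON file. This tutorial does not describe the layout of that file, so the sketch below rests on assumptions: it expects either a `predicted_aligned_error` key holding the full matrix, or flat `residue1`, `residue2` and `distance` lists with 1-based residue indices, wrapped in a single-element list. The helper `load_pae_matrix` is hypothetical; check the key names against your downloaded file before relying on it.

```python
import json
import os
import tempfile


def load_pae_matrix(json_path):
    """Return aligned error scores as a square list of lists.

    Hypothetical helper. The two layouts handled here are assumptions
    about the JSON file, not something documented in this tutorial.
    """
    with open(json_path) as handle:
        data = json.load(handle)
    # Assumption: the payload is wrapped in a single-element list.
    entry = data[0] if isinstance(data, list) else data

    # Assumed layout 1: the full matrix is stored directly.
    if "predicted_aligned_error" in entry:
        return entry["predicted_aligned_error"]

    # Assumed layout 2: flat lists of 1-based residue pairs and scores.
    n = max(max(entry["residue1"]), max(entry["residue2"]))
    matrix = [[0.0] * n for _ in range(n)]
    for i, j, score in zip(entry["residue1"], entry["residue2"], entry["distance"]):
        matrix[i - 1][j - 1] = score
    return matrix


# Toy two-residue file standing in for a real download.
toy_path = os.path.join(tempfile.mkdtemp(), "toy_pae.json")
with open(toy_path, "w") as handle:
    json.dump([{"predicted_aligned_error": [[0, 3], [4, 0]]}], handle)

print(load_pae_matrix(toy_path))
```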
[2]:
# Download the PDB file for an example protein (UniProt: Q8W3K0) without the aligned score
protein_path = download_alphafold_structure("Q8W3K0", out_dir="/tmp", aligned_score=False)
protein_path
INFO:graphein.protein.utils:Downloaded AlphaFold PDB file for: Q8W3K0
[2]:
'/private/tmp/Q8W3K0.pdb'
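As the two outputs show, the return value is a tuple of two paths when `aligned_score=True` and a single path otherwise. Downstream code that always needs the PDB path can normalise the two shapes first. This is a minimal sketch based only on the outputs shown here; `split_download_result` is a hypothetical helper, not a Graphein function.

```python
import os
from typing import Optional, Tuple, Union

PathLike = Union[str, os.PathLike]


def split_download_result(
    result: Union[PathLike, Tuple[PathLike, PathLike]]
) -> Tuple[PathLike, Optional[PathLike]]:
    """Return (pdb_path, json_path), with json_path None if it was not downloaded.

    Hypothetical helper covering the two return shapes seen in this tutorial.
    """
    if isinstance(result, (str, os.PathLike)):
        # Single path: only the structure was downloaded.
        return result, None
    pdb_path, json_path = result
    return pdb_path, json_path


# The two return shapes shown in the cells above.
print(split_download_result("/private/tmp/Q8W3K0.pdb"))
print(split_download_result(("/private/tmp/Q8W3K0.pdb", "/private/tmp/Q8W3K0.json")))
```

With this in place, `pdb_path, pae_path = split_download_result(protein_path)` works regardless of which download call produced `protein_path`.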
Constructing the Graph#
We show a simplified workflow here. For more information and a fuller exposition of the features, please refer to the Residue Graph Tutorial.
First, we require a config object.
[3]:
from graphein.protein.config import ProteinGraphConfig
from graphein.protein.graphs import construct_graph
# Create a config with C-alpha granularity (one node per residue)
config = ProteinGraphConfig(granularity='CA')
# Construct the graph, passing the config explicitly
g = construct_graph(config=config, pdb_path=protein_path)
DEBUG:graphein.protein.graphs:Deprotonating protein. This removes H atoms from the pdb_df dataframe
DEBUG:graphein.protein.graphs:Detected 1138 total nodes
DEBUG:graphein.protein.features.nodes.amino_acid:Reading meiler embeddings from: /Users/arianjamasb/github/graphein/graphein/protein/features/nodes/meiler_embeddings.csv
1138
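With `granularity='CA'` the graph has one node per residue, so the node count in the log can be cross-checked against the structure file. The sketch below counts residues that have a C-alpha `ATOM` record, using the fixed-column layout of the PDB format. `count_ca_residues` is a hypothetical helper and the sample lines are toy data; Graphein applies its own filtering, so treat this as a rough sanity check rather than a reimplementation.

```python
def count_ca_residues(pdb_lines):
    """Count distinct residues that have a C-alpha ATOM record.

    Hypothetical helper relying on PDB fixed columns:
    atom name in columns 13-16, chain ID in column 22,
    residue number plus insertion code in columns 23-27.
    """
    residues = set()
    for line in pdb_lines:
        # HETATM records are skipped, so calcium ions (also named CA) are ignored.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            residues.add((line[21], line[22:27]))
    return len(residues)


# Toy records for two residues (illustrative coordinates).
sample_lines = [
    "ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  9.67           N",
    "ATOM      2  CA  MET A   1      26.266  25.413   2.842  1.00 10.38           C",
    "ATOM      3  N   GLY A   2      24.000  25.000   3.000  1.00 10.00           N",
    "ATOM      4  CA  GLY A   2      23.000  26.000   3.500  1.00 10.00           C",
]
print(count_ca_residues(sample_lines))
```

On a downloaded file, the same function can be called on an open file handle, for example `with open(protein_path) as fh: count_ca_residues(fh)`, and the result compared with the number of nodes reported in the log.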
[4]:
from graphein.protein.visualisation import plotly_protein_structure_graph
plotly_protein_structure_graph(g, node_size_multiplier=0.5, colour_nodes_by="residue_name")
Adding Edges#
What if we want to add more edges to the graph? We can pass a list of edge construction functions to the config:
[5]:
from graphein.protein.edges.distance import (
    add_aromatic_interactions, add_cation_pi_interactions,
    add_hydrophobic_interactions, add_ionic_interactions,
)
config = ProteinGraphConfig(edge_construction_functions=[
    add_aromatic_interactions,
    add_cation_pi_interactions,
    add_hydrophobic_interactions,
    add_ionic_interactions,
])
g = construct_graph(pdb_path=protein_path, config=config)
plotly_protein_structure_graph(g, colour_edges_by="kind", colour_nodes_by="residue_name", label_node_ids=False, node_size_multiplier=2, node_size_min=5)
DEBUG:graphein.protein.graphs:Deprotonating protein. This removes H atoms from the pdb_df dataframe
DEBUG:graphein.protein.graphs:Detected 1138 total nodes
INFO:graphein.protein.edges.distance:Found: 64 aromatic-aromatic interactions
INFO:graphein.protein.edges.distance:Found 2806 hydrophobic interactions.
INFO:graphein.protein.edges.distance:Found 9485 ionic interactions.
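The interaction counts in the log can also be recovered from the graph itself by tallying the `kind` edge attribute, the same attribute used for edge colouring above. The sketch below works on any iterable of `(u, v, data)` triples, such as the one returned by `g.edges(data=True)` on a NetworkX graph. It assumes `kind` holds either a single string or a collection of strings; `count_edge_kinds` is a hypothetical helper, and the node names and edges are toy data.

```python
from collections import Counter


def count_edge_kinds(edges):
    """Tally edge kinds over (u, v, data) triples.

    Hypothetical helper. Assumes data["kind"] is a string or a
    collection of strings; an edge with several kinds is counted
    once for each kind.
    """
    counts = Counter()
    for _u, _v, data in edges:
        kinds = data.get("kind", ())
        if isinstance(kinds, str):
            kinds = (kinds,)
        counts.update(kinds)
    return counts


# Toy edges standing in for g.edges(data=True).
toy_edges = [
    ("A:PHE:10", "A:TYR:42", {"kind": {"aromatic", "hydrophobic"}}),
    ("A:LEU:5", "A:ILE:9", {"kind": {"hydrophobic"}}),
    ("A:LYS:7", "A:ASP:30", {"kind": {"ionic"}}),
]
for kind, count in sorted(count_edge_kinds(toy_edges).items()):
    print(kind, count)
```

On the graph built above, `count_edge_kinds(g.edges(data=True))` gives a per-kind tally to set against the log lines. The numbers need not match exactly if several interactions between the same pair of residues are merged into one edge.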
You can also check out our Residue Graph Tutorial for more customisation options. For more details on visualisation, please refer to the interactive visualisation tutorial.