Graphein provides both a programmatic API, via the Python library, and a command-line interface.

Command Line Interface

Graphein has a simple command-line interface for converting PDB files into graphs. It reads a ProteinGraphConfig object from config.yaml, constructs a graph for each of the given PDB file(s), and saves the graphs to the output directory in gpickle format.

graphein -c config.yaml -p path/to/pdbs -o path/to/output
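For batch jobs it can be convenient to drive the same command from a script. The sketch below only assembles the argument list from the three flags shown above (-c, -p, -o); the helper name build_graphein_command and the example paths are hypothetical, and actually running the command assumes the graphein executable is installed and on the PATH.

```python
import subprocess
from pathlib import Path
from typing import List


def build_graphein_command(config: Path, pdb_dir: Path, out_dir: Path) -> List[str]:
    """Assemble the CLI call shown above as an argument list (hypothetical helper)."""
    return [
        "graphein",
        "-c", str(config),   # YAML config file
        "-p", str(pdb_dir),  # PDB file(s) to convert
        "-o", str(out_dir),  # directory for the saved graphs
    ]


cmd = build_graphein_command(
    Path("config.yaml"), Path("path/to/pdbs"), Path("path/to/output")
)
print(" ".join(cmd))

# Uncomment to run the conversion; assumes graphein is installed and on PATH.
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a single string avoids shell quoting problems when paths contain spaces.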

YAML Config

A .yaml config file can be used to specify any of the config objects. To specify a function, use the !func: tag followed by the function's import path. To specify one of the config objects defined in Graphein, use the format !<config_name> (e.g. !ProteinGraphConfig).

# protein_graph_config.yml
!ProteinGraphConfig
    granularity: "CA"
    keep_hets: False
    insertions: False
    verbose: False
    node_metadata_functions:
        - !func:graphein.protein.features.nodes.amino_acid.meiler_embedding
        - !func:graphein.protein.features.nodes.amino_acid.expasy_protein_scale
    edge_construction_functions:
        - !func:graphein.protein.edges.distance.add_peptide_bonds
        - !func:graphein.protein.edges.distance.add_distance_threshold
            long_interaction_threshold: 5
            threshold: 10.
    dssp_config: !DSSPConfig

from pathlib import Path
from graphein.utils.config import parse_config

PATH = Path(".")  # assumed: directory containing the config file
yml_config = parse_config(PATH / "protein_graph_config.yml")
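One way to picture the !func: tag is as a dotted import path that the loader turns into a Python callable. The sketch below illustrates that idea with the standard library only. It is not Graphein's actual loader: resolve_dotted_path is a hypothetical helper, and math.sqrt merely stands in for a Graphein function.

```python
import importlib
from typing import Any, Callable


def resolve_dotted_path(path: str) -> Callable[..., Any]:
    """Import 'package.module.name' and return the object called 'name'."""
    module_name, _, attr = path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.attribute', got {path!r}")
    module = importlib.import_module(module_name)
    return getattr(module, attr)


# math.sqrt is a stand-in for a path such as
# graphein.protein.edges.distance.add_peptide_bonds.
sqrt = resolve_dotted_path("math.sqrt")
print(sqrt(16.0))  # 4.0
```

Referring to functions by import path keeps the config file plain text while still letting it name arbitrary callables.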

Reading the example .yaml file above with the parse_config function is equivalent to specifying a Python dict of arguments and loading it into ProteinGraphConfig.

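from functools import partial
from graphein.protein.config import DSSPConfig, ProteinGraphConfig
from graphein.protein.edges.distance import add_distance_threshold, add_peptide_bonds
from graphein.protein.features.nodes.amino_acid import expasy_protein_scale, meiler_embedding

protein_graph_config = {
    "granularity": "CA",
    "keep_hets": False,
    "insertions": False,
    "verbose": False,
    "node_metadata_functions": [meiler_embedding, expasy_protein_scale],
    "edge_construction_functions": [
        add_peptide_bonds,
        # Bind the keyword arguments given under add_distance_threshold in the YAML.
        partial(add_distance_threshold, long_interaction_threshold=5, threshold=10.0),
    ],
    "dssp_config": DSSPConfig(),
}
config = ProteinGraphConfig(**protein_graph_config)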
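When an edge function takes parameters, as add_distance_threshold does in the YAML above, a common Python-side equivalent is to bind them with functools.partial so that the config still receives a plain callable. The sketch below shows the pattern on a stand-in function; count_pairs_within and its arguments are hypothetical and only mimic the shape of a thresholded edge function.

```python
from functools import partial
from typing import List


def count_pairs_within(positions: List[float], threshold: float, min_separation: int) -> int:
    """Count index pairs at least min_separation apart whose positions differ by <= threshold."""
    count = 0
    for i in range(len(positions)):
        for j in range(i + min_separation, len(positions)):
            if abs(positions[i] - positions[j]) <= threshold:
                count += 1
    return count


# Bind the keyword arguments up front; the result takes only the data argument.
configured = partial(count_pairs_within, threshold=10.0, min_separation=2)

positions = [0.0, 4.0, 8.0, 30.0]
print(configured(positions))  # 1
```

The bound function can then sit in a list next to parameter-free functions, because every entry is called with the same single argument.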