graphein.ppi#

Config#

class graphein.ppi.config.BioGridConfig(*, searchNames: bool = True, max: int = 10000, interSpeciesExcluded: bool = True, selfInteractionsExcluded: bool = False, evidenceList: str = '', includeEvidence: bool = False, searchIds: bool = True, searchSynonyms: bool = True, searchBiogridIds: bool = True, additionalIdentifierTypes: str = '', excludeGenes: bool = False, includeInteractors: bool = True, includeInteractorInteractions: bool = False, pubmedList: str = '', excludePubmeds: bool = False, htpThreshold: int = 20, throughputTag: str = 'any')[source]#

Config for specifying parameters for API calls to BIOGRID. A full description of the parameters can be found at: https://wiki.thebiogrid.org/doku.php/biogridrest

Parameters
  • searchNames (bool, optional) – If ‘true’, the interactor OFFICIAL_SYMBOL will be examined for a match with the geneList. Defaults to True

  • max (int, optional) – Number of results to fetch, defaults to 10,000

  • interSpeciesExcluded (bool, optional) – If ‘true’, interactions with interactors from different species will be excluded, defaults to True

  • selfInteractionsExcluded (bool, optional) – If ‘true’, interactions with one interactor will be excluded, defaults to False

  • evidenceList (str, optional) – Any interaction evidence with its Experimental System in the list will be excluded from the results unless includeEvidence is set to true. Defaults to “” (empty string)

  • includeEvidence (bool, optional) – If set to true, any interaction evidence with its Experimental System in the evidenceList will be included in the result, defaults to False

  • searchIds (bool, optional) – If ‘true’, the interactor ENTREZ_GENE, ORDERED LOCUS and SYSTEMATIC_NAME (orf) will be examined for a match with the geneList. Defaults to True

  • searchSynonyms (bool, optional) – If ‘true’, the interactor SYNONYMS will be examined for a match with the geneList. Defaults to True.

  • searchBiogridIds (bool, optional) – If ‘true’, the entries in ‘GENELIST’ will be compared to BIOGRID internal IDs, which are provided in all Tab2 formatted files. Defaults to True

  • additionalIdentifierTypes (str, optional) – Identifier types on this list are examined for a match with the geneList. Defaults to “”

  • excludeGenes (bool, optional) – If ‘true’, interactions containing genes in the geneList will be excluded from the results. Defaults to False

  • includeInteractors (bool, optional) – If ‘true’, in addition to interactions between genes on the geneList, interactions will also be fetched which have only one interactor on the geneList. Defaults to True

  • includeInteractorInteractions (bool, optional) – If ‘true’, interactions between the geneList’s first order interactors will be included. Defaults to False

  • pubmedList (str, optional) – Interactions will be fetched whose Pubmed ID is or is not in this list, depending on the value of excludePubmeds. Defaults to “”

  • excludePubmeds (bool, optional) – If ‘false’, interactions with Pubmed ID in pubmedList will be included in the results; if ‘true’ they will be excluded. Defaults to False

  • htpThreshold (int, optional) – Interactions whose Pubmed ID has more than this number of interactions will be excluded from the results. Ignored if excludePubmeds is ‘false’. Defaults to 20.

  • throughputTag (str, optional) – If set to ‘low’ or ‘high’, only interactions with ‘Low throughput’ or ‘High throughput’ in the ‘throughput’ field will be returned. Defaults to “any”
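
As a rough illustration of how options like these could reach a REST service, the sketch below serialises a handful of them into a URL query string. The `BioGridParams` dataclass and the `to_query` helper are hypothetical stand-ins written for this example and are not part of graphein. The pipe-separated `geneList` and the lowercase `true`/`false` encoding are assumptions based on the quoted values in the descriptions above.

```python
from dataclasses import asdict, dataclass
from typing import List
from urllib.parse import urlencode


@dataclass
class BioGridParams:
    """Hypothetical stand-in mirroring a few BioGridConfig fields and defaults."""

    searchNames: bool = True
    max: int = 10000
    interSpeciesExcluded: bool = True
    selfInteractionsExcluded: bool = False
    throughputTag: str = "any"


def to_query(params: BioGridParams, gene_list: List[str]) -> str:
    """Serialise the parameters into a URL query string (illustrative only)."""
    # Assumption: gene names are joined with '|' in a geneList parameter.
    query = {"geneList": "|".join(gene_list)}
    for key, value in asdict(params).items():
        # Booleans are written as lowercase 'true'/'false'.
        query[key] = str(value).lower() if isinstance(value, bool) else value
    return urlencode(query)


query_string = to_query(BioGridParams(throughputTag="high"), ["CDC42", "CDK1"])
print(query_string)
```

Only the fields that differ from the defaults need to be passed to the stand-in, which matches the keyword-only style of the config signature.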

class graphein.ppi.config.PPIGraphConfig(*, paginate: bool = True, ncbi_taxon_id: int = 9606, kwargs: Dict[str, Union[str, int, float]] = {'BIOGRID_throughputTag': 'high', 'STRING_escore': 0.2}, string_config: graphein.ppi.config.STRINGConfig = None, biogrid_config: graphein.ppi.config.BioGridConfig = None)[source]#

Config for specifying parameters for PPI Graph Construction

Parameters
  • paginate (bool) – Controls whether or not to paginate API calls. Useful for large queries. Defaults to True

  • ncbi_taxon_id (int) – Defaults to 9606 (human)

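  • kwargs (Dict[str, Union[str, int, float]], optional) – Additional parameters for the STRING and BIOGRID API calls, keyed as STRING_<param> or BIOGRID_<param>. Defaults to {‘BIOGRID_throughputTag’: ‘high’, ‘STRING_escore’: 0.2}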

  • string_config (graphein.ppi.config.STRINGConfig) – Config Object holding parameters for STRINGdb API calls. Defaults to None

  • biogrid_config (graphein.ppi.config.BioGridConfig, optional) – Config Object holding parameters for BioGrid API calls. Defaults to None

class graphein.ppi.config.STRINGConfig(*, species: int = 9606, required_score: int = 50, network_type: str = 'functional', add_nodes: int = 0, show_query_node_labels: bool = 0)[source]#

Config for specifying parameters for API calls to STRINGdb. Full documentation can be found: https://string-db.org/help/api/

Parameters
  • species (int, optional) – NCBI taxon identifiers, defaults to 9606 (human)

  • required_score (int, optional) – Threshold of significance to include an interaction, a number between 0 and 1000, defaults to 50

  • network_type (str, optional) – Network type: “functional” (default), “physical”

  • add_nodes (int, optional) – Adds a number of proteins to the network based on their confidence score, i.e. extends the interaction neighborhood of the selected proteins to the desired value, defaults to 0

  • show_query_node_labels (bool, optional) – If 1, submitted names are used in the preferredName column when available (0 or 1), defaults to 0

Graph Construction#

Graphs#

Functions for constructing a PPI graph from STRINGdb and BIOGRID.

graphein.ppi.graphs.compute_ppi_graph(protein_list: List[str], edge_construction_funcs: List[Callable], graph_annotation_funcs: Optional[List[Callable]] = None, node_annotation_funcs: Optional[List[Callable]] = None, edge_annotation_funcs: Optional[List[Callable]] = None, config: Optional[graphein.ppi.config.PPIGraphConfig] = None) networkx.classes.graph.Graph[source]#

Computes a PPI Graph from a list of protein IDs. This is the core function for PPI graph construction.

Parameters
  • protein_list (List[str]) – List of protein identifiers

  • edge_construction_funcs (List[Callable]) – List of functions to construct edges with

  • graph_annotation_funcs (List[Callable], optional) – List of functions to annotate graph metadata

  • node_annotation_funcs (List[Callable], optional) – List of functions to annotate node metadata

  • edge_annotation_funcs (List[Callable], optional) – List of functions to annotate edge metadata

  • config (PPIGraphConfig, optional) – Config object specifying additional parameters for STRING and BIOGRID API calls

Returns

nx.Graph of PPI network

Return type

nx.Graph
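
A minimal usage sketch, written only against the signatures documented on this page. The wrapper name `build_ppi_graph` is hypothetical, and passing the edge functions directly (rather than wrapping them) is an assumption. Running it requires graphein and network access, so the imports sit inside the function.

```python
from typing import List


def build_ppi_graph(protein_list: List[str]):
    """Build a PPI graph from STRING and BIOGRID for the given proteins."""
    # Imported lazily so that this definition loads without graphein installed.
    from graphein.ppi.config import PPIGraphConfig
    from graphein.ppi.edges import add_biogrid_edges, add_string_edges
    from graphein.ppi.graphs import compute_ppi_graph

    # Signature defaults: ncbi_taxon_id=9606 (human), paginate=True.
    config = PPIGraphConfig()
    return compute_ppi_graph(
        protein_list=protein_list,
        edge_construction_funcs=[add_string_edges, add_biogrid_edges],
        config=config,
    )
```

A call such as `build_ppi_graph(["CDC42", "CDK1"])` would then return the nx.Graph described above; the protein names are illustrative.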

graphein.ppi.graphs.parse_kwargs_from_config(config: graphein.ppi.config.PPIGraphConfig) graphein.ppi.config.PPIGraphConfig[source]#

If configs for STRING and BIOGRID are provided in the global graphein.ppi.config.PPIGraphConfig, we update the kwargs

Parameters

config (PPIGraphConfig) – PPI graph configuration object.

Returns

config with updated config.kwargs

Return type

PPIGraphConfig
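
The idea of folding a sub-config into `kwargs` can be sketched as follows. `StringCfg`, `GraphCfg` and `merge_sub_config` are hypothetical stand-ins for this example. The `STRING_` prefix is an assumption taken from the STRING_<param> naming used elsewhere on this page, and the actual function may differ in detail.

```python
from dataclasses import asdict, dataclass, field
from typing import Dict, Optional, Union


@dataclass
class StringCfg:
    """Hypothetical stand-in for STRINGConfig (subset of fields)."""

    species: int = 9606
    required_score: int = 50
    network_type: str = "functional"


@dataclass
class GraphCfg:
    """Hypothetical stand-in for PPIGraphConfig (subset of fields)."""

    kwargs: Dict[str, Union[str, int, float]] = field(default_factory=dict)
    string_config: Optional[StringCfg] = None


def merge_sub_config(config: GraphCfg) -> GraphCfg:
    """Copy sub-config fields into config.kwargs under a 'STRING_' prefix."""
    if config.string_config is not None:
        for name, value in asdict(config.string_config).items():
            config.kwargs[f"STRING_{name}"] = value
    return config


cfg = merge_sub_config(
    GraphCfg(
        kwargs={"STRING_escore": 0.2},
        string_config=StringCfg(required_score=700),
    )
)
print(cfg.kwargs)
```

Existing entries in `kwargs` are kept, and the sub-config only adds or overwrites the keys it defines.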

Edges#

Functions for adding edges to a PPI Graph from parsed STRING & BIOGRID API call outputs.

graphein.ppi.edges.add_biogrid_edges(G: networkx.classes.graph.Graph, **kwargs) networkx.classes.graph.Graph[source]#

Adds edges from the BIOGRID database (https://thebiogrid.org/) to PPI Graph.

Parameters
  • G (nx.Graph) – Graph to add edges to (populated with protein_id nodes).

  • kwargs – Additional parameters to pass to BIOGRID API calls.

Returns

PPI Graph with BIOGRID interactions added as edges.

Return type

nx.Graph

graphein.ppi.edges.add_interacting_proteins(G: networkx.classes.graph.Graph, df: pandas.core.frame.DataFrame, kind: str) networkx.classes.graph.Graph[source]#

Generic function for adding interaction edges to a PPI Graph. You can use this function to add additional interactions using a dataframe with columns "p1" and "p2".

Parameters
  • G (nx.Graph) – PPI Graph to populate with edges.

  • df (pd.DataFrame) – Dataframe containing edgelist.

  • kind (str) – name of interaction type.

Returns

PPI Graph with pre-computed edges added.

Return type

nx.Graph
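
To make the expected "p1"/"p2" edge list concrete, here is a dependency-free sketch of the same idea. It uses a list of dicts in place of the dataframe and a plain dict in place of the graph; `add_interactions` is a hypothetical helper. Letting an edge collect the `kind` labels of every source that reports it is an assumption, not confirmed behaviour.

```python
from typing import Dict, FrozenSet, Iterable, Set

# Maps an undirected protein pair to the set of interaction kinds seen for it.
EdgeKinds = Dict[FrozenSet[str], Set[str]]


def add_interactions(
    edges: EdgeKinds, rows: Iterable[Dict[str, str]], kind: str
) -> EdgeKinds:
    """Add one undirected edge per (p1, p2) row, tagging it with `kind`."""
    for row in rows:
        pair = frozenset((row["p1"], row["p2"]))
        # An edge seen in several sources accumulates every source label.
        edges.setdefault(pair, set()).add(kind)
    return edges


edges: EdgeKinds = {}
add_interactions(edges, [{"p1": "A", "p2": "B"}], kind="string")
add_interactions(
    edges,
    [{"p1": "B", "p2": "A"}, {"p1": "B", "p2": "C"}],
    kind="my_assay",
)
print(edges)
```

Because the pair is stored as a frozenset, the rows ("A", "B") and ("B", "A") refer to the same undirected edge.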

graphein.ppi.edges.add_string_edges(G: networkx.classes.graph.Graph, **kwargs) networkx.classes.graph.Graph[source]#

Adds edges from STRING PPI database (https://string-db.org/) to a PPI Graph.

Parameters
  • G (nx.Graph) – Graph to add edges to (populated with protein_id nodes).

  • kwargs – Additional parameters to pass to STRING API calls.

Returns

PPI Graph with STRING interactions added as edges.

Return type

nx.Graph

Graph Features#

Functions for adding metadata to PPI Graphs from STRING and BIOGRID.

graphein.ppi.graph_metadata.add_biogrid_metadata(G: networkx.classes.graph.Graph, kwargs: Dict[str, Union[str, int]]) networkx.classes.graph.Graph[source]#

Adds interaction dataframe from BIOGRID to graph.

Parameters
  • G (nx.Graph) – PPI Graph to add metadata to

  • kwargs (Dict[str, Union[str, int]]) – Additional parameters for BIOGRID API call

Returns

PPIGraph with added BIOGRID interaction_df as metadata

Return type

nx.Graph

graphein.ppi.graph_metadata.add_string_biogrid_metadata(G: networkx.classes.graph.Graph, kwargs: Dict[str, Union[str, int]]) networkx.classes.graph.Graph[source]#

Adds interaction dataframe from STRING and BIOGRID to graph.

Parameters
  • G (nx.Graph) – PPIGraph to add metadata to

  • kwargs (Dict[str, Union[str, int]]) – Additional parameters for STRING and BIOGRID API calls

Returns

PPIGraph with added STRING and BIOGRID interaction_df as metadata

Return type

nx.Graph

graphein.ppi.graph_metadata.add_string_metadata(G: networkx.classes.graph.Graph, kwargs: Dict[str, Union[str, int]]) networkx.classes.graph.Graph[source]#

Adds interaction dataframe from STRING to graph.

Parameters
  • G (nx.Graph) – PPI Graph to add metadata to

  • kwargs (Dict[str, Union[str, int]]) – Additional parameters for STRING API call

Returns

PPIGraph with added STRING interaction_df as metadata

Return type

nx.Graph

Node Features#

Functions for adding node features to a PPI Graph

graphein.ppi.features.node_features.add_sequence_to_nodes(n: str, d: Dict[str, Any])[source]#

Maps UniProt ACC to UniProt ID. Retrieves sequence from UniProt and adds it to the node as a feature

Parameters
  • n (str) – Graph node.

  • d (Dict[str, Any]) – Graph attribute dictionary.

Visualisation#

Contains utilities for plotting PPI NetworkX graphs.

graphein.ppi.visualisation.get_edge_trace(g: networkx.classes.graph.Graph, edge_colours: Optional[List[str]] = None) List[plotly.graph_objs._scatter.Scatter][source]#

Gets edge traces from PPI graph. Returns a list of traces enabling edge colours to be set individually.

Parameters

g (nx.Graph) – PPI graph to extract edge traces from.

Returns

List of edge traces.

Return type

List[go.Scatter]

graphein.ppi.visualisation.get_node_trace(g: networkx.classes.graph.Graph, node_size_multiplier: float, node_colourscale: str = 'Viridis') plotly.graph_objs._scatter.Scatter[source]#

Produces the node trace for the plotly plot.

Parameters
  • g (nx.Graph) – PPI graph with [‘pos’] added to the nodes (e.g. via an nx.layout function)

  • node_size_multiplier (float) – Multiplier for node size.

  • node_colourscale (str, optional) – Colourscale to use for the nodes, defaults to “Viridis”

Returns

Node trace for plotly plot

Return type

go.Scatter

graphein.ppi.visualisation.plot_ppi_graph(g: networkx.classes.graph.Graph, colour_edges_by: str = 'kind', with_labels: bool = True, **kwargs)[source]#

Plots a Protein-Protein Interaction Graph. Colours edges by kind.

Parameters
  • g (nx.Graph) – NetworkX graph of PPI network.

  • colour_edges_by (str, optional) – Attribute to colour edges by. Currently only ‘kind’ is supported, which colours edges by the source database. Defaults to “kind”.

  • with_labels (bool, optional) – Whether to show labels on nodes. Defaults to True.

graphein.ppi.visualisation.plotly_ppi_graph(g: networkx.classes.graph.Graph, layout: networkx.drawing.layout = networkx.drawing.layout.circular_layout, title: Optional[str] = None, show_labels: bool = False, node_size_multiplier: float = 5.0, node_colourscale: str = 'Viridis', edge_colours: Optional[List[str]] = None, edge_opacity: float = 0.5, height: int = 500, width: int = 500)[source]#

Plots a PPI graph.

Parameters
  • g (nx.Graph) – PPI graph

  • layout (nx.layout) – Layout algorithm to use. Default is circular_layout.

  • title (str, optional) – Title of the graph. Default is None.

  • show_labels (bool) – If True, shows labels on nodes. Default is False.

  • node_size_multiplier (float) – Multiplier for node size. Default is 5.0.

  • node_colourscale (str) – Colour scale to use for node colours. Default is “Viridis”. Options: ‘Greys’ | ‘YlGnBu’ | ‘Greens’ | ‘YlOrRd’ | ‘Bluered’ | ‘RdBu’ | ‘Reds’ | ‘Blues’ | ‘Picnic’ | ‘Rainbow’ | ‘Portland’ | ‘Jet’ | ‘Hot’ | ‘Blackbody’ | ‘Earth’ | ‘Electric’ | ‘Viridis’

  • edge_colours (List[str], optional) – List of colours (hexcode) to use for edges. Default is None (px.colors.qualitative.T10).

  • edge_opacity (float) – Opacity of edges. Default is 0.5.

  • height (int) – Height of the plot. Default is 500.

  • width (int) – Width of the plot. Default is 500.

Returns

Plotly figure of PPI Network

Return type

go.Figure

Database Parsers#

BioGrid#

Functions for making and parsing API calls to BIOGRID.

graphein.ppi.parse_biogrid.BIOGRID_df(protein_list: List[str], ncbi_taxon_id: Union[int, str, List[int], List[str]], **kwargs) pandas.core.frame.DataFrame[source]#

Generates standardised dataframe with BIOGRID protein-protein interactions, filtered according to user’s input.

Parameters
  • protein_list (List[str]) – List of proteins (official symbol) that will be included in the PPI graph

  • ncbi_taxon_id (Union[int, str, List[int], List[str]]) – NCBI taxonomy identifiers for the organism. 9606 corresponds to Homo Sapiens

  • kwargs – Additional parameters to pass to BIOGRID API calls

Returns

Standardised dataframe with BIOGRID interactions

Return type

pd.DataFrame

graphein.ppi.parse_biogrid.filter_BIOGRID(df: pandas.core.frame.DataFrame, **kwargs) pandas.core.frame.DataFrame[source]#

Filters results of the BIOGRID API call according to user kwargs.

Parameters
  • df (pd.DataFrame) – Source specific Pandas dataframe (BIOGRID) with results of the API call

  • kwargs (Dict[str, Union[str, int, List[str], List[int]]]) – User thresholds used to filter the results. The parameter names are of the form BIOGRID_<param>, where <param> is the name of the parameter. All the parameters are numerical values.

Returns

Source specific Pandas dataframe with filtered results

Return type

pd.DataFrame

graphein.ppi.parse_biogrid.params_BIOGRID(params: Dict[str, Union[str, int, List[str], List[int]]], **kwargs) Dict[str, Union[str, int]][source]#

Updates default parameters with user parameters for the method “interactions” of the BIOGRID REST API. See also https://wiki.thebiogrid.org/doku.php/biogridrest

Parameters
  • params (Dict[str, Union[str, int, List[str], List[int]]]) – Dictionary of default parameters

  • kwargs (Dict[str, Union[str, int, List[str], List[int]]]) – User parameters for the method “interactions” of the BIOGRID REST API. The key must start with “BIOGRID”

Returns

Dictionary of parameters

Return type

Dict[str, Union[str, int]]

graphein.ppi.parse_biogrid.parse_BIOGRID(protein_list: List[str], ncbi_taxon_id: Union[int, str, List[int], List[str]], paginate: bool = True, **kwargs) pandas.core.frame.DataFrame[source]#

Makes BIOGRID API call and returns a source specific Pandas dataframe.

See also [1] BIOGRID: https://wiki.thebiogrid.org/doku.php/biogridrest

Parameters
  • protein_list (List[str]) – Proteins to include in the graph

  • ncbi_taxon_id (Union[int, str, List[int], List[str]]) – NCBI taxonomy identifiers for the organism. 9606 corresponds to Homo Sapiens

  • paginate (bool) – Whether to paginate the calls (for BIOGRID, the maximum number of rows per call is 10000). Defaults to True

  • kwargs (Dict[str, Union[str, int, List[str], List[int]]]) – Parameters of the “interactions” method of the BIOGRID REST API, used to select the results. The parameter names are of the form BIOGRID_<param>, where <param> is the name of the parameter. Information about these parameters can be found at [1].

Returns

Source specific Pandas dataframe.

Return type

pd.DataFrame
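
Pagination against a per-call row limit can be sketched as below. `fetch_all`, `fake_fetch` and the start/size arguments are hypothetical names for this example. The real call goes to the remote API, which is replaced here by an in-memory list so the loop can be followed without network access.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]


def fetch_all(
    fetch_page: Callable[[int, int], List[Row]], page_size: int = 10000
) -> List[Row]:
    """Request pages of `page_size` rows until a short page signals the end."""
    rows: List[Row] = []
    start = 0
    while True:
        page = fetch_page(start, page_size)
        rows.extend(page)
        if len(page) < page_size:
            return rows
        start += page_size


# Fake data source standing in for the remote API: 25 rows in total.
DATA = [{"id": str(i)} for i in range(25)]


def fake_fetch(start: int, size: int) -> List[Row]:
    """Return up to `size` rows beginning at offset `start`."""
    return DATA[start:start + size]


all_rows = fetch_all(fake_fetch, page_size=10)
print(len(all_rows))  # 25
```

Stopping on the first short page costs one extra request when the total is an exact multiple of the page size, which keeps the loop free of any separate row-count query.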

graphein.ppi.parse_biogrid.standardise_BIOGRID(df: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame[source]#

Standardises BIOGRID dataframe, e.g. puts everything into a common format.

Parameters

df (pd.DataFrame) – Source specific Pandas dataframe

Returns

Standardised dataframe

Return type

pd.DataFrame

STRINGDB#

Functions for making and parsing API calls to STRINGdb.

graphein.ppi.parse_stringdb.STRING_df(protein_list: List[str], ncbi_taxon_id: Union[int, str, List[int], List[str]], **kwargs) pandas.core.frame.DataFrame[source]#

Generates standardised dataframe with STRING protein-protein interactions, filtered according to user’s input.

Parameters
  • protein_list (List[str]) – List of proteins (official symbol) that will be included in the PPI graph

  • ncbi_taxon_id (Union[int, str, List[int], List[str]]) – NCBI taxonomy identifiers for the organism. 9606 corresponds to Homo Sapiens

  • kwargs – Additional parameters to pass to STRING API calls

Returns

Standardised dataframe with STRING interactions

Return type

pd.DataFrame

graphein.ppi.parse_stringdb.filter_STRING(df: pandas.core.frame.DataFrame, **kwargs) pandas.core.frame.DataFrame[source]#

Filters results of the STRING API call according to user kwargs, keeping rows where the input parameters are greater than or equal to the input thresholds.

Parameters
  • df (pd.DataFrame) – Source specific Pandas dataframe (STRING) with results of the API call

  • kwargs (Dict[str, Union[str, int, List[str], List[int]]]) – User thresholds used to filter the results. The parameter names are of the form STRING_<param>, where <param> is the name of the parameter. All the parameters are numerical values.

Returns

Source specific Pandas dataframe with filtered results

Return type

pd.DataFrame
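
The threshold rule can be illustrated without pandas. In this sketch `filter_rows` is a hypothetical helper and rows are plain dicts. Ignoring kwargs without the `STRING_` prefix and skipping thresholds for columns that are absent are both assumptions. The `escore` column name follows the `STRING_escore` default shown in PPIGraphConfig.

```python
from typing import Dict, List, Union

Row = Dict[str, Union[str, float]]


def filter_rows(rows: List[Row], **kwargs: Union[str, float]) -> List[Row]:
    """Keep rows whose value meets every STRING_<param> threshold."""
    prefix = "STRING_"
    # Strip the prefix so that 'STRING_escore' maps to the 'escore' column.
    thresholds = {
        key[len(prefix):]: value
        for key, value in kwargs.items()
        if key.startswith(prefix)
    }
    return [
        row
        for row in rows
        if all(
            row[param] >= limit
            for param, limit in thresholds.items()
            if param in row
        )
    ]


rows = [
    {"p1": "A", "p2": "B", "escore": 0.9},
    {"p1": "A", "p2": "C", "escore": 0.1},
]
kept = filter_rows(rows, STRING_escore=0.2, BIOGRID_throughputTag="high")
print(kept)
```

With no `STRING_` thresholds supplied, every row passes, so the filter is a no-op by default.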

graphein.ppi.parse_stringdb.params_STRING(params: Dict[str, Union[str, int, List[str], List[int]]], **kwargs) Dict[str, Union[str, int]][source]#

Updates default parameters with user parameters for the method “network” of the STRING REST API. See also https://string-db.org/help/api/

Parameters
  • params (Dict[str, Union[str, int, List[str], List[int]]]) – Dictionary of default parameters

  • kwargs (Dict[str, Union[str, int, List[str], List[int]]]) – User parameters for the method “network” of the STRING REST API. The key must start with “STRING”

Returns

Dictionary of parameters

Return type

Dict[str, Union[str, int]]
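
A small sketch of the update step described above. `update_params` is a hypothetical helper. Stripping a `STRING_` prefix before writing the key, and ignoring kwargs meant for other sources, are assumptions based on the naming convention on this page.

```python
from typing import Dict, Union

Param = Union[str, int]


def update_params(params: Dict[str, Param], **kwargs: Param) -> Dict[str, Param]:
    """Return a copy of `params` updated with the STRING_-prefixed kwargs."""
    prefix = "STRING_"
    updated = dict(params)  # leave the caller's defaults untouched
    for key, value in kwargs.items():
        if key.startswith(prefix):
            updated[key[len(prefix):]] = value  # drop the prefix
    return updated


defaults = {"species": 9606, "required_score": 50}
params = update_params(defaults, STRING_required_score=700, BIOGRID_max=100)
print(params)  # {'species': 9606, 'required_score': 700}
```

Returning a copy means the same defaults dictionary can be reused across calls without carrying over earlier overrides.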

graphein.ppi.parse_stringdb.parse_STRING(protein_list: List[str], ncbi_taxon_id: Union[int, str, List[int], List[str]], **kwargs) pandas.core.frame.DataFrame[source]#

Makes STRING API call and returns a source specific Pandas dataframe. See also [1] STRING: https://string-db.org/help/api/

Parameters
  • protein_list (List[str]) – Proteins to include in the graph

  • ncbi_taxon_id (Union[int, str, List[int], List[str]]) – NCBI taxonomy identifiers for the organism. 9606 corresponds to Homo Sapiens

  • kwargs (Dict[str, Union[str, int, List[str], List[int]]]) – Parameters of the “network” method of the STRING REST API, used to select the results. The parameter names are of the form STRING_<param>, where <param> is the name of the parameter. Information about these parameters can be found at [1].

Returns

Source specific Pandas dataframe.

Return type

pd.DataFrame

graphein.ppi.parse_stringdb.standardise_STRING(df: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame[source]#

Standardises STRING dataframe, e.g. puts everything into a common format.

Parameters

df (pd.DataFrame) – Source specific Pandas dataframe

Returns

Standardised dataframe

Return type

pd.DataFrame