graphein.ppi#
Config#
- class graphein.ppi.config.BioGridConfig(*, searchNames: bool = True, max: int = 10000, interSpeciesExcluded: bool = True, selfInteractionsExcluded: bool = False, evidenceList: str = '', includeEvidence: bool = False, searchIds: bool = True, searchSynonyms: bool = True, searchBiogridIds: bool = True, additionalIdentifierTypes: str = '', excludeGenes: bool = False, includeInteractors: bool = True, includeInteractorInteractions: bool = False, pubmedList: str = '', excludePubmeds: bool = False, htpThreshold: int = 20, throughputTag: str = 'any')[source]#
Config for specifying parameters for API calls to BIOGRID. A full description of the parameters can be found at: https://wiki.thebiogrid.org/doku.php/biogridrest
- Parameters
searchNames (bool, optional) – If ‘true’, the interactor OFFICIAL_SYMBOL will be examined for a match with the geneList. Defaults to True
max (int, optional) – Number of results to fetch, defaults to 10,000
interSpeciesExcluded (bool, optional) – If ‘true’, interactions with interactors from different species will be excluded, defaults to True
selfInteractionsExcluded (bool, optional) – If ‘true’, interactions with one interactor will be excluded, defaults to False
evidenceList (str, optional) – Any interaction evidence with its Experimental System in the list will be excluded from the results unless includeEvidence is set to true. Defaults to “” (empty string)
includeEvidence (bool, optional) – If set to true, any interaction evidence with its Experimental System in the evidenceList will be included in the result, defaults to False
searchIds (bool, optional) – If ‘true’, the interactor ENTREZ_GENE, ORDERED LOCUS and SYSTEMATIC_NAME (orf) will be examined for a match with the geneList. Defaults to True
searchSynonyms (bool, optional) – If ‘true’, the interactor SYNONYMS will be examined for a match with the geneList. Defaults to True.
searchBiogridIds (bool, optional) – If ‘true’, the entries in ‘GENELIST’ will be compared to BIOGRID internal IDs, which are provided in all Tab2 formatted files. Defaults to True
additionalIdentifierTypes (str, optional) – Identifier types on this list are examined for a match with the geneList. Defaults to “”
excludeGenes (bool, optional) – If ‘true’, interactions containing genes in the geneList will be excluded from the results. Defaults to False
includeInteractors (bool, optional) – If ‘true’, in addition to interactions between genes on the geneList, interactions will also be fetched which have only one interactor on the geneList. Defaults to True
includeInteractorInteractions (bool, optional) – If ‘true’, interactions between the geneList’s first order interactors will be included. Defaults to False
pubmedList (str, optional) – Interactions will be fetched whose Pubmed ID is or is not in this list, depending on the value of excludePubmeds. Defaults to “”
excludePubmeds (bool, optional) – If ‘false’, interactions with Pubmed ID in pubmedList will be included in the results; if ‘true’ they will be excluded. Defaults to False
htpThreshold (int, optional) – Interactions whose Pubmed ID has more than this number of interactions will be excluded from the results. Ignored if excludePubmeds is ‘false’. Defaults to 20.
throughputTag (str, optional) – If set to ‘low’ or ‘high’, only interactions with ‘Low throughput’ or ‘High throughput’ in the ‘throughput’ field will be returned. Defaults to “any”
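The interplay between evidenceList and includeEvidence can be sketched in plain Python. This only illustrates the semantics described above; it is not BIOGRID's or graphein's implementation. The helper name keep_interaction, the exact string matching, and the treatment of an empty list as "no restriction" are assumptions.

```python
from typing import List


def keep_interaction(
    experimental_system: str,
    evidence_list: List[str],
    include_evidence: bool,
) -> bool:
    """Decide whether an interaction passes the evidence filter.

    Hypothetical helper: mirrors the documented behaviour of
    evidenceList / includeEvidence, nothing more.
    """
    if not evidence_list:
        # Assumption: an empty evidenceList places no restriction on results.
        return True
    listed = experimental_system in evidence_list
    # includeEvidence=True keeps only listed systems;
    # includeEvidence=False (the default) drops them.
    return listed if include_evidence else not listed
```

With the default includeEvidence=False the list acts as a blocklist; setting it to True turns the same list into an allowlist.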
- class graphein.ppi.config.PPIGraphConfig(*, paginate: bool = True, ncbi_taxon_id: int = 9606, kwargs: Dict[str, Union[str, int, float]] = {'BIOGRID_throughputTag': 'high', 'STRING_escore': 0.2}, string_config: graphein.ppi.config.STRINGConfig = None, biogrid_config: graphein.ppi.config.BioGridConfig = None)[source]#
Config for specifying parameters for PPI Graph Construction
- Parameters
paginate (bool) – Controls whether or not to paginate API calls. Useful for large queries. Defaults to True
ncbi_taxon_id (int) – NCBI taxonomy identifier of the organism. Defaults to 9606 (human)
string_config (graphein.ppi.config.STRINGConfig) – Config Object holding parameters for STRINGdb API calls. Defaults to None
biogrid_config (graphein.ppi.config.BioGridConfig, optional) – Config Object holding parameters for BioGrid API calls. Defaults to None
- class graphein.ppi.config.STRINGConfig(*, species: int = 9606, required_score: int = 50, network_type: str = 'functional', add_nodes: int = 0, show_query_node_labels: bool = 0)[source]#
Config for specifying parameters for API calls to STRINGdb. Full documentation can be found: https://string-db.org/help/api/
- Parameters
species (int, optional) – NCBI taxon identifiers, defaults to 9606 (human)
required_score (int, optional) – Threshold of significance to include an interaction, a number between 0 and 1000. Defaults to 50
network_type (str, optional) – Network type: “functional” (default), “physical”
add_nodes (int, optional) – Adds a number of proteins to the network based on their confidence score, i.e. extends the interaction neighborhood of the selected proteins to the desired value. Defaults to 0
show_query_node_labels (bool, optional) – If 1, the submitted names are used in the preferredName column when available (0 or 1). Defaults to 0
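To make the role of these parameters concrete, the sketch below assembles a query URL for the STRING “network” method from the same fields. It is a hedged illustration: the endpoint layout follows STRING's public API help page, the function name build_string_network_url is hypothetical, and graphein may construct its requests differently. No request is sent.

```python
from typing import List
from urllib.parse import urlencode


def build_string_network_url(
    identifiers: List[str],
    species: int = 9606,
    required_score: int = 50,
    network_type: str = "functional",
    add_nodes: int = 0,
    show_query_node_labels: int = 0,
) -> str:
    """Build (but do not send) a STRING 'network' query URL.

    Hypothetical helper; defaults mirror the STRINGConfig signature.
    """
    params = {
        # STRING expects multiple identifiers separated by a carriage
        # return, which urlencode escapes as %0D.
        "identifiers": "\r".join(identifiers),
        "species": species,
        "required_score": required_score,
        "network_type": network_type,
        "add_nodes": add_nodes,
        "show_query_node_labels": show_query_node_labels,
    }
    # Assumed endpoint layout: /api/<output-format>/network
    return "https://string-db.org/api/tsv/network?" + urlencode(params)


url = build_string_network_url(["CDC42", "CDK1"], required_score=400)
```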
Graph Construction#
Graphs#
Functions for constructing a PPI graph from STRINGdb and BIOGRID.
- graphein.ppi.graphs.compute_ppi_graph(protein_list: List[str], edge_construction_funcs: List[Callable], graph_annotation_funcs: Optional[List[Callable]] = None, node_annotation_funcs: Optional[List[Callable]] = None, edge_annotation_funcs: Optional[List[Callable]] = None, config: Optional[graphein.ppi.config.PPIGraphConfig] = None) → networkx.classes.graph.Graph [source]#
Computes a PPI Graph from a list of protein IDs. This is the core function for PPI graph construction.
- Parameters
protein_list (List[str]) – List of protein identifiers
edge_construction_funcs (List[Callable]) – List of functions to construct edges with
graph_annotation_funcs (List[Callable], optional) – List of functions to annotate graph metadata
node_annotation_funcs (List[Callable], optional) – List of functions to annotate node metadata
edge_annotation_funcs (List[Callable], optional) – List of functions to annotate edge metadata
config (PPIGraphConfig, optional) – Config object specifying additional parameters for STRING and BIOGRID API calls
- Returns
nx.Graph of the PPI network
- Return type
nx.Graph
- graphein.ppi.graphs.parse_kwargs_from_config(config: graphein.ppi.config.PPIGraphConfig) → graphein.ppi.config.PPIGraphConfig [source]#
If configs for STRING and BIOGRID are provided in the global PPIGraphConfig, their parameters are used to update config.kwargs.
- Parameters
config (PPIGraphConfig) – PPI graph configuration object.
- Returns
config with updated config.kwargs
- Return type
PPIGraphConfig
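The default kwargs of PPIGraphConfig use keys such as 'BIOGRID_throughputTag' and 'STRING_escore', which suggests a source-prefixed naming scheme. The sketch below shows one way the fields of the two sub-configs could be folded into such a dictionary. It is an assumption-laden illustration, not graphein's code: the function name merge_config_into_kwargs and the use of plain dicts in place of the config objects are hypothetical.

```python
from typing import Any, Dict, Optional


def merge_config_into_kwargs(
    kwargs: Dict[str, Any],
    string_params: Optional[Dict[str, Any]] = None,
    biogrid_params: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Return a copy of kwargs extended with source-prefixed parameters.

    Hypothetical stand-in: plain dicts replace STRINGConfig / BioGridConfig.
    """
    merged = dict(kwargs)  # leave the caller's dictionary untouched
    sources = (("STRING_", string_params), ("BIOGRID_", biogrid_params))
    for prefix, params in sources:
        if params is None:
            # A sub-config left as None contributes nothing.
            continue
        for name, value in params.items():
            merged[prefix + name] = value
    return merged
```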
Edges#
Functions for adding edges to a PPI Graph from parsed STRING & BIOGRID API call outputs.
- graphein.ppi.edges.add_biogrid_edges(G: networkx.classes.graph.Graph, **kwargs) → networkx.classes.graph.Graph [source]#
Adds edges from the BIOGRID database (https://thebiogrid.org/) to PPI Graph.
- Parameters
G (nx.Graph) – Graph to add edges to (populated with protein_id nodes).
kwargs – Additional parameters to pass to BIOGRID API calls.
- Returns
PPI Graph with BIOGRID interactions added as edges.
- Return type
nx.Graph
- graphein.ppi.edges.add_interacting_proteins(G: networkx.classes.graph.Graph, df: pandas.core.frame.DataFrame, kind: str) → networkx.classes.graph.Graph [source]#
Generic function for adding interaction edges to a PPI Graph. You can use this function to add additional interactions using a dataframe with columns "p1" and "p2".
- Parameters
G (nx.Graph) – PPI Graph to populate with edges.
df (pd.DataFrame) – Dataframe containing edgelist.
kind (str) – name of interaction type.
- Returns
PPI Graph with pre-computed edges added.
- Return type
nx.Graph
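The edge-list convention (one row per interaction, with the two partners in columns "p1" and "p2") can be illustrated without NetworkX or pandas. In this minimal sketch a dictionary keyed by unordered protein pairs stands in for the graph and a list of dicts stands in for the dataframe. The function name add_interactions and the idea of accumulating every source under a set of kinds are assumptions made for illustration.

```python
from typing import Dict, FrozenSet, Iterable, Mapping, Set

# Undirected edge (unordered pair of proteins) -> set of interaction kinds.
EdgeKinds = Dict[FrozenSet[str], Set[str]]


def add_interactions(
    edges: EdgeKinds,
    rows: Iterable[Mapping[str, str]],
    kind: str,
) -> EdgeKinds:
    """Add one undirected edge per row, tagging it with the given kind."""
    for row in rows:
        # frozenset makes ("A", "B") and ("B", "A") the same edge.
        key = frozenset((row["p1"], row["p2"]))
        # An edge reported by several sources keeps all of its kinds.
        edges.setdefault(key, set()).add(kind)
    return edges


edges: EdgeKinds = {}
add_interactions(edges, [{"p1": "CDC42", "p2": "RHOA"}], kind="string")
add_interactions(edges, [{"p1": "RHOA", "p2": "CDC42"}], kind="biogrid")
```

Keeping the kinds in a set lets a single edge record that both databases report the interaction, which is what colouring edges by kind relies on.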
- graphein.ppi.edges.add_string_edges(G: networkx.classes.graph.Graph, **kwargs) → networkx.classes.graph.Graph [source]#
Adds edges from STRING PPI database (https://string-db.org/) to a PPI Graph.
- Parameters
G (nx.Graph) – Graph to add edges to (populated with protein_id nodes).
kwargs – Additional parameters to pass to STRING API calls.
- Returns
PPI Graph with STRING interactions added as edges.
- Return type
nx.Graph
Graph Features#
Functions for adding metadata to PPI Graphs from STRING and BIOGRID.
- graphein.ppi.graph_metadata.add_biogrid_metadata(G: networkx.classes.graph.Graph, kwargs: Dict[str, Union[str, int]]) → networkx.classes.graph.Graph [source]#
Adds interaction dataframe from BIOGRID to graph.
- graphein.ppi.graph_metadata.add_string_biogrid_metadata(G: networkx.classes.graph.Graph, kwargs: Dict[str, Union[str, int]]) → networkx.classes.graph.Graph [source]#
Adds interaction dataframe from STRING and BIOGRID to graph.
Node Features#
Functions for adding node features to a PPI Graph.
Visualisation#
Contains utilities for plotting PPI NetworkX graphs.
- graphein.ppi.visualisation.get_edge_trace(g: networkx.classes.graph.Graph, edge_colours: Optional[List[str]] = None) → List[plotly.graph_objs._scatter.Scatter] [source]#
Gets edge traces from PPI graph. Returns a list of traces enabling edge colours to be set individually.
- Parameters
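g (nx.Graph) – PPI graph to extract edge traces from.
edge_colours (List[str], optional) – List of colours to use for the edges. Defaults to None.
- Returns
List of edge traces, allowing edge colours to be set individually.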
- Return type
List[go.Scatter]
- graphein.ppi.visualisation.get_node_trace(g: networkx.classes.graph.Graph, node_size_multiplier: float, node_colourscale: str = 'Viridis') → plotly.graph_objs._scatter.Scatter [source]#
Produces the node trace for the plotly plot.
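- Parameters
g (nx.Graph) – PPI graph to plot.
node_size_multiplier (float) – Multiplier for node size.
node_colourscale (str) – Colour scale to use for node colours. Defaults to “Viridis”.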
- Returns
Node trace for plotly plot
- Return type
go.Scatter
- graphein.ppi.visualisation.plot_ppi_graph(g: networkx.classes.graph.Graph, colour_edges_by: str = 'kind', with_labels: bool = True, **kwargs)[source]#
Plots a Protein-Protein Interaction Graph. Colours edges by kind.
- Parameters
g (nx.Graph) – NetworkX graph of PPI network.
colour_edges_by (str, optional) – Attribute to colour edges by. Currently only ‘kind’ is supported, which colours edges by their source database. Defaults to “kind”.
with_labels (bool, optional) – Whether to show labels on nodes. Defaults to True.
- graphein.ppi.visualisation.plotly_ppi_graph(g: networkx.classes.graph.Graph, layout: networkx.drawing.layout = <function circular_layout>, title: Optional[str] = None, show_labels: bool = False, node_size_multiplier: float = 5.0, node_colourscale: str = 'Viridis', edge_colours: Optional[List[str]] = None, edge_opacity: float = 0.5, height: int = 500, width: int = 500)[source]#
Plots a PPI graph.
- Parameters
g (nx.Graph) – PPI graph
layout (nx.layout) – Layout algorithm to use. Default is circular_layout.
title (str, optional) – Title of the graph. Default is None.
show_labels (bool) – If True, shows labels on nodes. Default is False.
node_size_multiplier (float) – Multiplier for node size. Default is 5.0.
node_colourscale (str) – Colour scale to use for node colours. Default is “Viridis”. Options: ‘Greys’ | ‘YlGnBu’ | ‘Greens’ | ‘YlOrRd’ | ‘Bluered’ | ‘RdBu’ | ‘Reds’ | ‘Blues’ | ‘Picnic’ | ‘Rainbow’ | ‘Portland’ | ‘Jet’ | ‘Hot’ | ‘Blackbody’ | ‘Earth’ | ‘Electric’ | ‘Viridis’
edge_colours (List[str], optional) – List of colours (hexcode) to use for edges. Default is None (px.colors.qualitative.T10).
edge_opacity (float) – Opacity of edges. Default is 0.5.
height (int) – Height of the plot. Default is 500.
width (int) – Width of the plot. Default is 500.
- Returns
Plotly figure of PPI Network
- Return type
go.Figure
Database Parsers#
BioGrid#
Functions for making and parsing API calls to BIOGRID.
- graphein.ppi.parse_biogrid.BIOGRID_df(protein_list: List[str], ncbi_taxon_id: Union[int, str, List[int], List[str]], **kwargs) → pandas.core.frame.DataFrame [source]#
Generates standardised dataframe with BIOGRID protein-protein interactions, filtered according to user’s input.
- Parameters
protein_list (List[str]) – List of proteins (official symbol) that will be included in the PPI graph
ncbi_taxon_id (Union[int, str, List[int], List[str]]) – NCBI taxonomy identifiers for the organism. 9606 corresponds to Homo sapiens
kwargs (Union[int, str, List[int], List[str]]) – Additional parameters to pass to BIOGRID API calls
- Returns
Standardised dataframe with BIOGRID interactions
- Return type
pd.DataFrame
- graphein.ppi.parse_biogrid.filter_BIOGRID(df: pandas.core.frame.DataFrame, **kwargs) → pandas.core.frame.DataFrame [source]#
Filters results of the BIOGRID API call according to user kwargs.
- Parameters
df (pd.DataFrame) – Source specific Pandas dataframe (BIOGRID) with results of the API call
kwargs (Dict[str, Union[str, int, List[str], List[int]]]) – User thresholds used to filter the results. The parameter names are of the form BIOGRID_<param>, where <param> is the name of the parameter. All the parameters are numerical values.
- Returns
Source specific Pandas dataframe with filtered results
- Return type
pd.DataFrame
- graphein.ppi.parse_biogrid.params_BIOGRID(params: Dict[str, Union[str, int, List[str], List[int]]], **kwargs) → Dict[str, Union[str, int]] [source]#
Updates default parameters with user parameters for the “interactions” method of the BIOGRID REST API.
See also https://wiki.thebiogrid.org/doku.php/biogridrest
- Parameters
params (Dict[str, Union[str, int, List[str], List[int]]]) – Dictionary of default parameters
kwargs (Dict[str, Union[str, int, List[str], List[int]]]) – User parameters for the “interactions” method of the BIOGRID REST API. The key must start with “BIOGRID”
- Returns
Dictionary of parameters
- Return type
Dict[str, Union[str, int]]
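The update described here (defaults overridden by prefixed user parameters) might look roughly like the following. This is a sketch under stated assumptions rather than the library's implementation: the name update_params is hypothetical, stripping the "BIOGRID_" prefix is inferred from the documented key convention, and joining list values with "|" is an assumption about how list-valued inputs become the str/int values of the return type.

```python
from typing import Any, Dict


def update_params(defaults: Dict[str, Any], prefix: str, **kwargs: Any) -> Dict[str, Any]:
    """Overlay user parameters carrying the given prefix onto the defaults."""
    params = dict(defaults)  # do not mutate the caller's defaults
    for key, value in kwargs.items():
        if not key.startswith(prefix):
            # Parameters meant for another source are ignored.
            continue
        if isinstance(value, (list, tuple)):
            # Assumption: list values are sent as one pipe-separated string.
            value = "|".join(str(item) for item in value)
        params[key[len(prefix):]] = value
    return params


params = update_params(
    {"max": 10000, "throughputTag": "any"},
    "BIOGRID_",
    BIOGRID_throughputTag="high",
    BIOGRID_evidenceList=["Two-hybrid", "FRET"],
    STRING_escore=0.2,
)
```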
- graphein.ppi.parse_biogrid.parse_BIOGRID(protein_list: List[str], ncbi_taxon_id: Union[int, str, List[int], List[str]], paginate: bool = True, **kwargs) → pandas.core.frame.DataFrame [source]#
Makes BIOGRID API call and returns a source specific Pandas dataframe.
See also [1] BIOGRID: https://wiki.thebiogrid.org/doku.php/biogridrest
- Parameters
protein_list (List[str]) – Proteins to include in the graph
ncbi_taxon_id (Union[int, str, List[int], List[str]]) – NCBI taxonomy identifiers for the organism. 9606 corresponds to Homo sapiens
paginate (bool) – Whether to paginate the calls (for BIOGRID, the maximum number of rows per call is 10000). Defaults to True
kwargs (Dict[str, Union[str, int, List[str], List[int]]]) – Parameters of the “interactions” method of the BIOGRID REST API, used to select the results. The parameter names are of the form BIOGRID_<param>, where <param> is the name of the parameter. Information about these parameters can be found at [1].
- Returns
Source specific Pandas dataframe.
- Return type
pd.DataFrame
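Because a single BIOGRID call returns at most 10000 rows, pagination amounts to requesting successive windows until a short page comes back. The loop below sketches that idea with an injected fetch_page callable, so no network access is involved. The names fetch_all and fetch_page are hypothetical, and how graphein actually issues the paged requests is not shown in this reference.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]


def fetch_all(
    fetch_page: Callable[[int, int], List[Row]],
    page_size: int = 10000,
) -> List[Row]:
    """Collect all rows by requesting windows of at most page_size rows.

    fetch_page(start, max_rows) is a caller-supplied function returning the
    rows in [start, start + max_rows); it stands in for one API call.
    """
    rows: List[Row] = []
    start = 0
    while True:
        page = fetch_page(start, page_size)
        rows.extend(page)
        if len(page) < page_size:
            # A short (or empty) page means the results are exhausted.
            return rows
        start += page_size


# Usage with an in-memory stand-in for the remote service.
data = [{"id": str(i)} for i in range(25)]
result = fetch_all(lambda start, n: data[start:start + n], page_size=10)
```

When the total is an exact multiple of the page size, the loop makes one extra call that returns an empty page; that is the price of not knowing the total in advance.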
- graphein.ppi.parse_biogrid.standardise_BIOGRID(df: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame [source]#
Standardises BIOGRID dataframe, e.g. puts everything into a common format.
- Parameters
df (pd.DataFrame) – Source specific Pandas dataframe
- Returns
Standardised dataframe
- Return type
pd.DataFrame
STRINGDB#
Functions for making and parsing API calls to STRINGdb.
- graphein.ppi.parse_stringdb.STRING_df(protein_list: List[str], ncbi_taxon_id: Union[int, str, List[int], List[str]], **kwargs) → pandas.core.frame.DataFrame [source]#
Generates standardised dataframe with STRING protein-protein interactions, filtered according to user’s input.
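- Parameters
protein_list (List[str]) – List of proteins that will be included in the PPI graph
ncbi_taxon_id (Union[int, str, List[int], List[str]]) – NCBI taxonomy identifiers for the organism. 9606 corresponds to Homo sapiens
kwargs – Additional parameters to pass to STRING API calls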
- Returns
Standardised dataframe with STRING interactions
- Return type
pd.DataFrame
- graphein.ppi.parse_stringdb.filter_STRING(df: pandas.core.frame.DataFrame, **kwargs) → pandas.core.frame.DataFrame [source]#
Filters results of the STRING API call according to user kwargs, keeping rows where the input parameters are greater than or equal to the input thresholds.
- Parameters
df (pd.DataFrame) – Source specific Pandas dataframe (STRING) with results of the API call
kwargs (Dict[str, Union[str, int, List[str], List[int]]]) – User thresholds used to filter the results. The parameter names are of the form STRING_<param>, where <param> is the name of the parameter. All the parameters are numerical values.
- Returns
Source specific Pandas dataframe with filtered results
- Return type
pd.DataFrame
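The thresholding rule (keep a row only if every named column is at least its threshold) can be sketched on a list of dicts standing in for the dataframe. The column name escore is borrowed from the 'STRING_escore' key in the default kwargs of PPIGraphConfig; the function name filter_rows and the assumption that every row contains each thresholded column are illustrative only.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def filter_rows(rows: List[Row], prefix: str = "STRING_", **kwargs: float) -> List[Row]:
    """Keep rows whose values meet every <prefix><column> threshold."""
    # Map e.g. STRING_escore=0.2 to {"escore": 0.2}; ignore other sources.
    thresholds = {
        key[len(prefix):]: value
        for key, value in kwargs.items()
        if key.startswith(prefix)
    }
    return [
        row
        for row in rows
        # Rows equal to the threshold are kept (greater than or equal to).
        if all(row[column] >= limit for column, limit in thresholds.items())
    ]


rows = [
    {"p1": "CDC42", "p2": "RHOA", "escore": 0.1},
    {"p1": "CDC42", "p2": "CDK1", "escore": 0.2},
    {"p1": "PLK1", "p2": "CDK1", "escore": 0.5},
]
kept = filter_rows(rows, STRING_escore=0.2, BIOGRID_throughputTag="high")
```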
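- graphein.ppi.parse_stringdb.params_STRING(params: Dict[str, Union[str, int, List[str], List[int]]], **kwargs) → Dict[str, Union[str, int]] [source]#
Updates default parameters with user parameters for the “network” method of the STRING REST API. See also https://string-db.org/help/api/
- Parameters
params (Dict[str, Union[str, int, List[str], List[int]]]) – Dictionary of default parameters
kwargs (Dict[str, Union[str, int, List[str], List[int]]]) – User parameters for the “network” method of the STRING REST API. The key must start with “STRING”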
- Returns
Dictionary of parameters
- Return type
Dict[str, Union[str, int]]
- graphein.ppi.parse_stringdb.parse_STRING(protein_list: List[str], ncbi_taxon_id: Union[int, str, List[int], List[str]], **kwargs) → pandas.core.frame.DataFrame [source]#
Makes STRING API call and returns a source specific Pandas dataframe. See also [1] STRING: https://string-db.org/help/api/
- Parameters
protein_list (List[str]) – Proteins to include in the graph
ncbi_taxon_id (Union[int, str, List[int], List[str]]) – NCBI taxonomy identifiers for the organism. 9606 corresponds to Homo sapiens
kwargs (Dict[str, Union[str, int, List[str], List[int]]]) – Parameters of the “network” method of the STRING REST API, used to select the results. The parameter names are of the form STRING_<param>, where <param> is the name of the parameter. Information about these parameters can be found at [1].
- Returns
Source specific Pandas dataframe.
- Return type
pd.DataFrame
- graphein.ppi.parse_stringdb.standardise_STRING(df: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame [source]#
Standardises STRING dataframe, e.g. puts everything into a common format.
- Parameters
df (pd.DataFrame) – Source specific Pandas dataframe
- Returns
Standardised dataframe
- Return type
pd.DataFrame