graphein.grn

Config

class graphein.grn.config.GRNGraphConfig(*, kwargs: Dict[str, Union[str, int, float]] = {}, trrust_config: graphein.grn.config.TRRUSTConfig = TRRUSTConfig(filtering_functions=None, root_dir=None, kwargs=None), regnetwork_config: graphein.grn.config.RegNetworkConfig = RegNetworkConfig(filtering_functions=None, root_dir=None, kwargs=None))

Config object for gene regulatory network graph construction.

Parameters
  • kwargs (Dict[str, Union[str, int, float]], optional) – Keyword args for GRN graph construction

  • trrust_config (graphein.grn.config.TRRUSTConfig, optional) – Config object specifying parameters for parsing TRRUST. Defaults to default config object.

  • regnetwork_config (graphein.grn.config.RegNetworkConfig, optional) – Config object specifying parameters for parsing RegNetwork. Defaults to default config object.

class graphein.grn.config.RegNetworkConfig(*, filtering_functions: List[Callable] = None, root_dir: pathlib.Path = None, kwargs: Dict[str, Union[str, int, float]] = None)

Config object containing parameters for parsing gene regulatory networks from RegNetwork: http://regnetworkweb.org/.

Parameters

filtering_functions (List[Callable], optional) – List of functions to apply to the RegNetwork dataframe prior to graph construction. Defaults to None.

class graphein.grn.config.TRRUSTConfig(*, filtering_functions: List[Callable] = None, root_dir: pathlib.Path = None, kwargs: Dict[str, Union[str, int, float]] = None)

Config object for parsing gene regulatory networks from TRRUST: https://www.grnpedia.org/trrust/

Parameters
  • filtering_functions (List[Callable], optional) – List of functions to apply to the TRRUST dataframe prior to graph construction. Defaults to None.

  • root_dir (pathlib.Path, optional) – Specifies location of TRRUST dataset (will download to this path if not available). Defaults to None.

  • kwargs (Dict[str, Union[str, int, float]], optional) –

Graphs

graphein.grn.graphs.compute_grn_graph(gene_list: List[str], edge_construction_funcs: List[Callable], graph_annotation_funcs: Optional[List[Callable]] = None, node_annotation_funcs: Optional[List[Callable]] = None, edge_annotation_funcs: Optional[List[Callable]] = None, config: Optional[graphein.grn.config.GRNGraphConfig] = None) → networkx.classes.graph.Graph

Computes a Gene Regulatory Network Graph from a list of gene IDs

Parameters
  • gene_list (List[str]) – List of gene identifiers

  • edge_construction_funcs (List[Callable]) – List of functions to construct edges with

  • graph_annotation_funcs (List[Callable], optional) – List of functions to annotate graph metadata, defaults to None

  • node_annotation_funcs (List[Callable], optional) – List of functions to annotate node metadata, defaults to None

  • edge_annotation_funcs (List[Callable], optional) – List of functions to annotate edge metadata, defaults to None

  • config (graphein.grn.GRNGraphConfig, optional) – Config specifying additional parameters for TRRUST and RegNetwork, defaults to None

Returns

nx.Graph of the gene regulatory network

Return type

nx.Graph
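
A minimal usage sketch follows, built only from the signatures documented on this page. The gene symbols are arbitrary examples, and the graphein imports are deferred into a function because running it needs graphein installed and the TRRUST and RegNetwork datasets available (they are downloaded if missing). Whether the resulting graph contains edges depends on those datasets.

```python
from functools import partial


def build_example_grn():
    """Build a small GRN graph; requires graphein and access to the datasets."""
    # Imported lazily so that defining this function does not require graphein.
    from graphein.grn.config import GRNGraphConfig
    from graphein.grn.edges import add_regnetwork_edges, add_trrust_edges
    from graphein.grn.graphs import compute_grn_graph

    # Default config: no filtering functions for either dataset.
    config = GRNGraphConfig()

    # Gene symbols chosen for illustration only.
    gene_list = ["AATF", "MYC", "USF1", "SP1", "TP53", "DUSP1"]

    return compute_grn_graph(
        gene_list=gene_list,
        edge_construction_funcs=[
            # partial() pre-binds the dataset-specific filtering functions.
            partial(
                add_trrust_edges,
                trrust_filtering_funcs=config.trrust_config.filtering_functions,
            ),
            partial(
                add_regnetwork_edges,
                regnetwork_filtering_funcs=config.regnetwork_config.filtering_functions,
            ),
        ],
        config=config,
    )
```

Calling build_example_grn() should return an nx.Graph whose nodes are the supplied gene identifiers.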

graphein.grn.graphs.parse_kwargs_from_config(config: graphein.grn.config.GRNGraphConfig) → graphein.grn.config.GRNGraphConfig

If configs for specific datasets are provided in the global GRNGraphConfig, updates config.kwargs with their parameters.

Parameters

config (graphein.grn.GRNGraphConfig) – GRN graph configuration object.

Returns

config with updated config.kwargs

Return type

graphein.grn.GRNGraphConfig
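
The idea of folding dataset-specific configs into a shared kwargs dictionary can be sketched as below. This is not graphein's implementation: DatasetConfig, GlobalConfig, merge_dataset_kwargs and the "trrust_"/"regnetwork_" key prefixes are hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class DatasetConfig:
    """Hypothetical stand-in for TRRUSTConfig / RegNetworkConfig."""
    root_dir: Optional[str] = None
    kwargs: Optional[Dict[str, Any]] = None


@dataclass
class GlobalConfig:
    """Hypothetical stand-in for GRNGraphConfig."""
    kwargs: Dict[str, Any] = field(default_factory=dict)
    trrust_config: Optional[DatasetConfig] = None
    regnetwork_config: Optional[DatasetConfig] = None


def merge_dataset_kwargs(config: GlobalConfig) -> GlobalConfig:
    """Copy each dataset config's fields into config.kwargs under a prefix."""
    datasets = (
        ("trrust", config.trrust_config),
        ("regnetwork", config.regnetwork_config),
    )
    for prefix, sub_config in datasets:
        if sub_config is None:
            continue  # nothing to merge for this dataset
        for key, value in vars(sub_config).items():
            config.kwargs[f"{prefix}_{key}"] = value  # prefix is an assumption
    return config


cfg = GlobalConfig(
    kwargs={"verbose": 1},
    trrust_config=DatasetConfig(root_dir="data/trrust"),
)
cfg = merge_dataset_kwargs(cfg)
```

Prefixing the keys keeps parameters from different datasets from overwriting one another in the shared dictionary.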

Edges

graphein.grn.edges.add_interacting_genes(G: networkx.classes.graph.Graph, df: pandas.core.frame.DataFrame, kind: str) → networkx.classes.graph.Graph

Generic function for adding interaction edges to GRNGraph

Parameters
  • G (nx.Graph) – GRNGraph to populate with edges

  • df (pd.DataFrame) – DataFrame containing edgelist

  • kind (str) – name of interaction type

Returns

Graph with edges added

Return type

nx.Graph
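
A rough sketch of how an edge list dataframe could be added to a graph with an interaction type attached is shown below. It is an illustration, not the library's code: the helper name add_edges_from_edgelist, the "g1"/"g2" column names, and storing kind as a set on each edge are all assumptions.

```python
import networkx as nx
import pandas as pd


def add_edges_from_edgelist(G: nx.Graph, df: pd.DataFrame, kind: str) -> nx.Graph:
    """Add one edge per dataframe row, recording the interaction type."""
    for g1, g2 in zip(df["g1"], df["g2"]):
        if G.has_edge(g1, g2):
            # Edge already present: record the additional interaction type.
            G.edges[g1, g2]["kind"].add(kind)
        else:
            G.add_edge(g1, g2, kind={kind})
    return G


G = nx.Graph()
G.add_nodes_from(["TP53", "MYC", "SP1"])

edges = pd.DataFrame({"g1": ["TP53", "SP1"], "g2": ["MYC", "MYC"]})
G = add_edges_from_edgelist(G, edges, kind="trrust")
# Re-adding the first row under another source extends that edge's kind set.
G = add_edges_from_edgelist(G, edges.iloc[:1], kind="regnetwork")
```

Keeping kind as a set lets one edge carry evidence from several sources without being duplicated.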

graphein.grn.edges.add_regnetwork_edges(G: networkx.classes.graph.Graph, regnetwork_filtering_funcs: Optional[List[Callable]] = None) → networkx.classes.graph.Graph

Adds edges from RegNetwork to GRNGraph

Parameters
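  • G (nx.Graph) – Graph to add edges to (populated with gene_id nodes)

  • regnetwork_filtering_funcs (List[Callable], optional) – List of functions to apply to the RegNetwork dataframe as pre-processing prior to graph construction. Defaults to None.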

Returns

nx.Graph GRNGraph with RegNetwork regulatory interactions added as edges

graphein.grn.edges.add_trrust_edges(G: networkx.classes.graph.Graph, trrust_filtering_funcs: Optional[List[Callable]] = None) → networkx.classes.graph.Graph

Adds edges from TRRUST to GRNGraph

Parameters
  • G (nx.Graph) – Graph to add edges to (populated with gene_id nodes)

  • trrust_filtering_funcs (List[Callable], optional) – List of functions to apply to the TRRUST dataframe as pre-processing prior to graph construction. Defaults to None.

Returns

nx.Graph GRNGraph with TRRUST regulatory interactions added as edges

Return type

nx.Graph

Features

Node Features

graphein.grn.features.node_features.add_sequence_to_nodes(n, d)

Maps UniProt ACC to UniProt ID. Retrieves sequence from UniProt and adds it to the node

Parameters
  • n – Graph node.

  • d – Graph attribute dictionary.

Database Parsers

RegNetwork

graphein.grn.parse_regnetwork.RegNetwork_df(gene_list: List[str], root_dir: Optional[pathlib.Path] = None, filtering_funcs: Optional[List[Callable]] = None) → pandas.core.frame.DataFrame

Generates a standardised dataframe of RegNetwork regulatory interactions, filtered according to the user’s input.

Returns

Standardised dataframe with RegNetwork interactions

graphein.grn.parse_regnetwork.filter_RegNetwork(df: pandas.core.frame.DataFrame, funcs: Optional[List[Callable]] = None) → pandas.core.frame.DataFrame

Filters results of RegNetwork call by providing a list of user-defined functions that accept a dataframe and return a dataframe

Parameters
  • df – pd.Dataframe to filter

  • funcs – list of functions that carry out dataframe processing

Returns

processed dataframe
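
The filtering contract described above (each function accepts a dataframe and returns a dataframe) can be sketched as follows. The helper apply_filters, the two example filters, the column names "g1", "g2" and "regtype", and the sample values are all assumptions made for illustration; the actual RegNetwork columns may differ.

```python
from typing import Callable, List, Optional

import pandas as pd


def apply_filters(
    df: pd.DataFrame, funcs: Optional[List[Callable]] = None
) -> pd.DataFrame:
    """Apply each filtering function in order; None means no filtering."""
    if funcs is not None:
        for func in funcs:
            df = func(df)
    return df


def drop_self_loops(df: pd.DataFrame) -> pd.DataFrame:
    """Remove rows where a gene is listed as regulating itself."""
    return df[df["g1"] != df["g2"]]


def keep_activation(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only rows whose (assumed) regtype column equals 'Activation'."""
    return df[df["regtype"] == "Activation"]


interactions = pd.DataFrame(
    {
        "g1": ["TP53", "MYC", "SP1"],
        "g2": ["MYC", "MYC", "TP53"],
        "regtype": ["Activation", "Activation", "Repression"],
    }
)
filtered = apply_filters(interactions, [drop_self_loops, keep_activation])
```

Functions written this way can be passed as filtering_functions in a dataset config or directly to the edge construction functions.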

graphein.grn.parse_regnetwork.load_RegNetwork_interactions(root_dir: Optional[pathlib.Path] = None) → pandas.core.frame.DataFrame

Loads RegNetwork interaction datafile. Downloads the file first if not already present.

graphein.grn.parse_regnetwork.load_RegNetwork_regulation_types(root_dir: Optional[pathlib.Path] = None) → pandas.core.frame.DataFrame

Loads RegNetwork regulation types. Downloads the file first if not already present.

graphein.grn.parse_regnetwork.parse_RegNetwork(gene_list: List[str], root_dir: Optional[pathlib.Path] = None) → pandas.core.frame.DataFrame

Parser for RegNetwork interactions

Parameters

gene_list – List of gene identifiers

Returns

Pandas dataframe with the regulatory interactions between genes in the gene list

graphein.grn.parse_regnetwork.standardise_RegNetwork(df: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame

Standardises the RegNetwork dataframe, i.e. puts everything into a common format.

Parameters

df (pd.DataFrame) – Source specific Pandas dataframe

Returns

Standardised dataframe

Return type

pd.DataFrame

TRRUST

Utilities for parsing the TRRUST database.

graphein.grn.parse_trrust.TRRUST_df(gene_list: List[str], filtering_funcs: Optional[List[Callable]] = None) → pandas.core.frame.DataFrame

Generates a standardised dataframe of TRRUST regulatory interactions, filtered according to the user’s input.

Parameters
  • gene_list (List[str]) – List of gene identifiers to restrict the dataframe to.

  • filtering_funcs (List[Callable]) – Functions with which to filter the dataframe.

Returns

Standardised dataframe with TRRUST interactions

Return type

pd.DataFrame

graphein.grn.parse_trrust.filter_TRRUST(df: pandas.core.frame.DataFrame, funcs: Optional[List[Callable]]) → pandas.core.frame.DataFrame

Filters results of TRRUST call by applying a list of user-defined functions that accept a dataframe and return a dataframe.

Parameters
  • df (pd.DataFrame) – Source specific Pandas dataframe (TRRUST) with results of the API call

  • funcs (List[Callable]) – User functions to filter the results.

Returns

Source specific Pandas dataframe with filtered results

Return type

pd.DataFrame

graphein.grn.parse_trrust.load_TRRUST(root_dir: Optional[pathlib.Path] = None) → pandas.core.frame.DataFrame

Loads the TRRUST datafile. If file not found, it is downloaded.

Parameters

root_dir (pathlib.Path, optional) – Root directory path to either find or download TRRUST

Returns

TRRUST database as a dataframe

Return type

pd.DataFrame

graphein.grn.parse_trrust.parse_TRRUST(gene_list: List[str], root_dir: Optional[pathlib.Path] = None) → pandas.core.frame.DataFrame

Parser for TRRUST regulatory interactions. If the TRRUST dataset is not found in the specified root_dir, it is downloaded.

Parameters
  • gene_list (List[str]) – List of gene identifiers to restrict dataframe to.

  • root_dir (pathlib.Path, optional) – Root directory path to either find or download TRRUST. Defaults to None (downloads dataset to graphein/datasets/trrust)

Returns

Pandas dataframe with the regulatory interactions between genes in the gene list

Return type

pd.DataFrame
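
Restricting an interaction table to a gene list can be sketched with pandas as below. This is an illustrative helper, not the parser's code: the name restrict_to_genes and the "g1"/"g2" column names are assumptions, as is the choice to require both the regulator and the target to be in the list.

```python
from typing import List

import pandas as pd


def restrict_to_genes(df: pd.DataFrame, gene_list: List[str]) -> pd.DataFrame:
    """Keep rows where both interaction partners are in gene_list."""
    mask = df["g1"].isin(gene_list) & df["g2"].isin(gene_list)
    return df[mask].reset_index(drop=True)


interactions = pd.DataFrame(
    {
        "g1": ["TP53", "MYC", "SP1"],
        "g2": ["MYC", "USF1", "TP53"],
    }
)
subset = restrict_to_genes(interactions, ["TP53", "MYC", "SP1"])
```

Here the MYC to USF1 row is dropped because USF1 is not in the supplied list.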

graphein.grn.parse_trrust.standardise_TRRUST(df: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame

Standardises the TRRUST dataframe, i.e. puts everything into a common format.

Parameters

df (pd.DataFrame) – Source specific Pandas dataframe. Must contain columns: [“g1”, “g2”, “regtype”]

Returns

processed dataframe

Return type

pd.DataFrame