graphein.utils#

Utils#

Utilities for working with graph objects.

graphein.utils.utils.annotate_edge_metadata(G: networkx.classes.graph.Graph, funcs: List[Callable]) networkx.classes.graph.Graph[source]#

Annotates Graph edges with edge metadata. Each function in funcs must take the three arguments u, v and d, where u and v are the nodes of the edge, and d is the edge data dictionary.

Additional parameters can be provided by using partial functions.

Parameters
  • G (nx.Graph) – Graph to add edge metadata to

  • funcs (List[Callable]) – List of edge metadata annotation functions

Returns

Graph with edge metadata added

Return type

nx.Graph
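
To illustrate the (u, v, d) signature, the sketch below defines a hypothetical annotation function, add_length_class, and binds its extra argument with functools.partial. The loop at the end is only a stand-in for what annotate_edge_metadata is described as doing (calling every function on every edge). It is an assumption about the behaviour, not the library code, and the "distance" attribute is invented for the example.

```python
from functools import partial

import networkx as nx


def add_length_class(u, v, d, threshold):
    """Hypothetical annotator: label an edge 'short' or 'long' by its distance."""
    d["length_class"] = "short" if d["distance"] <= threshold else "long"


G = nx.Graph()
G.add_edge("A", "B", distance=3.8)
G.add_edge("B", "C", distance=9.1)

# Bind the extra parameter so the callable has the expected (u, v, d) signature.
funcs = [partial(add_length_class, threshold=5.0)]

# Stand-in for annotate_edge_metadata(G, funcs): apply each function to every edge.
for func in funcs:
    for u, v, d in G.edges(data=True):
        func(u, v, d)
```

Because d is the live edge data dictionary, writing to it inside the function is what attaches the metadata to the graph.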

graphein.utils.utils.annotate_graph_metadata(G: networkx.classes.graph.Graph, funcs: List[Callable]) networkx.classes.graph.Graph[source]#

Annotates graph with graph-level metadata.

Parameters
  • G (nx.Graph) – Graph to add graph-level metadata to

  • funcs (List[Callable]) – List of graph metadata annotation functions

Returns

Graph with graph-level metadata added

Return type

nx.Graph

graphein.utils.utils.annotate_node_features(G: networkx.classes.graph.Graph, funcs: List[Callable]) networkx.classes.graph.Graph[source]#

Annotates nodes with feature data. Note: each function in funcs is passed the whole graph.

Parameters
  • G (nx.Graph) – Graph to add node features to

  • funcs (List[Callable]) – List of node feature annotation functions

Returns

Graph with node features added

Return type

nx.Graph

graphein.utils.utils.annotate_node_metadata(G: networkx.classes.graph.Graph, funcs: List[Callable]) networkx.classes.graph.Graph[source]#

Annotates nodes with metadata. Each function in funcs must take two arguments n and d, where n is the node and d is the node data dictionary.

Additional parameters can be provided by using partial functions.

Parameters
  • G (nx.Graph) – Graph to add node metadata to

  • funcs (List[Callable]) – List of node metadata annotation functions

Returns

Graph with node metadata added

Return type

nx.Graph

graphein.utils.utils.compute_edges(G: networkx.classes.graph.Graph, funcs: List[Callable]) networkx.classes.graph.Graph[source]#

Computes edges for a graph from a list of edge construction functions. Each func in funcs must take an nx.Graph and return an nx.Graph.

Parameters
  • G (nx.Graph) – Graph to add edges to

  • funcs (List[Callable]) – List of edge construction functions

Returns

Graph with edges added

Return type

nx.Graph

graphein.utils.utils.filter_dataframe(df: pandas.core.frame.DataFrame, funcs: List[Callable]) pandas.core.frame.DataFrame[source]#

Applies transformation functions to a dataframe. Each function in funcs must accept a pd.DataFrame and return a pd.DataFrame.

Additional parameters can be provided by using partial functions.

Parameters
  • df (pd.DataFrame) – Dataframe to apply transformations to.

  • funcs (List[Callable]) – List of transformation functions.

Return type

pd.DataFrame
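
A minimal sketch of the kind of functions this expects. The two filters, keep_chain and drop_hydrogens, and the column names are hypothetical. The closing loop stands in for filter_dataframe and assumes the functions are applied in order, each receiving the output of the previous one.

```python
from functools import partial

import pandas as pd


def keep_chain(df, chain_id):
    """Hypothetical filter: keep only rows belonging to one chain."""
    return df.loc[df["chain_id"] == chain_id]


def drop_hydrogens(df):
    """Hypothetical filter: remove rows whose element is hydrogen."""
    return df.loc[df["element"] != "H"]


df = pd.DataFrame(
    {
        "chain_id": ["A", "A", "B", "A"],
        "element": ["C", "H", "C", "N"],
    }
)

# partial reduces keep_chain to a single-argument (df) callable.
funcs = [partial(keep_chain, chain_id="A"), drop_hydrogens]

# Stand-in for filter_dataframe(df, funcs): chain the functions in order.
filtered = df
for func in funcs:
    filtered = func(filtered)
```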

graphein.utils.utils.format_adjacency(G: networkx.classes.graph.Graph, adj: numpy.ndarray, name: str) xarray.core.dataarray.DataArray[source]#

Format adjacency matrix nicely.

Intended to be used when computing an adjacency-like matrix of a graph object G. For example, in defining a func:

def my_adj_matrix_func(G):
    adj = some_adj_func(G)
    return format_adjacency(G, adj, "xarray_coord_name")

Assumptions

  1. adj should be a 2D matrix of shape (n_nodes, n_nodes)

  2. name is something that is unique amongst all names used in the final adjacency tensor.

Parameters
  • G – NetworkX-compatible Graph

  • adj (np.ndarray) – 2D numpy array of shape (n_nodes, n_nodes)

  • name (str) – A unique name for the kind of adjacency matrix being constructed. Gets used in xarray as a coordinate in the "name" dimension.

Returns

An XArray DataArray of shape (n_nodes, n_nodes, 1)

Return type

xr.DataArray

graphein.utils.utils.generate_adjacency_tensor(G: networkx.classes.graph.Graph, funcs: List[Callable], return_array=False) xarray.core.dataarray.DataArray[source]#

Generate adjacency tensor for a graph.

Uses the collection of functions in funcs to build an xarray DataArray that houses the resulting “adjacency tensor”.

A key design choice: we default to returning xarray DataArrays to make inspecting the data easy. For consumption in tensor libraries, set return_array=True to get a NumPy array instead.

Parameters
  • G (nx.Graph) – NetworkX Graph.

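  • funcs (List[Callable]) – A list of functions that take in G and return an xr.DataArray

  • return_array (bool) – Whether to return a NumPy array instead of an xr.DataArray. Defaults to False.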

Returns

xr.DataArray, which is of shape (n_nodes, n_nodes, n_funcs).

Return type

xr.DataArray

graphein.utils.utils.generate_feature_dataframe(G: networkx.classes.graph.Graph, funcs: List[Callable], return_array=False) pandas.core.frame.DataFrame[source]#

Return a pandas DataFrame representation of node metadata.

funcs has to be a list of callables whose signature is

f(n, d) -> pd.Series

where n is the graph node, d is the node metadata dictionary. The function must return a pandas Series whose name is the node.

Example function:

def x_vec(n: Hashable, d: Dict[Hashable, Any]) -> pd.Series:
    return pd.Series({"x_coord": d["x_coord"]}, name=n)

One fairly strong assumption is that each func has all the information it needs to act stored on the metadata dictionary. If you need to reference an external piece of information, such as a dictionary to look up values, set up the function to accept the dictionary, and use functools.partial to “reduce” the function signature to just (n, d). An example below:

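from functools import partial

import pandas as pd

def get_molweight(n, d, mw_dict):
    # Look up the molecular weight of the node's amino acid in the external dictionary.
    return pd.Series({"mw": mw_dict[d["amino_acid"]]}, name=n)

mw_dict = {"PHE": 165, "GLY": 75}  # ... plus entries for the remaining residues
# Bind mw_dict so that the signature reduces to (n, d).
get_molweight_func = partial(get_molweight, mw_dict=mw_dict)

generate_feature_dataframe(G, [get_molweight_func])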

The name=n piece is important; the name becomes the row index in the resulting dataframe.

The series that is returned from each function need not only contain one key-value pair. You can have two or more, and that’s completely fine; each key becomes a column in the resulting dataframe.

A key design choice: we default to returning DataFrames to make inspecting the data easy. For consumption in tensor libraries, set return_array=True to get a NumPy array instead.

Parameters
  • G (nx.Graph) – A NetworkX-compatible graph object.

  • funcs (List[Callable]) – A list of functions.

  • return_array (bool) – Whether or not to return a NumPy array version of the data. Useful for consumption in tensor libs, like PyTorch or JAX.

Returns

pandas DataFrame representation of node metadata.

Return type

pd.DataFrame

graphein.utils.utils.import_message(submodule: str, package: str, conda_channel: Optional[str] = None, pip_install: bool = False) str[source]#

Return warning if package is not found. Generic message for indicating to the user when a function relies on an optional module / package that is not currently installed. Includes installation instructions. Typically used in conjunction with optional featurisation libraries.

Parameters
  • submodule (str) – graphein submodule that needs an external dependency.

  • package (str) – External package this submodule relies on.

  • conda_channel (str, optional) – Conda channel package can be installed from, if at all. Defaults to None

  • pip_install (bool) – Whether package can be installed via pip. Defaults to False

graphein.utils.utils.onek_encoding_unk(x: Iterable[Any], allowable_set: List[Any]) List[bool][source]#

Function for performing one-hot encoding.

Parameters
  • x (Iterable[Any]) – values to one-hot

  • allowable_set (List[Any]) – set of options to encode

Returns

one-hot encoding as list

Return type

List[bool]
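
The sketch below shows one common way to one-hot encode with an "unknown" bucket. It assumes a single value is encoded against allowable_set and that, as the _unk suffix suggests, any value outside the set falls back to the last entry. Both points are assumptions, and one_hot_with_unknown is an illustrative name rather than the library function.

```python
from typing import Any, List


def one_hot_with_unknown(x: Any, allowable_set: List[Any]) -> List[bool]:
    """Illustrative one-hot encoder; not the library implementation."""
    # Assumption: the last entry of allowable_set doubles as the "unknown" bucket.
    if x not in allowable_set:
        x = allowable_set[-1]
    return [x == s for s in allowable_set]


residues = ["ALA", "GLY", "UNK"]
known = one_hot_with_unknown("GLY", residues)    # [False, True, False]
unknown = one_hot_with_unknown("XYZ", residues)  # [False, False, True]
```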

graphein.utils.utils.ping(host: str) bool[source]#

Returns True if host (str) responds to a ping request. Remember that a host may not respond to a ping (ICMP) request even if the host name is valid.

Parameters

host (str) – IP or hostname

Returns

True if host responds to a ping request.

Return type

bool

graphein.utils.utils.protein_letters_3to1_all_caps(amino_acid: str) str[source]#

Converts capitalised 3 letter amino acid code to single letter. Not provided in default biopython.

Parameters

amino_acid (str) – Capitalised 3-letter amino acid code (eg. "GLY")

Returns

Single-letter amino acid code

Return type

str
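
For orientation, a conversion of this kind can be sketched as a plain dictionary lookup. The table below covers only the 20 standard amino acids. Which codes the library function accepts beyond these is not stated here, and three_to_one is a hypothetical name.

```python
# Standard 20 amino acids only; non-standard residues are deliberately left out.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def three_to_one(amino_acid: str) -> str:
    """Map a capitalised 3-letter code to its 1-letter code (KeyError if unknown)."""
    return THREE_TO_ONE[amino_acid]


glycine = three_to_one("GLY")
```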

Testing utilities for the Graphein library.

graphein.testing.utils.compare_approximate(first, second)[source]#

Return whether two dicts of arrays are approximately equal.

Parameters
  • first (Dict[str, Any]) – The first dictionary.

  • second (Dict[str, Any]) – The second dictionary.

Returns

True if the dictionaries are approx equal, False otherwise.

Return type

bool

graphein.testing.utils.compare_exact(first: Dict[str, Any], second: Dict[str, Any]) bool[source]#

Return whether two dicts of arrays are exactly equal.

Parameters
  • first (Dict[str, Any]) – The first dictionary.

  • second (Dict[str, Any]) – The second dictionary.

Returns

True if the dictionaries are exactly equal, False otherwise.

Return type

bool
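
The difference between exact and approximate comparison of dicts of arrays can be sketched with NumPy. The helpers below are illustrative stand-ins, not the library functions, and the use of np.array_equal and np.allclose (with its default tolerances) is an assumption.

```python
from typing import Any, Dict

import numpy as np


def dicts_exactly_equal(first: Dict[str, Any], second: Dict[str, Any]) -> bool:
    """Same keys, and every pair of arrays identical element for element."""
    if first.keys() != second.keys():
        return False
    return all(np.array_equal(first[k], second[k]) for k in first)


def dicts_approx_equal(first: Dict[str, Any], second: Dict[str, Any]) -> bool:
    """Same keys, and every pair of arrays equal within np.allclose tolerances."""
    if first.keys() != second.keys():
        return False
    return all(np.allclose(first[k], second[k]) for k in first)


a = {"coords": np.array([1.0, 2.0, 3.0])}
b = {"coords": np.array([1.0, 2.0, 3.0 + 1e-9])}  # tiny floating-point difference

exact = dicts_exactly_equal(a, b)
approx = dicts_approx_equal(a, b)
```

Approximate comparison is the usual choice for features that come out of floating-point computation, where results can differ in the last few digits between platforms.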

graphein.testing.utils.edge_data_equal(g: networkx.classes.graph.Graph, h: networkx.classes.graph.Graph, comparison_func: typing.Callable = <function compare_exact>) bool[source]#

Checks whether two graphs have the same edge features.

Parameters
  • g (networkx.Graph) – The first graph.

  • h (networkx.Graph) – The second graph.

  • comparison_func – Matching function for edge features. Takes two edge feature dictionaries and returns True if they are equal. Defaults to compare_exact()

Returns

True if the graphs have the same edge features, False otherwise.

Return type

bool

graphein.testing.utils.edges_equal(g: networkx.classes.graph.Graph, h: networkx.classes.graph.Graph) bool[source]#

Checks whether two graphs have the same edges.

Parameters
  • g (networkx.Graph) – The first graph.

  • h (networkx.Graph) – The second graph.

Raises

AssertionError – If the graphs do not contain the same edges

graphein.testing.utils.graphs_isomorphic(g: networkx.classes.graph.Graph, h: networkx.classes.graph.Graph) bool[source]#

Checks for structural isomorphism between two graphs: g and h.

Parameters
  • g (networkx.Graph) – The first graph.

  • h (networkx.Graph) – The second graph.

Returns

True if the graphs are isomorphic, False otherwise.

Return type

bool
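
Structural isomorphism ignores node labels and attributes and looks only at how the nodes are connected. The sketch below demonstrates this with NetworkX's own is_isomorphic. Whether graphs_isomorphic wraps that call is an assumption.

```python
import networkx as nx

g = nx.path_graph(3)                                 # path: 0 - 1 - 2
h = nx.Graph([("a", "b"), ("b", "c")])               # same path, different labels
k = nx.Graph([("a", "b"), ("b", "c"), ("c", "a")])   # triangle

same_structure = nx.is_isomorphic(g, h)       # labels differ, structure matches
different_structure = nx.is_isomorphic(g, k)  # triangle has an extra edge
```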

graphein.testing.utils.nodes_equal(g: networkx.classes.graph.Graph, h: networkx.classes.graph.Graph) bool[source]#

Checks whether two graphs have the same nodes.

Parameters
  • g (networkx.Graph) – The first graph.

  • h (networkx.Graph) – The second graph.

Raises

AssertionError – If the graphs do not contain the same nodes

CLI & Config#

Yaml parser for config objects

graphein.utils.config_parser.config_constructor(loader: yaml.loader.FullLoader, node: yaml.nodes.MappingNode) pydantic.main.BaseModel[source]#

Construct a BaseModel config.

Parameters
  • loader (yaml.FullLoader) – Given yaml loader

  • node (yaml.nodes.MappingNode) – A mapping node

graphein.utils.config_parser.function_constructor(loader: yaml.loader.FullLoader, tag_suffix: str, node: Union[yaml.nodes.MappingNode, yaml.nodes.ScalarNode]) Callable[source]#

Construct a Callable. If function parameters are given, this returns a partial function.

Parameters
  • loader (yaml.FullLoader) – Given yaml loader

  • tag_suffix (str) – The name after the !func: tag

  • node (Union[yaml.nodes.MappingNode, yaml.nodes.ScalarNode]) – A mapping node if function parameters are given, a scalar node if not

graphein.utils.config_parser.get_loader() yaml.loader.Loader[source]#

Add constructors to PyYAML loader.

graphein.utils.config_parser.parse_config(path: pathlib.Path) pydantic.main.BaseModel[source]#

Parses a yaml configuration file into a config object.

Parameters

path (pathlib.Path) – Path to configuration file

Yaml parser for config objects

class graphein.utils.config.PartialMatchOperator(regex_paths=None, types=None)[source]#

Custom operator for deepdiff comparison. This operator compares whether the two partials are equal.

class graphein.utils.config.PathMatchOperator(regex_paths=None, types=None)[source]#

Custom operator for deepdiff comparison. This operator compares whether the two pathlib Paths are equal.

graphein.utils.config.partial_functions_equal(func1: functools.partial, func2: functools.partial) bool[source]#

Determine whether two partial functions are equal.

Parameters
  • func1 (partial) – Partial function to check

  • func2 (partial) – Partial function to check

Returns

Whether the two functions are equal

Return type

bool
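
A dedicated check is useful because functools.partial objects compare by identity, so two partials built from the same function and arguments are not equal under ==. The sketch below shows one way to compare them field by field. partials_match is an illustrative helper, and comparing the func, args and keywords attributes is an assumption about how the library does it.

```python
from functools import partial


def partials_match(func1, func2) -> bool:
    """Illustrative check: same wrapped function, positional and keyword args."""
    if not (isinstance(func1, partial) and isinstance(func2, partial)):
        return False
    return (
        func1.func == func2.func
        and func1.args == func2.args
        and func1.keywords == func2.keywords
    )


def scale(x, factor):
    return x * factor


p1 = partial(scale, factor=2)
p2 = partial(scale, factor=2)
p3 = partial(scale, factor=3)

identity_equal = p1 == p2               # False: distinct partial objects
field_equal = partials_match(p1, p2)    # True: same function and arguments
field_differs = partials_match(p1, p3)  # False: keyword value differs
```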

PyMol#