graphein.utils

Utils

Utilities for working with graph objects.

graphein.utils.utils.annotate_edge_metadata(G: networkx.classes.graph.Graph, funcs: List[Callable]) → networkx.classes.graph.Graph

Annotates Graph edges with edge metadata. Each function in funcs must take the three arguments u, v and d, where u and v are the nodes of the edge, and d is the edge data dictionary.

Additional parameters can be provided by using partial functions.

Parameters

G (nx.Graph) – Graph to add edge metadata to
funcs (List[Callable]) – List of edge metadata annotation functions

Returns

Graph with edge metadata added

Return type

nx.Graph
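The sketch below shows an edge annotation function that follows the (u, v, d) contract and receives an extra argument through functools.partial. The helper add_distance and the coords lookup are hypothetical. The loop at the bottom only mimics iterating over G.edges(data=True), so the example runs without graphein or NetworkX.

```python
import math
from functools import partial
from typing import Any, Dict, Hashable, Tuple

Coords = Dict[Hashable, Tuple[float, float]]


def add_distance(u: Hashable, v: Hashable, d: Dict[str, Any], coords: Coords) -> None:
    """Hypothetical annotator: store the Euclidean distance between u and v on the edge."""
    d["distance"] = math.dist(coords[u], coords[v])


coords = {"A": (0.0, 0.0), "B": (3.0, 4.0)}
# partial "reduces" the signature to (u, v, d), the shape the docs above describe
add_distance_func = partial(add_distance, coords=coords)

# Stand-in for G.edges(data=True): (u, v, edge-data dict) triples
edges = [("A", "B", {})]
for u, v, d in edges:
    add_distance_func(u, v, d)  # mutates the edge-data dict in place

print(edges[0][2])  # {'distance': 5.0}
```

With graphein installed, the same partial would be passed as annotate_edge_metadata(G, [add_distance_func]).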

graphein.utils.utils.annotate_graph_metadata(G: networkx.classes.graph.Graph, funcs: List[Callable]) → networkx.classes.graph.Graph

Annotates a graph with graph-level metadata.

Parameters

G (nx.Graph) – Graph to add graph-level metadata to
funcs (List[Callable]) – List of graph metadata annotation functions

Returns

Graph with graph-level metadata added

Return type

nx.Graph

graphein.utils.utils.annotate_node_features(G: networkx.classes.graph.Graph, funcs: List[Callable]) → networkx.classes.graph.Graph

Annotates nodes with feature data. Note: the whole graph is passed to each function.

Parameters

G (nx.Graph) – Graph to add node features to
funcs (List[Callable]) – List of node feature annotation functions

Returns

Graph with node features added

Return type

nx.Graph

graphein.utils.utils.annotate_node_metadata(G: networkx.classes.graph.Graph, funcs: List[Callable]) → networkx.classes.graph.Graph

Annotates nodes with metadata. Each function in funcs must take two arguments n and d, where n is the node and d is the node data dictionary.

Additional parameters can be provided by using partial functions.

Parameters

G (nx.Graph) – Graph to add node metadata to
funcs (List[Callable]) – List of node metadata annotation functions

Returns

Graph with node metadata added

Return type

nx.Graph
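A minimal sketch of a node annotation function with the (n, d) signature, using partial to bind a lookup table. The "residue_name" node key, the helper add_charge and the simplified side-chain charge table are illustrative assumptions, not graphein's own data; the loop stands in for G.nodes(data=True) so no graph library is needed.

```python
from functools import partial
from typing import Any, Dict, Hashable

# Illustrative only: simplified formal side-chain charges, not a graphein table
CHARGES = {"ASP": -1, "GLU": -1, "LYS": 1, "ARG": 1}


def add_charge(n: Hashable, d: Dict[str, Any], table: Dict[str, int]) -> None:
    """Hypothetical annotator: look up the residue name stored on the node."""
    d["charge"] = table.get(d["residue_name"], 0)  # residues not in the table get 0


add_charge_func = partial(add_charge, table=CHARGES)

# Stand-in for G.nodes(data=True): (node, node-data dict) pairs
nodes = [("A:ASP:1", {"residue_name": "ASP"}), ("A:GLY:2", {"residue_name": "GLY"})]
for n, d in nodes:
    add_charge_func(n, d)

print([d["charge"] for _, d in nodes])  # [-1, 0]
```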

graphein.utils.utils.compute_edges(G: networkx.classes.graph.Graph, funcs: List[Callable]) → networkx.classes.graph.Graph

Computes edges for a Graph from a list of edge construction functions. Each func in funcs must take an nx.Graph and return an nx.Graph.

Parameters

G (nx.Graph) – Graph to add edges to
funcs (List[Callable]) – List of edge construction functions

Returns

Graph with edges added

Return type

nx.Graph

graphein.utils.utils.filter_dataframe(df: pandas.core.frame.DataFrame, funcs: List[Callable]) → pandas.core.frame.DataFrame

Applies transformation functions to a dataframe. Each function in funcs must accept a pd.DataFrame and return a pd.DataFrame.

Additional parameters can be provided by using partial functions.

Parameters

df (pd.DataFrame) – Dataframe to apply transformations to.
funcs (List[Callable]) – List of transformation functions.

Returns

Transformed dataframe.

Return type

pd.DataFrame
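A sketch of the chaining pattern, assuming the functions are applied in list order with each one receiving the previous output. To keep it dependency-free, a list of row dicts stands in for a pd.DataFrame, and drop_hetatms and keep_chains are hypothetical transformations.

```python
from functools import partial, reduce
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
Table = List[Row]  # stand-in for a pd.DataFrame in this sketch


def drop_hetatms(rows: Table) -> Table:
    """Hypothetical transformation: keep only ATOM records."""
    return [r for r in rows if r["record_name"] == "ATOM"]


def keep_chains(rows: Table, chains: List[str]) -> Table:
    """Hypothetical transformation with an extra parameter, bound via partial."""
    return [r for r in rows if r["chain_id"] in chains]


def apply_all(rows: Table, funcs: List[Callable[[Table], Table]]) -> Table:
    # Feed the output of each function into the next, in list order
    return reduce(lambda acc, f: f(acc), funcs, rows)


rows = [
    {"record_name": "ATOM", "chain_id": "A"},
    {"record_name": "HETATM", "chain_id": "A"},
    {"record_name": "ATOM", "chain_id": "B"},
]
result = apply_all(rows, [drop_hetatms, partial(keep_chains, chains=["A"])])
print(result)  # [{'record_name': 'ATOM', 'chain_id': 'A'}]
```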

graphein.utils.utils.format_adjacency(G: networkx.classes.graph.Graph, adj: numpy.ndarray, name: str) → xarray.core.dataarray.DataArray

Format adjacency matrix nicely.

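Intended to be used when computing an adjacency-like matrix of a graph object G. For example, when defining a function:

import networkx as nx
from graphein.utils.utils import format_adjacency

def my_adj_matrix_func(G: nx.Graph):
    adj = nx.to_numpy_array(G)  # any (n_nodes, n_nodes) array in G.nodes order works here
    return format_adjacency(G, adj, "xarray_coord_name")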

Assumptions

1. adj should be a 2D matrix of shape (n_nodes, n_nodes).
2. name is unique amongst all names used in the final adjacency tensor.

Parameters

G – NetworkX-compatible Graph
adj (np.ndarray) – 2D numpy array of shape (n_nodes, n_nodes)
name (str) – A unique name for the kind of adjacency matrix being constructed. Gets used in xarray as a coordinate in the "name" dimension.

Returns

An xarray DataArray of shape (n_nodes, n_nodes, 1)

Return type

xr.DataArray

graphein.utils.utils.generate_adjacency_tensor(G: networkx.classes.graph.Graph, funcs: List[Callable], return_array=False) → xarray.core.dataarray.DataArray

Generate adjacency tensor for a graph.

Uses the collection of functions in funcs to build an xarray DataArray that houses the resulting “adjacency tensor”.

A key design choice: we default to returning xarray DataArrays to make inspecting the data easy, but for consumption in tensor libraries you can return a NumPy array instead by setting return_array=True.

Parameters

G (nx.Graph) – NetworkX Graph.
funcs (List[Callable]) – A list of functions that take in G and return an xr.DataArray.
return_array (bool) – Whether to return a NumPy array instead of an xr.DataArray. Defaults to False.

Returns

xr.DataArray, which is of shape (n_nodes, n_nodes, n_funcs).

Return type

xr.DataArray
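To make the (n_nodes, n_nodes, n_funcs) layout concrete, here is a dependency-free sketch that stacks per-function adjacency matrices along a last axis. It assumes every function returns an n_nodes × n_nodes matrix in the same node order; the dict used as a graph stand-in and the "bonds"/"contacts" edge kinds are hypothetical, and plain nested lists replace xarray.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Matrix = List[List[float]]


def adjacency(n: int, edges: Sequence[Tuple[int, int]]) -> Matrix:
    """Symmetric binary adjacency matrix for nodes 0..n-1."""
    m = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        m[u][v] = m[v][u] = 1.0
    return m


def stack_last_axis(mats: List[Matrix]) -> List[List[List[float]]]:
    # Entry [i][j][k] comes from matrix k, giving shape (n_nodes, n_nodes, n_funcs)
    n = len(mats[0])
    return [[[m[i][j] for m in mats] for j in range(n)] for i in range(n)]


# Hypothetical graph stand-in: node count plus two kinds of edges
G = {"n": 3, "bonds": [(0, 1), (1, 2)], "contacts": [(0, 2)]}
funcs: List[Callable[[Dict], Matrix]] = [
    lambda G: adjacency(G["n"], G["bonds"]),
    lambda G: adjacency(G["n"], G["contacts"]),
]
tensor = stack_last_axis([f(G) for f in funcs])
print(tensor[0][1], tensor[0][2])  # [1.0, 0.0] [0.0, 1.0]
```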

graphein.utils.utils.generate_feature_dataframe(G: networkx.classes.graph.Graph, funcs: List[Callable], return_array=False) → pandas.core.frame.DataFrame

Return a pandas DataFrame representation of node metadata.

funcs must be a list of callables with the signature

f(n, d) -> pd.Series

where n is the graph node, d is the node metadata dictionary. The function must return a pandas Series whose name is the node.

Example function:

def x_vec(n: Hashable, d: Dict[Hashable, Any]) -> pd.Series:
    return pd.Series({"x_coord": d["x_coord"]}, name=n)

One fairly strong assumption is that each func has all the information it needs to act stored on the metadata dictionary. If you need to reference an external piece of information, such as a dictionary to look up values, set up the function to accept the dictionary, and use functools.partial to “reduce” the function signature to just (n, d). An example below:

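from functools import partial

import pandas as pd
from graphein.utils.utils import generate_feature_dataframe

def get_molweight(n, d, mw_dict):
    return pd.Series({"mw": mw_dict[d["amino_acid"]]}, name=n)

mw_dict = {"PHE": 165, "GLY": 75}  # ... add entries for the remaining residues
get_molweight_func = partial(get_molweight, mw_dict=mw_dict)

# G: a graph whose node metadata includes an "amino_acid" key
generate_feature_dataframe(G, [get_molweight_func])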

The name=n piece is important; the name becomes the row index in the resulting dataframe.

The series returned from each function need not contain only one key-value pair. You can have two or more, and that’s completely fine; each key becomes a column in the resulting dataframe.

A key design choice: we default to returning DataFrames to make inspecting the data easy, but for consumption in tensor libraries you can return a NumPy array instead by setting return_array=True.

Parameters

G (nx.Graph) – A NetworkX-compatible graph object.
funcs (List[Callable]) – A list of functions.
return_array (bool) – Whether or not to return a NumPy array version of the data. Useful for consumption in tensor libs, like PyTorch or JAX.

Returns

pandas DataFrame representation of node metadata.

Return type

pd.DataFrame

graphein.utils.utils.import_message(submodule: str, package: str, conda_channel: Optional[str] = None, pip_install: bool = False) → str

Return a warning message if a package is not found. This is a generic message indicating to the user that a function relies on an optional module or package that is not currently installed, and it includes installation instructions. Typically used in conjunction with optional featurisation libraries.

Parameters

submodule (str) – graphein submodule that needs an external dependency.
package (str) – External package this submodule relies on.
conda_channel (str, optional) – Conda channel package can be installed from, if at all. Defaults to None
pip_install (bool) – Whether package can be installed via pip. Defaults to False
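The typical use is an optional-dependency guard. The sketch below shows that pattern with a hypothetical install_hint function standing in for import_message; its wording is an assumption and graphein's actual message may differ. The package name hypothetical_featuriser is a placeholder.

```python
import importlib
import warnings
from typing import Optional


def install_hint(submodule: str, package: str, conda_channel: Optional[str] = None,
                 pip_install: bool = False) -> str:
    """Hypothetical stand-in for import_message; graphein's actual wording may differ."""
    msg = f"{submodule} requires the optional package {package}, which is not installed."
    if conda_channel is not None:
        msg += f" Install it with: conda install -c {conda_channel} {package}"
    if pip_install:
        msg += f" Install it with: pip install {package}"
    return msg


try:
    featuriser = importlib.import_module("hypothetical_featuriser")  # placeholder package name
except ImportError:
    # Keep importing the rest of the module; warn only about the missing extra
    featuriser = None
    warnings.warn(install_hint("graphein.example", "hypothetical_featuriser", pip_install=True))
```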

graphein.utils.utils.onek_encoding_unk(x: Iterable[Any], allowable_set: List[Any]) → List[bool]

Function for performing one-hot encoding.

Parameters

x (Iterable[Any]) – values to one-hot
allowable_set (List[Any]) – set of options to encode

Returns

one-hot encoding as list

Return type

List[bool]
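A sketch of one-hot encoding with an "unknown" fallback. Two assumptions are made here and are not confirmed by the docs above: x is treated as a single value (the signature annotates it as Iterable[Any]), and values outside allowable_set are mapped to its last entry, a convention common in molecular featurisers.

```python
from typing import Any, List


def onek_encoding_unk_sketch(x: Any, allowable_set: List[Any]) -> List[bool]:
    # Assumed convention: unseen values fall into the last slot, so the
    # last entry of allowable_set acts as an "unknown" bucket
    if x not in allowable_set:
        x = allowable_set[-1]
    return [x == s for s in allowable_set]


elements = ["C", "N", "O", "other"]
print(onek_encoding_unk_sketch("N", elements))   # [False, True, False, False]
print(onek_encoding_unk_sketch("Se", elements))  # [False, False, False, True]
```

Reserving the last slot keeps the output length fixed at len(allowable_set), which is what downstream tensor code usually needs.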

graphein.utils.utils.ping(host: str) → bool

Returns True if host (str) responds to a ping request. Remember that a host may not respond to a ping (ICMP) request even if the host name is valid.

Parameters: host (str) – IP or hostname
Returns: True if host responds to a ping request.
Return type: bool

graphein.utils.utils.protein_letters_3to1_all_caps(amino_acid: str) → str

Converts a capitalised three-letter amino acid code to its single-letter code. This mapping is not provided by default in Biopython.

Parameters: amino_acid (str) – Capitalised 3-letter amino acid code (eg. "GLY")
Returns: Single-letter amino acid code
Return type: str
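A minimal sketch of an all-caps lookup covering only the 20 standard amino acids. How graphein handles non-standard residue codes is not documented here, so this version simply raises KeyError for anything outside the table.

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def three_to_one(amino_acid: str) -> str:
    """Map a capitalised three-letter code such as "GLY" to its one-letter code."""
    return THREE_TO_ONE[amino_acid]  # KeyError for codes outside the 20 standard residues


print(three_to_one("GLY"), three_to_one("TRP"))  # G W
```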

Testing utilities for the Graphein library.

graphein.testing.utils.compare_approximate(first, second)

Return whether two dicts of arrays are approximately equal.

Parameters

first (Dict[str, Any]) – The first dictionary.
second (Dict[str, Any]) – The second dictionary.

Returns

True if the dictionaries are approx equal, False otherwise.

Return type

bool

graphein.testing.utils.compare_exact(first: Dict[str, Any], second: Dict[str, Any]) → bool

Return whether two dicts of arrays are exactly equal.

Parameters

first (Dict[str, Any]) – The first dictionary.
second (Dict[str, Any]) – The second dictionary.

Returns

True if the dictionaries are exactly equal, False otherwise.

Return type

bool
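The difference between exact and approximate comparison can be illustrated with a stdlib-only sketch over dicts of float lists. Graphein's own helpers may operate on NumPy arrays and use different tolerances; the rel_tol and abs_tol defaults here are assumptions.

```python
import math
from typing import Dict, List

Arrays = Dict[str, List[float]]


def exact_equal(first: Arrays, second: Arrays) -> bool:
    # Same keys and element-wise identical values
    return first.keys() == second.keys() and all(first[k] == second[k] for k in first)


def approx_equal(first: Arrays, second: Arrays,
                 rel_tol: float = 1e-9, abs_tol: float = 1e-8) -> bool:
    # Same keys, same lengths, and every element within tolerance
    if first.keys() != second.keys():
        return False
    return all(
        len(first[k]) == len(second[k])
        and all(math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)
                for a, b in zip(first[k], second[k]))
        for k in first
    )


a = {"coords": [0.1 + 0.2, 1.0]}
b = {"coords": [0.3, 1.0]}
print(exact_equal(a, b))   # False: 0.1 + 0.2 != 0.3 in floating point
print(approx_equal(a, b))  # True
```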

graphein.testing.utils.edge_data_equal(g: networkx.classes.graph.Graph, h: networkx.classes.graph.Graph, comparison_func: typing.Callable = <function compare_exact>) → bool

Checks whether two graphs have the same edge features.

Parameters

g (networkx.Graph) – The first graph.
h (networkx.Graph) – The second graph.
comparison_func – Matching function for edge features. Takes two edge feature dictionaries and returns True if they are equal. Defaults to compare_exact()

Returns

True if the graphs have the same edge features, False otherwise.

Return type

bool
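Since comparison_func only needs to take two edge feature dictionaries and return a bool, a custom matcher can be bound with partial. The sketch below ignores selected keys; equal_ignoring and the "kind"/"distance" attributes are hypothetical examples.

```python
from functools import partial
from typing import Any, Dict, Iterable


def equal_ignoring(first: Dict[str, Any], second: Dict[str, Any],
                   ignore: Iterable[str] = ()) -> bool:
    """Compare two edge-feature dicts, skipping the keys listed in `ignore`."""
    skip = set(ignore)
    kept_first = {k: v for k, v in first.items() if k not in skip}
    kept_second = {k: v for k, v in second.items() if k not in skip}
    return kept_first == kept_second


# Bind the keys to skip so the callable takes just two dicts
ignore_distance = partial(equal_ignoring, ignore=["distance"])

print(ignore_distance({"kind": {"peptide_bond"}, "distance": 1.33},
                      {"kind": {"peptide_bond"}, "distance": 1.32}))  # True
```

It would then be passed as edge_data_equal(g, h, comparison_func=ignore_distance).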

graphein.testing.utils.edges_equal(g: networkx.classes.graph.Graph, h: networkx.classes.graph.Graph) → bool

Checks whether two graphs have the same edges.

Parameters

g (networkx.Graph) – The first graph.
h (networkx.Graph) – The second graph.

Raises

AssertionError – If the graphs do not contain the same edges

graphein.testing.utils.graphs_isomorphic(g: networkx.classes.graph.Graph, h: networkx.classes.graph.Graph) → bool

Checks for structural isomorphism between two graphs: g and h.

Parameters

g (networkx.Graph) – The first graph.
h (networkx.Graph) – The second graph.

Returns

True if the graphs are isomorphic, False otherwise.

Return type

bool

graphein.testing.utils.nodes_equal(g: networkx.classes.graph.Graph, h: networkx.classes.graph.Graph) → bool

Checks whether two graphs have the same nodes.

Parameters

g (networkx.Graph) – The first graph.
h (networkx.Graph) – The second graph.

Raises

AssertionError – If the graphs do not contain the same nodes

CLI & Config

Yaml parser for config objects

graphein.utils.config_parser.config_constructor(loader: yaml.loader.FullLoader, node: yaml.nodes.MappingNode) → pydantic.main.BaseModel

Construct a BaseModel config.

Parameters

loader (yaml.FullLoader) – Given YAML loader
node (yaml.nodes.MappingNode) – A mapping node

graphein.utils.config_parser.function_constructor(loader: yaml.loader.FullLoader, tag_suffix: str, node: Union[yaml.nodes.MappingNode, yaml.nodes.ScalarNode]) → Callable

Construct a Callable. If function parameters are given, this returns a partial function.

Parameters

loader (yaml.FullLoader) – Given YAML loader
tag_suffix (str) – The name after the !func: tag
node (Union[yaml.nodes.MappingNode, yaml.nodes.ScalarNode]) – A mapping node if function parameters are given, a scalar node if not
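The "partial if parameters are given" behaviour can be sketched without the YAML layer. This assumes, without confirmation from the docs above, that the tag suffix is a dotted import path such as json.dumps; resolve_callable is a hypothetical helper, not part of graphein.

```python
import importlib
from functools import partial
from typing import Any, Callable, Dict, Optional


def resolve_callable(dotted_name: str, params: Optional[Dict[str, Any]] = None) -> Callable:
    """Sketch: import `module.attr` and bind keyword parameters with partial if any are given."""
    module_name, _, attr = dotted_name.rpartition(".")
    func = getattr(importlib.import_module(module_name), attr)
    # Scalar-node case (no parameters): return the function itself
    return partial(func, **params) if params else func


# Roughly what `!func:json.dumps` with a mapping of parameters might resolve to (assumed)
dump_sorted = resolve_callable("json.dumps", {"sort_keys": True})
print(dump_sorted({"b": 1, "a": 2}))  # {"a": 2, "b": 1}
```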

graphein.utils.config_parser.get_loader() → yaml.loader.Loader

Add constructors to the PyYAML loader.

graphein.utils.config_parser.parse_config(path: pathlib.Path) → pydantic.main.BaseModel

Parses a yaml configuration file into a config object.

Parameters: path (pathlib.Path) – Path to configuration file

Yaml parser for config objects

class graphein.utils.config.PartialMatchOperator(regex_paths=None, types=None)

Custom operator for deepdiff comparison. This operator compares whether two partial functions are equal.

class graphein.utils.config.PathMatchOperator(regex_paths=None, types=None)

Custom operator for deepdiff comparison. This operator compares whether two pathlib Paths are equal.

graphein.utils.config.partial_functions_equal(func1: functools.partial, func2: functools.partial) → bool

Determine whether two partial functions are equal.

Parameters

func1 (partial) – Partial function to check
func2 (partial) – Partial function to check

Returns

Whether the two functions are equal

Return type

bool
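A helper like this is needed because functools.partial objects do not define value equality: two partials built from identical arguments compare unequal with ==. The sketch below treats partials as equal when their func, args and keywords attributes match; whether graphein compares exactly these attributes is an assumption, and scale is a hypothetical example function.

```python
from functools import partial


def partials_equal(func1: partial, func2: partial) -> bool:
    # Equal when both wrap the same function with the same bound arguments
    return (
        func1.func == func2.func
        and func1.args == func2.args
        and func1.keywords == func2.keywords
    )


def scale(x: float, factor: float = 1.0) -> float:
    return x * factor


print(partial(scale, factor=2.0) == partial(scale, factor=2.0))                 # False: identity comparison
print(partials_equal(partial(scale, factor=2.0), partial(scale, factor=2.0)))  # True
print(partials_equal(partial(scale, factor=2.0), partial(scale, factor=3.0)))  # False
```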
