graphein.utils#
Utils#
Utilities for working with graph objects.
- graphein.utils.utils.annotate_edge_metadata(G: networkx.classes.graph.Graph, funcs: List[Callable]) networkx.classes.graph.Graph [source]#
Annotates Graph edges with edge metadata. Each function in funcs must take the three arguments u, v and d, where u and v are the nodes of the edge, and d is the edge data dictionary.
Additional parameters can be provided by using partial functions.
- Parameters
G (nx.Graph) – Graph to add edge metadata to
funcs (List[Callable]) – List of edge metadata annotation functions
- Returns
Graph with edge metadata added
- Return type
nx.Graph
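A minimal sketch of an edge metadata function that follows the (u, v, d) contract described above. The function add_distance_class, the distance attribute and the cutoff parameter are hypothetical names chosen for illustration, and the explicit loop is only a stand-in for what annotate_edge_metadata is described as doing, not the library's implementation.

```python
from functools import partial

import networkx as nx


def add_distance_class(u, v, d, cutoff):
    """Hypothetical edge annotator: label an edge as short or long."""
    d["distance_class"] = "short" if d["distance"] <= cutoff else "long"


G = nx.Graph()
G.add_edge("A", "B", distance=3.8)
G.add_edge("B", "C", distance=9.1)

# partial fixes the extra parameter so the callable takes only (u, v, d).
funcs = [partial(add_distance_class, cutoff=5.0)]

# Stand-in for annotate_edge_metadata(G, funcs):
# call every function on every edge and its data dictionary.
for func in funcs:
    for u, v, d in G.edges(data=True):
        func(u, v, d)
```

Because d is the edge's own data dictionary, writing to it inside the function is enough to store the new attribute on the graph.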
- graphein.utils.utils.annotate_graph_metadata(G: networkx.classes.graph.Graph, funcs: List[Callable]) networkx.classes.graph.Graph [source]#
Annotates graph with graph-level metadata.
- Parameters
G (nx.Graph) – Graph to add graph-level metadata to
funcs (List[Callable]) – List of graph metadata annotation functions
- Returns
Graph with graph-level metadata added
- Return type
nx.Graph
- graphein.utils.utils.annotate_node_features(G: networkx.classes.graph.Graph, funcs: List[Callable]) networkx.classes.graph.Graph [source]#
Annotates nodes with feature data. Note: the whole graph is passed to each function.
- Parameters
G (nx.Graph) – Graph to add node features to
funcs (List[Callable]) – List of node feature annotation functions
- Returns
Graph with node features added
- Return type
nx.Graph
- graphein.utils.utils.annotate_node_metadata(G: networkx.classes.graph.Graph, funcs: List[Callable]) networkx.classes.graph.Graph [source]#
Annotates nodes with metadata. Each function in funcs must take two arguments n and d, where n is the node and d is the node data dictionary.
Additional parameters can be provided by using partial functions.
- Parameters
G (nx.Graph) – Graph to add node metadata to
funcs (List[Callable]) – List of node metadata annotation functions
- Returns
Graph with node metadata added
- Return type
nx.Graph
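A short sketch of a node metadata function with the (n, d) signature. The annotator add_chain_id and the "chain:residue:position" node label format are assumptions made for this example, and the loop only imitates the behaviour described for annotate_node_metadata.

```python
import networkx as nx


def add_chain_id(n, d):
    """Hypothetical node annotator: store the first field of the node label."""
    d["chain_id"] = n.split(":")[0]


G = nx.Graph()
G.add_node("A:PHE:1")
G.add_node("B:GLY:7")

# Stand-in for annotate_node_metadata(G, [add_chain_id]).
for n, d in G.nodes(data=True):
    add_chain_id(n, d)
```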
- graphein.utils.utils.compute_edges(G: networkx.classes.graph.Graph, funcs: List[Callable]) networkx.classes.graph.Graph [source]#
Computes edges for a Graph from a list of edge construction functions. Each func in funcs must take an nx.Graph and return an nx.Graph.
- Parameters
G (nx.Graph) – Graph to add edges to
funcs (List[Callable]) – List of edge construction functions
- Returns
Graph with edges added
- Return type
nx.Graph
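The contract for edge construction functions (take an nx.Graph, return an nx.Graph) can be sketched as follows. The function add_sequential_edges and the kind attribute are hypothetical. The function both mutates the graph and returns it, so it behaves the same whether or not the caller uses the return value.

```python
import networkx as nx


def add_sequential_edges(G):
    """Hypothetical edge constructor: connect nodes in insertion order."""
    nodes = list(G.nodes)
    for a, b in zip(nodes, nodes[1:]):
        G.add_edge(a, b, kind="sequential")
    return G


G = nx.Graph()
G.add_nodes_from(["n1", "n2", "n3"])

# Stand-in for compute_edges(G, [add_sequential_edges]).
for func in [add_sequential_edges]:
    G = func(G)
```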
- graphein.utils.utils.filter_dataframe(df: pandas.core.frame.DataFrame, funcs: List[Callable]) pandas.core.frame.DataFrame [source]#
Applies transformation functions to a dataframe. Each function in funcs must accept a pd.DataFrame and return a pd.DataFrame.
Additional parameters can be provided by using partial functions.
- Parameters
df (pd.DataFrame) – Dataframe to apply transformations to.
funcs (List[Callable]) – List of transformation functions.
- Return type
pd.DataFrame
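A minimal sketch of chaining dataframe filters, assuming each function is applied in list order to the output of the previous one. The column names chain_id and element and both filter functions are hypothetical.

```python
from functools import partial

import pandas as pd


def keep_chain(df, chain_id):
    """Hypothetical filter: keep rows belonging to one chain."""
    return df[df["chain_id"] == chain_id]


def drop_hydrogens(df):
    """Hypothetical filter: drop rows whose element is hydrogen."""
    return df[df["element"] != "H"]


df = pd.DataFrame({"chain_id": ["A", "A", "B"], "element": ["C", "H", "N"]})

# partial binds chain_id so every callable takes and returns a DataFrame.
funcs = [partial(keep_chain, chain_id="A"), drop_hydrogens]

# Stand-in for filter_dataframe(df, funcs).
for func in funcs:
    df = func(df)
```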
- graphein.utils.utils.format_adjacency(G: networkx.classes.graph.Graph, adj: numpy.ndarray, name: str) xarray.core.dataarray.DataArray [source]#
Format adjacency matrix nicely.
Intended to be used when computing an adjacency-like matrix of a graph object G. For example, in defining a func:
    def my_adj_matrix_func(G):
        adj = some_adj_func(G)
        return format_adjacency(G, adj, "xarray_coord_name")
Assumptions
1. adj should be a 2D matrix of shape (n_nodes, n_nodes).
2. name is something that is unique amongst all names used in the final adjacency tensor.
- Parameters
G – NetworkX-compatible Graph
adj (np.ndarray) – 2D numpy array of shape (n_nodes, n_nodes)
name (str) – A unique name for the kind of adjacency matrix being constructed. Gets used in xarray as a coordinate in the "name" dimension.
- Returns
An XArray DataArray of shape (n_nodes, n_nodes, 1)
- Return type
xr.DataArray
- graphein.utils.utils.generate_adjacency_tensor(G: networkx.classes.graph.Graph, funcs: List[Callable], return_array=False) xarray.core.dataarray.DataArray [source]#
Generate adjacency tensor for a graph.
Uses the collection of functions in funcs to build an xarray DataArray that houses the resulting “adjacency tensor”.
A key design choice: We default to returning xarray DataArrays, to make inspecting the data easy, but for consumption in tensor libraries, you can turn on returning a NumPy array by switching return_array=True.
- Parameters
G (nx.Graph) – NetworkX Graph.
funcs (List[Callable]) – A list of functions that take in G and return an xr.DataArray
- Returns
xr.DataArray, which is of shape (n_nodes, n_nodes, n_funcs).
- Return type
xr.DataArray
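To illustrate the (n_nodes, n_nodes, n_funcs) shape, here is a sketch that uses plain NumPy arrays instead of xarray. This is a deliberate simplification: the documented function expects each callable to return an xr.DataArray, whereas the hypothetical functions adjacency and degree_matrix below return bare arrays that are stacked along a last axis.

```python
import networkx as nx
import numpy as np


def adjacency(G):
    """Plain adjacency matrix, shape (n_nodes, n_nodes)."""
    return nx.to_numpy_array(G)


def degree_matrix(G):
    """Diagonal matrix of node degrees, shape (n_nodes, n_nodes)."""
    return np.diag([deg for _, deg in G.degree()])


G = nx.path_graph(3)  # nodes 0 - 1 - 2
funcs = [adjacency, degree_matrix]

# One (n_nodes, n_nodes) slice per function, stacked on the last axis.
tensor = np.stack([f(G) for f in funcs], axis=-1)
```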
- graphein.utils.utils.generate_feature_dataframe(G: networkx.classes.graph.Graph, funcs: List[Callable], return_array=False) pandas.core.frame.DataFrame [source]#
Return a pandas DataFrame representation of node metadata.
funcs has to be a list of callables whose signature is f(n, d) -> pd.Series, where n is the graph node and d is the node metadata dictionary. The function must return a pandas Series whose name is the node.
Example function:
    def x_vec(n: Hashable, d: Dict[Hashable, Any]) -> pd.Series:
        return pd.Series({"x_coord": d["x_coord"]}, name=n)
One fairly strong assumption is that each func has all the information it needs to act stored on the metadata dictionary. If you need to reference an external piece of information, such as a dictionary to look up values, set up the function to accept the dictionary, and use functools.partial to “reduce” the function signature to just (n, d). An example below:
    from functools import partial

    def get_molweight(n, d, mw_dict):
        return pd.Series({"mw": mw_dict[d["amino_acid"]]}, name=n)

    mw_dict = {"PHE": 165, "GLY": 75, ...}
    get_molweight_func = partial(get_molweight, mw_dict=mw_dict)
    generate_feature_dataframe(G, [get_molweight_func])
The name=n piece is important; the name becomes the row index in the resulting dataframe.
The series that is returned from each function need not only contain one key-value pair. You can have two or more, and that’s completely fine; each key becomes a column in the resulting dataframe.
A key design choice: We default to returning DataFrames, to make inspecting the data easy, but for consumption in tensor libraries, you can turn on returning a NumPy array by switching return_array=True.
- Parameters
G (nx.Graph) – A NetworkX-compatible graph object.
funcs (List[Callable]) – A list of functions.
return_array (bool) – Whether or not to return a NumPy array version of the data. Useful for consumption in tensor libs, like PyTorch or JAX.
- Returns
pandas DataFrame representation of node metadata.
- Return type
pd.DataFrame
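The role of name=n and of multi-key Series can be seen in a small pandas-only sketch. The node_data dictionary stands in for a graph's node metadata, and the function coords is hypothetical. How the library itself assembles the rows is not shown here.

```python
import pandas as pd


def coords(n, d):
    """Hypothetical feature function returning two keys per node."""
    return pd.Series({"x_coord": d["x_coord"], "y_coord": d["y_coord"]}, name=n)


# Stand-in for node metadata: node label -> metadata dictionary.
node_data = {
    "n1": {"x_coord": 0.0, "y_coord": 1.0},
    "n2": {"x_coord": 2.5, "y_coord": -1.0},
}

# Each Series becomes one row; its name becomes the row index,
# and each key becomes a column.
rows = [coords(n, d) for n, d in node_data.items()]
df = pd.DataFrame(rows)

# NumPy view of the same data, as one might want with return_array=True.
array = df.to_numpy()
```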
- graphein.utils.utils.import_message(submodule: str, package: str, conda_channel: Optional[str] = None, pip_install: bool = False) str [source]#
Return warning if package is not found. Generic message for indicating to the user when a function relies on an optional module / package that is not currently installed. Includes installation instructions. Typically used in conjunction with optional featurisation libraries.
- Parameters
submodule (str) – graphein submodule that needs an external dependency.
package (str) – External package this submodule relies on.
conda_channel (str, optional) – Conda channel package can be installed from, if at all. Defaults to None
pip_install (bool) – Whether package can be installed via pip. Defaults to False
- graphein.utils.utils.onek_encoding_unk(x: Iterable[Any], allowable_set: List[Any]) List[bool] [source]#
Function for performing one-hot encoding.
- Parameters
x (Iterable[Any]) – values to one-hot
allowable_set (List[Any]) – set of options to encode
- Returns
one-hot encoding as list
- Return type
List[bool]
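A sketch of one-hot encoding against an allowable set. The text above does not say what happens to values outside allowable_set; the sketch assumes, based only on the "unk" in the function name, that such values are mapped to the last entry of the set. The name one_hot_with_unknown is hypothetical.

```python
from typing import Any, List


def one_hot_with_unknown(x: Any, allowable_set: List[Any]) -> List[bool]:
    """One-hot encode x; values outside the set map to the last entry (assumed)."""
    if x not in allowable_set:
        x = allowable_set[-1]
    return [x == s for s in allowable_set]


residues = ["ALA", "GLY", "UNK"]
known = one_hot_with_unknown("GLY", residues)
unknown = one_hot_with_unknown("XYZ", residues)
```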
- graphein.utils.utils.ping(host: str) bool [source]#
Returns True if host (str) responds to a ping request. Remember that a host may not respond to a ping (ICMP) request even if the host name is valid.
- graphein.utils.utils.protein_letters_3to1_all_caps(amino_acid: str) str [source]#
Converts capitalised 3 letter amino acid code to single letter. Not provided in default biopython.
Testing utilities for the Graphein library.
- graphein.testing.utils.compare_approximate(first, second)[source]#
Return whether two dicts of arrays are approximately equal.
- graphein.testing.utils.compare_exact(first: Dict[str, Any], second: Dict[str, Any]) bool [source]#
Return whether two dicts of arrays are exactly equal.
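The difference between exact and approximate comparison of dicts of arrays can be sketched with NumPy. The two helper functions below are hypothetical stand-ins, and the default tolerances of np.allclose are an assumption rather than the values the library uses.

```python
from typing import Any, Dict

import numpy as np


def dicts_exactly_equal(first: Dict[str, Any], second: Dict[str, Any]) -> bool:
    """Same keys and element-wise identical arrays."""
    return first.keys() == second.keys() and all(
        np.array_equal(first[k], second[k]) for k in first
    )


def dicts_approximately_equal(first: Dict[str, Any], second: Dict[str, Any]) -> bool:
    """Same keys and arrays equal within np.allclose default tolerances."""
    return first.keys() == second.keys() and all(
        np.allclose(first[k], second[k]) for k in first
    )


# 0.1 + 0.2 differs from 0.3 by floating-point rounding error.
a = {"coords": np.array([0.1 + 0.2, 1.0])}
b = {"coords": np.array([0.3, 1.0])}
```

An approximate comparison is usually the safer choice for float-valued features such as coordinates or distances.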
- graphein.testing.utils.edge_data_equal(g: networkx.classes.graph.Graph, h: networkx.classes.graph.Graph, comparison_func: typing.Callable = <function compare_exact>) bool [source]#
Checks whether two graphs have the same edge features.
- Parameters
g (networkx.Graph) – The first graph.
h (networkx.Graph) – The second graph.
comparison_func – Matching function for edge features. Takes two edge feature dictionaries and returns True if they are equal. Defaults to compare_exact()
- Returns
True if the graphs have the same edge features, False otherwise.
- Return type
bool
- graphein.testing.utils.edges_equal(g: networkx.classes.graph.Graph, h: networkx.classes.graph.Graph) bool [source]#
Checks whether two graphs have the same edges.
- Parameters
g (networkx.Graph) – The first graph.
h (networkx.Graph) – The second graph.
- Raises
AssertionError – If the graphs do not contain the same nodes
- graphein.testing.utils.graphs_isomorphic(g: networkx.classes.graph.Graph, h: networkx.classes.graph.Graph) bool [source]#
Checks for structural isomorphism between two graphs: g and h.
- Parameters
g (networkx.Graph) – The first graph.
h (networkx.Graph) – The second graph.
- Returns
True if the graphs are isomorphic, False otherwise.
- Return type
bool
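Structural isomorphism ignores node labels and compares only connectivity. The sketch below uses NetworkX's own is_isomorphic; whether graphs_isomorphic wraps that call is an assumption.

```python
import networkx as nx

# Two three-node paths with different labels, and one triangle.
g = nx.Graph([("a", "b"), ("b", "c")])
h = nx.Graph([(1, 2), (2, 3)])
k = nx.Graph([(1, 2), (2, 3), (3, 1)])

same_structure = nx.is_isomorphic(g, h)
different_structure = nx.is_isomorphic(g, k)
```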
- graphein.testing.utils.nodes_equal(g: networkx.classes.graph.Graph, h: networkx.classes.graph.Graph) bool [source]#
Checks whether two graphs have the same nodes.
- Parameters
g (networkx.Graph) – The first graph.
h (networkx.Graph) – The second graph.
- Raises
AssertionError – If the graphs do not contain the same nodes
CLI & Config#
Yaml parser for config objects
- graphein.utils.config_parser.config_constructor(loader: yaml.loader.FullLoader, node: yaml.nodes.MappingNode) pydantic.main.BaseModel [source]#
Construct a BaseModel config.
- Parameters
loader (yaml.FullLoader) – Given yaml loader
node (yaml.nodes.MappingNode) – A mapping node
- graphein.utils.config_parser.function_constructor(loader: yaml.loader.FullLoader, tag_suffix: str, node: Union[yaml.nodes.MappingNode, yaml.nodes.ScalarNode]) Callable [source]#
Construct a Callable. If function parameters are given, this returns a partial function.
- Parameters
loader (yaml.FullLoader) – Given yaml loader
tag_suffix (str) – The name after the !func: tag
node (Union[yaml.nodes.MappingNode, yaml.nodes.ScalarNode]) – A mapping node if function parameters are given, a scalar node if not
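A sketch of how a multi-constructor with the (loader, tag_suffix, node) signature can turn a !func: tag into a callable or a partial. Everything here is illustrative: SketchLoader and sketch_function_constructor are hypothetical names, the sketch subclasses yaml.SafeLoader rather than the FullLoader named above, and resolving the dotted name with importlib is an assumption about how the lookup could work.

```python
import importlib
from functools import partial

import yaml


class SketchLoader(yaml.SafeLoader):
    """Private loader subclass so PyYAML's shared loaders stay untouched."""


def sketch_function_constructor(loader, tag_suffix, node):
    """Resolve the dotted name after the tag prefix; bind parameters if given."""
    module_name, _, attr = tag_suffix.rpartition(".")
    func = getattr(importlib.import_module(module_name), attr)
    if isinstance(node, yaml.MappingNode):
        # Mapping node: its key/value pairs become keyword arguments.
        kwargs = loader.construct_mapping(node, deep=True)
        return partial(func, **kwargs)
    # Scalar node: no parameters, return the function itself.
    return func


SketchLoader.add_multi_constructor("!func:", sketch_function_constructor)

text = """
plain: !func:json.dumps
pretty: !func:json.dumps {indent: 2}
"""
config = yaml.load(text, Loader=SketchLoader)
```

Since the tag suffix is imported as code, a constructor like this is only appropriate for configuration files from a trusted source.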
- graphein.utils.config_parser.get_loader() yaml.loader.Loader [source]#
Add constructors to PyYAML loader.
- graphein.utils.config_parser.parse_config(path: pathlib.Path) pydantic.main.BaseModel [source]#
Parses a yaml configuration file into a config object.
- Parameters
path (pathlib.Path) – Path to configuration file
Yaml parser for config objects
- class graphein.utils.config.PartialMatchOperator(regex_paths=None, types=None)[source]#
Custom operator for deepdiff comparison. This operator compares whether the two partials are equal.
- class graphein.utils.config.PathMatchOperator(regex_paths=None, types=None)[source]#
Custom operator for deepdiff comparison. This operator compares whether the two pathlib Paths are equal.
- graphein.utils.config.partial_functions_equal(func1: functools.partial, func2: functools.partial) bool [source]#
Determine whether two partial functions are equal.
- Parameters
func1 (partial) – Partial function to check
func2 (partial) – Partial function to check
- Returns
Whether the two functions are equal
- Return type
bool
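functools.partial objects do not define value equality, so two partials built from the same function and arguments still compare as different objects. A minimal sketch of a field-by-field comparison follows; the helper partials_match and the function scale are hypothetical, and the library's own check may differ.

```python
from functools import partial


def partials_match(func1: partial, func2: partial) -> bool:
    """Compare the wrapped function, positional args and keyword args."""
    return (
        func1.func is func2.func
        and func1.args == func2.args
        and func1.keywords == func2.keywords
    )


def scale(x, factor):
    return x * factor


a = partial(scale, factor=2)
b = partial(scale, factor=2)
c = partial(scale, factor=3)
```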