graphein.rna

Graphs

Functions for working with RNA Secondary Structure Graphs.

graphein.rna.graphs.construct_rna_graph(dotbracket: Optional[str], sequence: Optional[str], edge_construction_funcs: List[Callable], edge_annotation_funcs: Optional[List[Callable]] = None, node_annotation_funcs: Optional[List[Callable]] = None, graph_annotation_funcs: Optional[List[Callable]] = None) -> networkx.classes.graph.Graph

Constructs an RNA secondary structure graph from dotbracket notation.

Parameters
  • dotbracket (str, optional) – Dotbracket notation representation of the secondary structure.

  • sequence (str, optional) – Corresponding sequence of RNA bases.

  • edge_construction_funcs (List[Callable]) – List of edge construction functions. The signature declares no default, so this argument must be supplied.

  • edge_annotation_funcs (List[Callable], optional) – List of edge metadata annotation functions. Defaults to None.

  • node_annotation_funcs (List[Callable], optional) – List of node metadata annotation functions. Defaults to None.

  • graph_annotation_funcs (List[Callable], optional) – List of graph metadata annotation functions. Defaults to None.

Returns

nx.Graph of RNA secondary structure

Return type

nx.Graph
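The construction flow described above can be sketched without graphein or networkx. This is only an illustration under stated assumptions: `build_graph_sketch`, `add_backbone_sketch`, the plain-dict graph and its key names are all hypothetical, and the idea that each edge construction function receives the graph and returns it is inferred from the documented signatures, not taken from the library source.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical plain-dict stand-in for the nx.Graph the documented function returns.
Graph = Dict[str, list]


def add_backbone_sketch(graph: Graph) -> Graph:
    """Link each position to its successor, mimicking phosphodiester bonds."""
    n = len(graph["nodes"])
    graph["edges"].extend((i, i + 1, "phosphodiester_bond") for i in range(n - 1))
    return graph


def build_graph_sketch(
    dotbracket: Optional[str],
    sequence: Optional[str],
    edge_construction_funcs: List[Callable[[Graph], Graph]],
) -> Graph:
    """Create one node per position, then apply each edge function in turn."""
    if dotbracket is None and sequence is None:
        raise ValueError("Provide a dotbracket string, a sequence, or both.")
    if (
        dotbracket is not None
        and sequence is not None
        and len(dotbracket) != len(sequence)
    ):
        raise ValueError("Dotbracket and sequence lengths differ.")

    length = len(dotbracket) if dotbracket is not None else len(sequence)
    nodes = [
        {
            "position": i,
            # Assumed attribute names, chosen for this sketch only.
            "dotbracket_symbol": dotbracket[i] if dotbracket is not None else None,
            "nucleotide": sequence[i] if sequence is not None else None,
        }
        for i in range(length)
    ]
    graph: Graph = {"nodes": nodes, "edges": []}
    for func in edge_construction_funcs:
        graph = func(graph)
    return graph


graph = build_graph_sketch("((..))", "GCAAGC", [add_backbone_sketch])
print(len(graph["nodes"]), len(graph["edges"]))  # 6 nodes, 5 backbone edges
```

Passing the edge functions as a list keeps the builder independent of any particular edge type, which is presumably why the documented signature takes callables and not flags.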

graphein.rna.graphs.validate_dotbracket(db: str)

Validates a dotbracket string, ensuring that it contains only supported symbols.

See: SUPPORTED_DOTBRACKET_NOTATION

Parameters

db (str) – Dotbracket notation string

Raises

ValueError – Raises ValueError if dotbracket notation contains unsupported symbols
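As a rough illustration of this check, the sketch below tests a string against the symbol list documented under SUPPORTED_DOTBRACKET_NOTATION. The function name `check_dotbracket` and the error message are hypothetical; the library's own implementation may differ.

```python
# Symbol list copied from the documented SUPPORTED_DOTBRACKET_NOTATION constant.
SUPPORTED_DOTBRACKET_NOTATION = ["(", ".", ")", "[", "{", "<", "]", "}", ">"]


def check_dotbracket(db: str) -> None:
    """Raise ValueError if db contains a symbol outside the supported set."""
    unsupported = sorted(set(db) - set(SUPPORTED_DOTBRACKET_NOTATION))
    if unsupported:
        raise ValueError(f"Unsupported dotbracket symbols: {unsupported}")


check_dotbracket("((..[[..))..]]")  # passes silently

try:
    check_dotbracket("((..x.))")
except ValueError as err:
    print(err)
```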

graphein.rna.graphs.validate_lengths(db: str, seq: str) -> None

Check lengths of dotbracket and sequence match.

Parameters
  • db (str) – Dotbracket string to check

  • seq (str) – RNA nucleotide sequence to check.

Raises

ValueError – Raises ValueError if lengths of dotbracket and sequence do not match.

graphein.rna.graphs.validate_rna_sequence(s: str) -> None

Validate RNA sequence. This ensures that it only contains supported bases.

Supported bases are: "A", "U", "G", "C", "I". Supported bases can be accessed in RNA_BASES.

Parameters

s (str) – Sequence to validate

Raises

ValueError – Raises ValueError if the sequence contains an unsupported base character
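The length check and the base check documented above might look like the following sketch. The names `check_lengths` and `check_sequence` are hypothetical, and treating lowercase letters as unsupported is an assumption of this sketch, not something the documentation states.

```python
# Base list copied from the documented RNA_BASES constant.
RNA_BASES = ["A", "U", "G", "C", "I"]


def check_lengths(db: str, seq: str) -> None:
    """Raise ValueError if the dotbracket string and sequence differ in length."""
    if len(db) != len(seq):
        raise ValueError(
            f"Length mismatch: dotbracket has {len(db)} symbols, "
            f"sequence has {len(seq)} bases."
        )


def check_sequence(s: str) -> None:
    """Raise ValueError if s contains a character that is not in RNA_BASES."""
    unsupported = sorted(set(s) - set(RNA_BASES))
    if unsupported:
        raise ValueError(f"Unsupported bases: {unsupported}")


check_lengths("((..))", "GCAAGC")  # passes silently
check_sequence("GCAAGC")           # passes silently

try:
    check_sequence("GCTAGC")  # "T" is a DNA base and is not in RNA_BASES
except ValueError as err:
    print(err)
```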

Edges

Functions to compute edges for an RNA secondary structure graph.

graphein.rna.edges.add_all_dotbracket_edges(G: networkx.classes.graph.Graph) -> networkx.classes.graph.Graph

Adds phosphodiester bonds between adjacent nucleotides and base_pairing interactions to an RNA secondary structure graph.

Parameters

G (nx.Graph) – RNA Graph to add edges to.

Returns

RNA graph with phosphodiester_bond and base_pairing edges added.

Return type

nx.Graph

graphein.rna.edges.add_base_pairing_interactions(G: networkx.classes.graph.Graph) -> networkx.classes.graph.Graph

Adds base pairing interactions between nucleotides to an RNA secondary structure graph.

Parameters

G (nx.Graph) – RNA Graph to add edges to.

Raises

ValueError – if dotbracket contains an unsupported character.

Returns

RNA graph with base_pairing edges added.

Return type

nx.Graph
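To show how base pairs can be read off a dotbracket string, here is a minimal stack-based sketch. It is not the library's implementation: `pair_parentheses` is a hypothetical helper, it returns index pairs instead of adding edges to a graph, and raising ValueError on unbalanced parentheses is an assumption made for the sketch.

```python
from typing import List, Tuple


def pair_parentheses(db: str) -> List[Tuple[int, int]]:
    """Match each ")" with the most recent unmatched "(" and return index pairs."""
    stack: List[int] = []
    pairs: List[Tuple[int, int]] = []
    for i, symbol in enumerate(db):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"Unmatched ')' at position {i}")
            pairs.append((stack.pop(), i))
        # Dots and pseudoknot symbols are skipped in this sketch.
    if stack:
        raise ValueError(f"Unmatched '(' at positions {stack}")
    return pairs


print(sorted(pair_parentheses("((..))")))  # [(0, 5), (1, 4)]
```

Each returned pair corresponds to one edge of the kind the documentation calls base_pairing.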

graphein.rna.edges.add_phosphodiester_bonds(G: networkx.classes.graph.Graph) -> networkx.classes.graph.Graph

Adds phosphodiester bonds between adjacent nucleotides to an RNA secondary structure graph.

Parameters

G (nx.Graph) – RNA Graph to add edges to.

Returns

RNA graph with phosphodiester_bond edges added.

Return type

nx.Graph

graphein.rna.edges.add_pseudoknots(G: networkx.classes.graph.Graph) -> networkx.classes.graph.Graph

Adds pseudoknot edges between nucleotides to an RNA secondary structure graph.

Parameters

G (nx.Graph) – RNA Graph to add edges to.

Returns

RNA graph with pseudoknot edges added.

Return type

nx.Graph
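Pseudoknot pairs can be recovered in much the same way as ordinary base pairs, using one stack per bracket type. The sketch below assumes that the documented PSEUDOKNOT_OPENING_SYMBOLS and PSEUDOKNOT_CLOSING_SYMBOLS lists correspond by position ("[" with "]", "{" with "}", "<" with ">"). The helper name `pair_pseudoknots` is hypothetical and the library's own logic may differ.

```python
from typing import Dict, List, Tuple

# Symbol lists copied from the documented constants.
PSEUDOKNOT_OPENING_SYMBOLS = ["[", "{", "<"]
PSEUDOKNOT_CLOSING_SYMBOLS = ["]", "}", ">"]

# Assumption: the two lists correspond index by index.
CLOSING_TO_OPENING = dict(zip(PSEUDOKNOT_CLOSING_SYMBOLS, PSEUDOKNOT_OPENING_SYMBOLS))


def pair_pseudoknots(db: str) -> List[Tuple[int, int]]:
    """Return index pairs for pseudoknot brackets, tracking each bracket type separately."""
    stacks: Dict[str, List[int]] = {s: [] for s in PSEUDOKNOT_OPENING_SYMBOLS}
    pairs: List[Tuple[int, int]] = []
    for i, symbol in enumerate(db):
        if symbol in stacks:
            stacks[symbol].append(i)
        elif symbol in CLOSING_TO_OPENING:
            opening = CLOSING_TO_OPENING[symbol]
            if not stacks[opening]:
                raise ValueError(f"Unmatched '{symbol}' at position {i}")
            pairs.append((stacks[opening].pop(), i))
    return pairs


print(sorted(pair_pseudoknots("((..[[..))..]]")))  # [(4, 13), (5, 12)]
```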

graphein.rna.edges.check_base_pairing_type(base_1: str, base_2: str) -> str

Checks type and validity of base pairing interactions.

Parameters
  • base_1 (str) – RNA base letter for base 1.

  • base_2 (str) – RNA base letter for base 2.

Returns

String describing the type of base pairing: "canonical", "wobble" or "invalid".

Return type

str
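The three return values can be illustrated with the CANONICAL_BASE_PAIRINGS and WOBBLE_BASE_PAIRINGS mappings documented in the Constants section. The following is a reimplementation written for this page, with the hypothetical name `classify_pairing`; it is not copied from the library source.

```python
from typing import Dict, List

# Mappings copied from the documented constants.
CANONICAL_BASE_PAIRINGS: Dict[str, List[str]] = {
    "A": ["U"], "C": ["G"], "G": ["C"], "U": ["A"],
}
WOBBLE_BASE_PAIRINGS: Dict[str, List[str]] = {
    "A": ["I"], "C": ["I"], "G": ["U"], "I": ["A", "C", "U"], "U": ["G", "I"],
}


def classify_pairing(base_1: str, base_2: str) -> str:
    """Return "canonical", "wobble" or "invalid" for a pair of RNA bases."""
    if base_2 in CANONICAL_BASE_PAIRINGS.get(base_1, []):
        return "canonical"
    if base_2 in WOBBLE_BASE_PAIRINGS.get(base_1, []):
        return "wobble"
    return "invalid"


print(classify_pairing("A", "U"))  # canonical
print(classify_pairing("G", "U"))  # wobble
print(classify_pairing("A", "G"))  # invalid
```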

Visualisation

Visualisation utilities for RNA Secondary Structure Graphs.

graphein.rna.visualisation.plot_rna_graph(g: networkx.classes.graph.Graph, layout: networkx.drawing.layout = circular_layout, label_base_type: bool = True, label_base_position: bool = False, label_dotbracket_symbol: bool = False, **kwargs)

Plots an RNA Secondary Structure Graph. Colours edges by kind.

Parameters
  • g (nx.Graph) – NetworkX graph of RNA secondary structure graph.

  • layout (nx.layout) – Layout algorithm to use. Default is circular_layout.

  • label_base_type (bool) – Whether to label the base type of each base.

  • label_base_position (bool) – Whether to label the base position of each base.

  • label_dotbracket_symbol (bool) – Whether to label the dotbracket symbol of each base.

Constants

Constants for working with RNA Secondary Structure Graphs.

graphein.rna.constants.CANONICAL_BASE_PAIRINGS: Dict[str, List[str]] = {'A': ['U'], 'C': ['G'], 'G': ['C'], 'U': ['A']}

Maps standard RNA bases to their canonical base pairings.

graphein.rna.constants.PSEUDOKNOT_CLOSING_SYMBOLS: List[str] = [']', '}', '>']

List of symbols denoting a pseudoknot closing.

graphein.rna.constants.PSEUDOKNOT_OPENING_SYMBOLS: List[str] = ['[', '{', '<']

List of symbols denoting a pseudoknot opening.

graphein.rna.constants.RNA_BASES: List[str] = ['A', 'U', 'G', 'C', 'I']

List of allowable RNA Bases.

graphein.rna.constants.RNA_BASE_COLORS: Dict[str, str] = {'A': 'r', 'C': 'y', 'G': 'g', 'I': 'm', 'U': 'b'}

Maps RNA bases (RNA_BASES) to a colour for visualisations.

graphein.rna.constants.SIMPLE_DOTBRACKET_NOTATION: List[str] = ['(', '.', ')']

List of characters in simplest dotbracket notation.

graphein.rna.constants.SS_BOND_TYPES: List[str] = ['phosphodiester_bond', 'base_pairing', 'pseudoknot']

List of valid secondary structure bond types.

graphein.rna.constants.SUPPORTED_DOTBRACKET_NOTATION = ['(', '.', ')', '[', '{', '<', ']', '}', '>']

List of all valid dotbracket symbols. Amalgam of SIMPLE_DOTBRACKET_NOTATION and SUPPORTED_PSEUDOKNOT_NOTATION.

graphein.rna.constants.SUPPORTED_PSEUDOKNOT_NOTATION: List[str] = ['[', '{', '<', ']', '}', '>']

List of characters denoting pseudoknots in dotbracket notation. Amalgam of PSEUDOKNOT_OPENING_SYMBOLS and PSEUDOKNOT_CLOSING_SYMBOLS.

graphein.rna.constants.VALID_BASE_PAIRINGS = {'A': ['U', 'I'], 'C': ['G', 'I'], 'G': ['C', 'U'], 'I': ['A', 'C', 'U'], 'U': ['A', 'G', 'I']}

Mapping of RNA bases (RNA_BASES) to their allowable pairings. Amalgam of CANONICAL_BASE_PAIRINGS and WOBBLE_BASE_PAIRINGS.

graphein.rna.constants.WOBBLE_BASE_PAIRINGS: Dict[str, List[str]] = {'A': ['I'], 'C': ['I'], 'G': ['U'], 'I': ['A', 'C', 'U'], 'U': ['G', 'I']}

Maps RNA bases (RNA_BASES) to their wobble base pairings.
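The "amalgam" relationships between these constants can be checked directly from the values listed above. The sketch below copies those values and rebuilds the combined constants from their parts. How the library itself derives them is not documented here, so the concatenation order is an assumption that happens to reproduce the listed values.

```python
from typing import Dict, List

# Values copied from the constants documented above.
RNA_BASES: List[str] = ["A", "U", "G", "C", "I"]
CANONICAL_BASE_PAIRINGS: Dict[str, List[str]] = {
    "A": ["U"], "C": ["G"], "G": ["C"], "U": ["A"],
}
WOBBLE_BASE_PAIRINGS: Dict[str, List[str]] = {
    "A": ["I"], "C": ["I"], "G": ["U"], "I": ["A", "C", "U"], "U": ["G", "I"],
}
SIMPLE_DOTBRACKET_NOTATION: List[str] = ["(", ".", ")"]
PSEUDOKNOT_OPENING_SYMBOLS: List[str] = ["[", "{", "<"]
PSEUDOKNOT_CLOSING_SYMBOLS: List[str] = ["]", "}", ">"]

# Canonical partners first, then wobble partners, for every base.
valid_base_pairings = {
    base: CANONICAL_BASE_PAIRINGS.get(base, []) + WOBBLE_BASE_PAIRINGS.get(base, [])
    for base in RNA_BASES
}

# Pseudoknot notation is the opening symbols followed by the closing symbols.
supported_pseudoknot_notation = PSEUDOKNOT_OPENING_SYMBOLS + PSEUDOKNOT_CLOSING_SYMBOLS
supported_dotbracket_notation = SIMPLE_DOTBRACKET_NOTATION + supported_pseudoknot_notation

print(valid_base_pairings["U"])       # ['A', 'G', 'I']
print(supported_dotbracket_notation)  # ['(', '.', ')', '[', '{', '<', ']', '}', '>']
```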