👑 hmrb package

hmrb.core module

class hmrb.core.Core(callbacks: Optional[Dict] = None, sets: Optional[Dict] = None, sort_length: bool = False)

Bases: object

Class handling the main functions surrounding the rule engine

Parameters

callbacks (dict) – dictionary of callback functions to execute following a successful call.
sort_length (bool) – sort match results by span length in ascending order (this also affects the order of callback execution).

Public methods:

load – add list of rules to engine
__call__ – match list of input dicts with internal rules
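The effect of sort_length can be illustrated without the engine itself. The sketch below assumes that match results have the shape suggested by the return annotation of _match, a list of ((start, end), [data, ...]) tuples. The helper name sort_by_span_length and the sample results are hypothetical.

```python
from typing import Dict, List, Tuple

Result = Tuple[Tuple[int, int], List[Dict]]


def sort_by_span_length(results: List[Result]) -> List[Result]:
    """Order results by span length (end - start), shortest first."""
    return sorted(results, key=lambda item: item[0][1] - item[0][0])


# Hypothetical match results: ((start, end), [data dicts]).
results = [
    ((0, 4), [{"name": "long"}]),
    ((2, 3), [{"name": "short"}]),
    ((1, 3), [{"name": "medium"}]),
]
ordered = sort_by_span_length(results)
```

Because callbacks run in the order of the results, a shorter span would be handled before a longer one under this ordering.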

_execute(responses: Union[List[Tuple[Tuple[int, int], List[Dict]]], ItemsView[Tuple[int, int], List[Dict]]], input_: Any) → None

_load(rules: List[List[Dict]], vars: List[List[Dict]]) → None

Adds list of rules to the engine

Implementation: passes rules to the root BaseNode of the class sequentially.

Parameters

rules (list) – list of rules to add to root node
vars (list) – list of shared varHandle objects to use

_match(spans: List[Tuple[int, list]]) → Union[List[Tuple[Tuple[int, int], List[Dict]]], ItemsView[Tuple[int, int], List[Dict]]]

Takes a list of spans and executes matching by passing each to the root node.

Parameters: spans (list) – list of spans to match
Returns: list of tuples containing match results
Return type: (list)

static default_callback(input_: list, span: slice, data: Dict) → None
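The signature above suggests that a callback receives the matched input, the matched span, and the rule data. A minimal sketch of a user-defined callback with the same signature follows. The registry key "collect", the sample tokens, and the manual call that stands in for the engine are assumptions made for illustration.

```python
from typing import Dict, List

collected: List[str] = []


def collect_callback(input_: list, span: slice, data: Dict) -> None:
    """Record the matched tokens together with the rule data."""
    matched = input_[span]
    collected.append(f"{data.get('name')}: {matched}")


# Hypothetical registry; how the engine looks callbacks up is not shown here.
callbacks = {"collect": collect_callback}

tokens = ["the", "quick", "fox"]
# Stand-in for the engine invoking the callback after a match.
callbacks["collect"](tokens, slice(1, 3), {"name": "demo"})
```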

load(inputs: str) → None

Adds rules to the engine.

Parameters: inputs (str) – rules written in the grammar dialect

class hmrb.core.SpacyCore(callbacks: typing.Optional[typing.Dict] = None, sets: typing.Optional[typing.Dict] = None, map_doc: typing.Callable = <function _default_map>, sort_length: bool = False)

Bases: hmrb.core.Core

Class wrapping the Core object into a spaCy component.

Parameters

callbacks (dict) – dictionary of callback functions to execute following a successful call.
sort_length (bool) – sort match results by span length in ascending order (this also affects the order of callback execution).

Public methods:

load – add list of rules to the engine
__call__ – match a spaCy Document or Span against the rule set

name = 'hmrb'

hmrb.core._default_map(doc: Any) → Any

hmrb.node module

class hmrb.node.BaseNode(data: Optional[Dict] = None)

Bases: object

Class for handling nodes

A BaseNode is an atomic element of our data structure. Each token is handled by a separate BaseNode (or one of its subclasses). The BaseNode is designed to build itself recursively from a list of dict rules through the consume method. It handles the matching of a list of incoming tokens through the BaseNode call method.

Parameters: data (dict) – data associated with the node (optional: None)

Public methods:

consume – handles the building of the data structure from a rule
__call__ – handles the matching of incoming data

_build_child(child_key: Tuple[frozenset, int], child: hmrb.node.BaseNode) → None

Adds new child to the children of BaseNode. Updates call order and attribute index with the new child.

Parameters

child_key (frozenset) – a hashable identifier for new child
child (BaseNode) – new child BaseNode object

_consume_child(next_rule_token: Dict, rule: List[Dict], vars: Dict, sets: Dict) → None

_consume_regex(next_rule_token: Dict, rule: List[Dict], vars: Dict, sets: Dict) → None

_consume_set(next_rule_token: Dict, rule: List[Dict], vars: Dict, sets: Dict) → None

_consume_var(next_rule_token: Dict, rule: List[Dict], vars: Dict, sets: Dict) → None

static _make_node_key(token: Dict) → Tuple[frozenset, int]

Creates a hashable dictionary key from a dict token

Parameters: token (dict) – a token dictionary

Returns: (frozenset, int) created from the list of (key, value) tuples (sorted by default) and the hash of the data items
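The idea of turning a token dictionary into a hashable key can be sketched as follows. This is an illustration under assumptions, not the library's code: the function name make_node_key_sketch is hypothetical, token values are assumed to be hashable, and the integer is simply the hash of the frozenset.

```python
from typing import Dict, Tuple


def make_node_key_sketch(token: Dict) -> Tuple[frozenset, int]:
    """Build a hashable key from a token dict (values must be hashable)."""
    items = frozenset(token.items())
    return items, hash(items)


# Key order in the dict does not affect the resulting key.
key_a = make_node_key_sketch({"text": "hello", "pos": "INTJ"})
key_b = make_node_key_sketch({"pos": "INTJ", "text": "hello"})

# The key can index a dictionary of children.
children = {key_a: "child node placeholder"}
```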

_match(token: Dict) → set

(private) Handles the matching of a single token dictionary with the current node's children.

Implementation: TODO:

Notes: If the attribute att_name is missing, att_value is None and all children with att_name are removed from the matches.

Parameters: token (dict) – dict of (attribute, values) of a single token

consume(rule: List[Dict], vars: Dict, sets: Dict) → None

Builds internal representation from list of rules

Implementation: Recursively handles the construction of the internal tree structure. Passes each token to the appropriate BaseNode class/subclass for handling. If an equivalent node already exists, the remaining tokens of the rule are passed to that node. If no such node exists, a new node is added to the children of the current node.

Parameters

rule (List[dict]) – list of rule token dictionaries
vars (dict) – dict of all varHandle objects created
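The recursive construction described above can be sketched as a small trie. This is a simplified illustration, not the library's implementation: the class name TrieNode is hypothetical, child keys are plain frozensets of token items, and regex, set, and var tokens are ignored.

```python
from typing import Dict, List, Optional


class TrieNode:
    """Minimal stand-in for a node that builds itself from a rule."""

    def __init__(self, data: Optional[Dict] = None) -> None:
        self.data = data
        self.children: Dict[frozenset, "TrieNode"] = {}

    def consume(self, rule: List[Dict], data: Dict) -> None:
        # End of the rule: attach the rule data to this node.
        if not rule:
            self.data = data
            return
        head, rest = rule[0], rule[1:]
        key = frozenset(head.items())
        child = self.children.get(key)
        if child is None:
            # No equivalent node yet: add a new child.
            child = TrieNode()
            self.children[key] = child
        # Pass the remaining tokens on to the (new or existing) child.
        child.consume(rest, data)


root = TrieNode()
root.consume([{"text": "hello"}, {"text": "world"}], {"name": "greeting"})
root.consume([{"text": "hello"}, {"text": "there"}], {"name": "informal"})
```

Both rules share the node for the first token, so the root ends up with a single child that branches on the second token.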

static get_att(token: Any, att_name: str) → Any

Retrieves the value of a token attribute regardless of whether it is a dictionary or a normal object.

Parameters

token (Any) – target token
att_name (str) – attribute name

Returns

Value of the target attribute

Return type

response (Any)
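A minimal sketch of attribute retrieval that works for both dictionaries and ordinary objects is shown below. The function name get_att_sketch is hypothetical, and returning None for a missing attribute is an assumption based on the note in _match about missing attributes.

```python
from types import SimpleNamespace
from typing import Any


def get_att_sketch(token: Any, att_name: str) -> Any:
    """Read att_name from a dict token or from a normal object."""
    if isinstance(token, dict):
        return token.get(att_name)
    # Assumption: a missing attribute yields None rather than an error.
    return getattr(token, att_name, None)


dict_token = {"text": "hello"}
object_token = SimpleNamespace(text="hello")
```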

optimise_call_order() → None

class hmrb.node.FrozenMap(*args: Any, **kwargs: Any)

Bases: collections.abc.Mapping

Based on https://github.com/pcattori/maps. Creates a hashable from any object using frozensets.

_abc_impl = <_abc_data object>

classmethod recurse(obj: Any) → Any

class hmrb.node.RegexNode(token: Dict)

Bases: hmrb.node.BaseNode

class hmrb.node.SetNode(rule_set: Dict, data: Dict)

Bases: hmrb.node.BaseNode

Class for Set nodes

SetNode is a subclass of BaseNode designed to handle the matching of sets efficiently.

Parameters

rule_set (dict) – global dictionary of sets to check
data (dict) – data object that is returned if the SetNode is matched.

Public methods:

__call__ – handles the matching of an incoming list of tokens by checking if the token is present in the rule_set.

class hmrb.node.StarNode(data: Optional[Dict] = None)

Bases: hmrb.node.BaseNode

hmrb.node._recurse(obj: Any, map_fn: Callable) → Any

Based on https://github.com/pcattori/maps. Handles recursion within FrozenMap.

hmrb.node.make_key(obj: Any) → int

Parameters: obj (any) – any type of nested / unnested object

Returns: (int) created from the hash of the FrozenMap object

Notes: Python’s hash() is inconsistent across processes/runs.
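The idea of hashing an arbitrarily nested object can be sketched by first freezing it into hashable containers. The names freeze and make_key_sketch are hypothetical, and the conversion rules (dicts and sets to frozensets, lists to tuples) are assumptions for illustration rather than the FrozenMap implementation.

```python
from typing import Any


def freeze(obj: Any) -> Any:
    """Recursively convert nested containers into hashable equivalents."""
    if isinstance(obj, dict):
        return frozenset((key, freeze(value)) for key, value in obj.items())
    if isinstance(obj, (list, tuple)):
        return tuple(freeze(item) for item in obj)
    if isinstance(obj, (set, frozenset)):
        return frozenset(freeze(item) for item in obj)
    return obj


def make_key_sketch(obj: Any) -> int:
    """Hash of the frozen object; only stable within one process."""
    return hash(freeze(obj))


# Equal content in a different order yields the same key within one run.
a = {"x": [1, 2, {"y": {3, 4}}], "z": "text"}
b = {"z": "text", "x": [1, 2, {"y": {4, 3}}]}
```

Since string hashes are randomized per process unless PYTHONHASHSEED is fixed, keys produced this way should not be stored and compared across runs.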

class hmrb.node.varNode(var_handle: hmrb.node.BaseNode, data: Dict, min_length: int, min_run: int, max_run: int)

Bases: hmrb.node.BaseNode

Class for var nodes

varNode is a subclass of BaseNode designed to efficiently handle the reuse of the same node structure (macros). The varNode wraps around a BaseNode object (varHandle) to support shared objects and the logical repetition of executions. The remaining parts are passed to the consume method of its super BaseNode. In this way, there is a clear distinction between repeated/separated sections and sections that follow the repeated parts.

Matching is done in a similar two-step process. First, the incoming pattern is passed to the varHandle structure var_handle, which returns the depths of successful matches. Depending on the parameters of the varNode, it tries to match the varHandle multiple times. The remaining unmatched tokens are passed to the children (the “outer” structure) for matching. If min_run is 0, it also passes the original input to the “outer” structure.

Parameters

var_handle (BaseNode) – shared BaseNode object that becomes the “inner” structure of the varNode
data (dict) – data object that is returned if the varNode is matched.
min_length (int) – precomputed minimum length of the inner structure. Used to determine if enough input tokens are left to do another loop.
min_run (int) – minimum runs of the inner structure. If set to 0 the inner structure is optional (default 1).
max_run (int) – maximum runs of the inner structure (default 1).

Public methods:

__call__ – handles the matching of an incoming list of tokens by first recursing through the shared inner varHandle and then recursing the remaining unmatched tokens through the super BaseNode object (“outer”).
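The two-step matching with min_run and max_run can be sketched as a loop over repeated runs of an inner matcher. This is a simplified model under assumptions: match_runs and inner_ab are hypothetical names, the inner matcher returns the set of depths (token counts) it consumed, and the min_length shortcut is omitted.

```python
from typing import Callable, List, Set


def match_runs(
    inner: Callable[[List[str]], Set[int]],
    tokens: List[str],
    min_run: int = 1,
    max_run: int = 1,
) -> Set[int]:
    """Return the offsets at which the outer structure may continue."""
    # With min_run == 0 the inner structure is optional, so the
    # original input (offset 0) is also handed to the outer structure.
    offsets: Set[int] = {0} if min_run == 0 else set()
    frontier: Set[int] = {0}
    for run in range(1, max_run + 1):
        next_frontier: Set[int] = set()
        for start in frontier:
            for depth in inner(tokens[start:]):
                if depth > 0:  # ignore empty matches to guarantee progress
                    next_frontier.add(start + depth)
        if not next_frontier:
            break
        if run >= min_run:
            offsets |= next_frontier
        frontier = next_frontier
    return offsets


def inner_ab(tokens: List[str]) -> Set[int]:
    """Hypothetical inner structure matching the sequence 'a', 'b'."""
    return {2} if tokens[:2] == ["a", "b"] else set()


tokens = ["a", "b", "a", "b", "c"]
once = match_runs(inner_ab, tokens, min_run=1, max_run=1)
optional_twice = match_runs(inner_ab, tokens, min_run=0, max_run=2)
at_least_two = match_runs(inner_ab, tokens, min_run=2, max_run=3)
```

Each returned offset marks where the remaining tokens would be passed on to the outer children.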

hmrb.lang module

class hmrb.lang.Block(members: List, vars: Dict, neg: bool, min_: int, max_: int, label: Optional[str], union: bool = False, is_body: bool = False)

Bases: object

Represents a rule block that may be the body or part of the body of a Law or a Var.

Parameters

members (dict) – Block members
neg (bool) – negated
min_ (int) – minimum number of matches
max_ (int) – maximum number of matches

_add_var(children: list, length: int, min_: int, max_: int) → Any

_parse_block(block: hmrb.lang.Block) → None

_parse_labeled_element(label: str, parent: list) → None

_parse_ref(ref: hmrb.lang.Ref) → None

_parse_unit(unit: hmrb.lang.Unit) → None

_sequence_extend(block: hmrb.lang.Block) → None

_union_extend(block: hmrb.lang.Block) → None

parse() → None

class hmrb.lang.BlockIterator(block_str: str, inf: int = 10000000000, start: int = 1)

Bases: object

Provides an iterator that iterates over the top-level block segments. These segments could be blocks (see also Block), units (see also Unit), and variable references (see also Ref). These blocks are validated but parsed later on. This iterator will produce tuples of the following shape: (content_string, negated, min_match_num, max_match_num)

Parameters

block_str (str) – block string
inf (int) – infinity value
start (int) – start line

_check_body_level() → None

_close_bracket(ch: str) → None

Handles a closing bracket.

Level 0: checks for an open variable reference and closes the block
Level 1: checks type of segment (block/unit) and adds to iterable
Other levels: adds character to the buffer

Parameters: ch – closing bracket character

_consume(block: str) → None

Consumes a block string a character at a time. Note that escaped characters are treated differently through the character iterator. The iterator acts on all brackets but it validates only variable references and operators from level 1 (the top content level).

Parameters: block (str) – block string

_open_bracket(ch: str) → None

Handles opening bracket.

Level 0: checks for no operators and opens the block
Level 1: parses operator into operator buffer and adds char to buffer
Other levels: adds char to buffer

Parameters: ch – open bracket char

_parse_label() → None

_parse_operator() → Tuple

Assumes that there is an operator in the buffer and parses it. Spaces are significant in all operators. They are matched as shown below. Number placeholders can be replaced with any valid integer.

Examples

not
optional
zero or more
one or more
at least {number}
at most {number}
{number} to {number}

Raises: ValueError – when the buffer is not empty and doesn't contain a valid operator
Returns: [Tuple] – negated [bool], min # matches, max # matches
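The operator forms listed above can be mapped to a (negated, min, max) tuple along the following lines. This is a sketch under assumptions, not the library's parser: the name parse_operator_sketch is hypothetical, an empty buffer is assumed to mean exactly one match, "at most {number}" is assumed to have a lower bound of 0, and INF mirrors the default inf value of BlockIterator.

```python
import re
from typing import Tuple

INF = 10_000_000_000  # same value as BlockIterator's default `inf`


def parse_operator_sketch(op: str, inf: int = INF) -> Tuple[bool, int, int]:
    """Translate an operator string into (negated, min, max)."""
    op = op.strip()
    if not op:
        # Assumption: no operator means "match exactly once".
        return False, 1, 1
    fixed = {
        "not": (True, 1, 1),
        "optional": (False, 0, 1),
        "zero or more": (False, 0, inf),
        "one or more": (False, 1, inf),
    }
    if op in fixed:
        return fixed[op]
    match = re.fullmatch(r"at least (\d+)", op)
    if match:
        return False, int(match.group(1)), inf
    match = re.fullmatch(r"at most (\d+)", op)
    if match:
        # Assumption: the lower bound for "at most" is 0.
        return False, 0, int(match.group(1))
    match = re.fullmatch(r"(\d+) to (\d+)", op)
    if match:
        return False, int(match.group(1)), int(match.group(2))
    raise ValueError(f"invalid operator: {op!r}")
```

The patterns use single literal spaces, so a variant such as "atleast 2" is rejected with a ValueError.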

_parse_var() → None

Parses a variable reference name and adds it to the iterable.

property is_union: bool

class hmrb.lang.Grammar(string: str, vars_: Dict)

Bases: object

Represents a Babylonian grammar. It may consist of Var and Law segments.

Parameters: string (str) – grammar string to be parsed

_build(string: str) → None

_deploy() → None

_map_segments(type_: Any) → Dict

Collects all segments of particular type and creates a mapping between their names and the objects themselves.

Returns: [dict] – mapping between variable names and segments

static _parse_segment_type(line: str) → Optional[hmrb.lang.Types]

Determines the type of grammar segment: Law or Var.

Parameters: line – segment lines
Returns: [Types] – segment type

_segment(string: str) → Generator

Segments the grammar into laws (Law) and variables (Var). Yields the type of the segment as well as all the lines.

Parameters: string – string representation of the segment

static end_var(parent_end: Any) → None

parser_map = {<Types.VAR: 'var'>: <class 'hmrb.lang.Var'>, <Types.LAW: 'law'>: <class 'hmrb.lang.Law'>}

class hmrb.lang.Law(lines: List, vars: Dict)

Bases: object

Represents a rule segment of a Babylonian grammar. It consists of an optional name, a list of attributes, and a compulsory body.

Parameters: lines (list) – segment lines

_parse(lines: List) → None

static _parse_atts(lines: List) → Dict

static _parse_name(first_line: str, start: int) → str

static _segment_lines(lines: List) → Tuple[List, List]

class hmrb.lang.Ref(ref: str, neg: bool, min_: int, max_: int, label: Optional[str])

Bases: object

Represents a reference to a variable.

Parameters

ref (str) – variable reference
neg (bool) – negated
min_ (int) – minimum number of matches
max_ (int) – maximum number of matches
label (str) – reference label

class hmrb.lang.Types(value)

Bases: enum.Enum

Types of segments and items

BLOCK = 'block'

LAW = 'law'

UNIT = 'unit'

VAR = 'var'

VAR_REF = 'var_ref'

class hmrb.lang.Unit(atts: Dict, neg: bool, min_: int, max_: int, label: Optional[str])

Bases: object

Represents a group of attribute constraints that form a rule unit. Units are typically members of a block.

Parameters

atts (dict) – unit attributes
neg (bool) – negated
min_ (int) – minimum number of matches
max_ (int) – maximum number of matches
label (str) – unit label

class hmrb.lang.Var(lines: List, vars: Dict)

Bases: object

Represents a named rule variable segment of a Babylonian grammar.

Parameters: lines (list) – segment lines

_parse(lines: List) → None

static _parse_name(first_line: str, start: int) → str

hmrb.lang.char_iter(string: str) → Generator

Iterate over the characters of a string while preserving escaped chars. The point is to allow escaped characters to be treated differently during char iteration (parsing) and then unescaped inside the final data structure.

Parameters

string (str) – regex string

Returns

[Generator] – generator iterating over the characters of the string
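Iterating over characters while keeping escape sequences intact can be sketched as below. The name iter_chars is hypothetical, and treating a backslash plus the following character as a single unit is an assumption about what "preserving escaped chars" means here.

```python
from typing import Generator


def iter_chars(string: str) -> Generator[str, None, None]:
    """Yield characters one at a time, keeping '\\x' pairs together."""
    chars = iter(string)
    for ch in chars:
        if ch == "\\":
            # Keep the backslash and the escaped character as one item,
            # so a parser can tell an escaped bracket from a real one.
            yield ch + next(chars, "")
        else:
            yield ch


pieces = list(iter_chars("a\\(b"))
```

A parser that reacts to brackets would then see the item "\\(" rather than a bare "(", and could leave it for later unescaping.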

hmrb.lang.parse_block(string: str, vars: Dict, neg: bool = False, min_: int = 1, max_: int = 1, label: Optional[str] = None, start: int = -1) → hmrb.lang.Block

Parses a block string into a Block object. Takes quantifiers and negation modifier parameters. Recursively calls itself or parse_unit to parse nested blocks and units.

Parameters

string (str) – block string
neg (bool) – True if negated
min_ (int) – minimum number of matches
max_ (int) – maximum number of matches
label (str) – block label
start (int) – block start line number

Returns

[Block] – Block object representing the parsed string

hmrb.lang.parse_unit(string: str, neg: bool = False, min_: int = 1, max_: int = 1, label: Optional[str] = None) → hmrb.lang.Unit

Parses a unit string into a Unit object. Unit is a list of key-value pairs inside a pair of brackets. Key-value pairs are separated by a comma. There are colons between keys and values. Values are set inside double quotes, while keys are alphanumeric var-like names. Hyphens and underscores are allowed in the key names, but numbers and hyphens are not allowed in the beginning. An arbitrary amount of space separators is allowed between each of the components of the Unit (key, value, colon and comma).

Examples

(att_name: "attribute value", att-name2: "attribute value")
(att_name:"attribute value",att-name2:"attribute value")

Parameters

string (str) – unit string
neg (bool) – True if negated
min_ (int) – minimum number of matches
max_ (int) – maximum number of matches

Returns

[Unit] – Unit object representing the parsed string
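The unit syntax described above can be sketched with a regular expression. This is an illustration under assumptions, not the library's parser: parse_unit_sketch is a hypothetical name, it returns a plain dict instead of a Unit, and values are left escaped.

```python
import re
from typing import Dict

# Key: starts with a letter or underscore, then letters, digits, '_' or '-'.
# Value: double-quoted, may contain backslash-escaped characters.
PAIR = re.compile(
    r'\s*([A-Za-z_][A-Za-z0-9_-]*)\s*:\s*"((?:[^"\\]|\\.)*)"\s*'
)


def parse_unit_sketch(string: str) -> Dict[str, str]:
    """Parse '(key: "value", ...)' into a dict of attribute constraints."""
    string = string.strip()
    if not (string.startswith("(") and string.endswith(")")):
        raise ValueError("unit must be wrapped in brackets")
    body = string[1:-1]
    atts: Dict[str, str] = {}
    pos = 0
    while pos < len(body):
        match = PAIR.match(body, pos)
        if match is None:
            raise ValueError(f"invalid key-value pair at offset {pos}")
        atts[match.group(1)] = match.group(2)
        pos = match.end()
        if pos < len(body):
            if body[pos] != ",":
                raise ValueError(f"expected comma at offset {pos}")
            pos += 1
    return atts


spaced = parse_unit_sketch(
    '(att_name: "attribute value", att-name2: "attribute value")'
)
compact = parse_unit_sketch(
    '(att_name:"attribute value",att-name2:"attribute value")'
)
```

Both example strings from the description produce the same mapping, which shows that the spacing between components is not significant.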

hmrb.lang.parse_value(string: str) → Union[str, dict, bool, int, float]

Unescapes a Unit attribute value and determines whether it is a regular string or a Regex.

Parameters: string (str) – attribute value
Returns: [Union[str, Regex, bool, int, float]] – parsed value

hmrb.lang.unescape(string: str) → str

Unescapes escaped characters, typically inside attribute values.

Parameters: string (str) – string to be unescaped
Returns: [str] – unescaped string
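A minimal sketch of unescaping is to drop each backslash and keep the character that follows it. The name unescape_sketch is hypothetical, and treating every backslash-prefixed character as escaped is an assumption; the library may handle only a specific set of characters.

```python
import re


def unescape_sketch(string: str) -> str:
    """Replace each backslash-escaped character with the character itself."""
    return re.sub(r"\\(.)", r"\1", string)


quoted = unescape_sketch(r'say \"hi\"')
backslash = unescape_sketch(r"a\\b")
```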

hmrb.lang.unique(sequence: list) → Iterator

hmrb.protobuffer module

class hmrb.protobuffer.Labels(labels: set, depth: int, length: int = 1)

Bases: object

Class wrapper handling Labels message protobuffer

Class for creating, holding and merging Labels type protobuffer messages with other defined types of messages. Initialization of the class creates a new Labels message. Addition of new Labels is handled through the += (__iadd__) magic method.

Protobuffer definition (proto3):

message Labels { map<string, Span> items = 1; }
message Span { string start = 1; string end = 2; }

Parameters

labels (list) –
depth (int) – of the Match span)
length (int) – (defaults to 1)

Public methods:

+= – handles the addition of a new protobuffer to the object
get_depth – returns the maximum depth reached

Notes

span start and end integers are stored as strings, since protobuffers are not able to distinguish between set 0 and unset (default) 0. See Google’s protobuffer documentation on default values.

class hmrb.protobuffer.Match(attributes: Dict, depth: int)

Bases: object

Class wrapper handling Match message protobuffers

Class for creating, holding and merging Match type protobuffer messages with other defined types of messages. Initialization of the class creates a new Match message. The Match is considered Active if it contains any valid attributes. Inactive Matches are later ignored in merging objects. An inactive Match with depth_reached transfers its depth_reached to the new object. Addition of data is handled through the += (__iadd__) magic method.

Protobuffer definition (proto3):

message Match { Span span = 1; map<string, string> attributes = 2; map<string, google.protobuf.Any> underscore = 3; }
message Span { string start = 1; string end = 2; }

Parameters

attributes (dict) – (except reserved attributes that are added to underscore)
depth (int) – of the Match span)

Public methods:

+= – handles the addition of a new protobuffer to the object
set_depth – sets depth reached
get_depth – returns the maximum depth reached

Notes

span start and end integers are stored as strings, since protobuffers are not able to distinguish between set 0 and unset (default) 0. See Google’s protobuffer documentation on default values.

get_depth() → int

set_depth(depth: int) → None

set_start(start: int) → None

class hmrb.protobuffer.Responses

Bases: object

Class wrapper handling Responses message protobuffers

Class for creating, holding and merging Response type protobuffer messages with other defined types of messages (see response.proto for protocol buffer definitions). Initializing the class creates an empty Responses protobuffer. Addition of data is handled through the += (__iadd__) magic method.

Protobuffer definition (proto3):

message Responses { repeated Match items = 1; }

Public methods:

+= – handles the addition of a new protobuffer to the object
set_start – sets the start of all (not set) span messages
get_depth – returns the maximum depth reached

format(sort_length: bool = False) → Union[List[Tuple[Tuple[int, int], List[Dict]]], ItemsView[Tuple[int, int], List[Dict]]]

get_depth() → int

set_depth(depth: int) → None

set_start(start: int) → None

hmrb.protobuffer.mirror_depth(left: Union[hmrb.protobuffer.Match, hmrb.protobuffer.Responses], right: Union[hmrb.protobuffer.Match, hmrb.protobuffer.Responses]) → None

hmrb.protobuffer.mirror_labels(left: Any, right: Any) → None