👑 hmrb package

hmrb.core module

class hmrb.core.Core(callbacks: Optional[Dict] = None, sets: Optional[Dict] = None, sort_length: bool = False)

Bases: object

Class handling the main functions surrounding the rule engine

Parameters
  • callbacks (dict) – dictionary of callback functions to execute following a successful call.

  • sort_length (bool) – sort match results by span length in ascending order (this also affects the order of callback execution).

Public methods:

  • load – add a list of rules to the engine

  • __call__ – match a list of input dicts against the internal rules
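As a rough illustration of how sort_length and callbacks interact, the sketch below orders match results by span length and then dispatches them. It does not use hmrb itself: run_callbacks is a hypothetical helper, and looking up the callback name under a "callback" key of each data dict is an assumption. Only the result shape (a list of ((start, end), [data, ...]) tuples) and the (input_, span, data) callback signature are taken from the signatures documented on this page.

```python
from typing import Any, Callable, Dict, List, Tuple

# Shape of one match result, as in the _match / _execute signatures.
Match = Tuple[Tuple[int, int], List[Dict]]


def run_callbacks(
    matches: List[Match],
    callbacks: Dict[str, Callable[[list, slice, Dict], Any]],
    input_: list,
    sort_length: bool = False,
) -> List[Any]:
    """Hypothetical helper: dispatch each match to a named callback."""
    if sort_length:
        # Ascending span length; sorted() is stable, so ties keep their order.
        matches = sorted(matches, key=lambda m: m[0][1] - m[0][0])
    results = []
    for (start, end), data_items in matches:
        for data in data_items:
            # Assumption: the callback name lives under a "callback" key.
            callback = callbacks.get(data.get("callback"))
            if callback is not None:
                results.append(callback(input_, slice(start, end), data))
    return results


def show(input_: list, span: slice, data: Dict) -> Tuple[str, list]:
    """Example callback with the (input_, span, data) signature."""
    return data["name"], input_[span]


tokens = ["my", "head", "hurts", "badly"]
matches = [
    ((0, 4), [{"callback": "show", "name": "long"}]),
    ((1, 3), [{"callback": "show", "name": "short"}]),
]
ordered = run_callbacks(matches, {"show": show}, tokens, sort_length=True)
```

With sort_length=True the shorter span (1, 3) is handled before the longer span (0, 4), so the callback order changes along with the result order.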

_execute(responses: Union[List[Tuple[Tuple[int, int], List[Dict]]], ItemsView[Tuple[int, int], List[Dict]]], input_: Any) None
_load(rules: List[List[Dict]], vars: List[List[Dict]]) None

Adds list of rules to the engine

Implementation: passes the rules sequentially to the root BaseNode of the class.

Parameters
  • rules (list) – list of rules to add to root node

  • vars (list) – list of shared varHandle objects to use

_match(spans: List[Tuple[int, list]]) Union[List[Tuple[Tuple[int, int], List[Dict]]], ItemsView[Tuple[int, int], List[Dict]]]

Takes a list of spans and executes matching by passing each to the root node.

Parameters

spans (list) – list of spans to match

Returns

list of tuples containing match results

Return type

(list)

static default_callback(input_: list, span: slice, data: Dict) None
load(inputs: str) None

Adds rules to the engine.

Parameters

inputs (str) – rules written in the grammar dialect

class hmrb.core.SpacyCore(callbacks: typing.Optional[typing.Dict] = None, sets: typing.Optional[typing.Dict] = None, map_doc: typing.Callable = <function _default_map>, sort_length: bool = False)

Bases: hmrb.core.Core

Class wrapping the Core object into a spaCy component.

Parameters
  • callbacks (dict) – dictionary of callback functions to execute following a successful call.

  • sort_length (bool) – sort match results by span length in ascending order (this also affects the order of callback execution).

Public methods:

  • load – add a list of rules to the engine

  • __call__ – match a spaCy Document or Span against the rule set

name = 'hmrb'
hmrb.core._default_map(doc: Any) Any

hmrb.node module

class hmrb.node.BaseNode(data: Optional[Dict] = None)

Bases: object

Class for handling nodes

BaseNode is an atomic element of our data structure. Each token is handled by a separate BaseNode (or one of its subclasses). The BaseNode is designed to build itself recursively from a list of dict rules through the consume method. It handles the matching of a list of incoming tokens through the BaseNode __call__ method.

Parameters

data (dict) – data associated with the node (optional: None)

Public methods:

  • consume – handles the building of the data structure from a rule

  • __call__ – handles the matching of incoming data

_build_child(child_key: Tuple[frozenset, int], child: hmrb.node.BaseNode) None

Adds new child to the children of BaseNode. Updates call order and attribute index with the new child.

Parameters
  • child_key (frozenset) – a hashable identifier for new child

  • child (BaseNode) – new child BaseNode object

_consume_child(next_rule_token: Dict, rule: List[Dict], vars: Dict, sets: Dict) None
_consume_regex(next_rule_token: Dict, rule: List[Dict], vars: Dict, sets: Dict) None
_consume_set(next_rule_token: Dict, rule: List[Dict], vars: Dict, sets: Dict) None
_consume_var(next_rule_token: Dict, rule: List[Dict], vars: Dict, sets: Dict) None
static _make_node_key(token: Dict) Tuple[frozenset, int]

Creates a hashable dictionary key from a dict token

Parameters

token (dict) – a token dictionary

Returns: (frozenset, int) created from the list of (key, value) tuples (sorted by default) and the hash of the data items

_match(token: Dict) set

(private) Handles the matching of a single token dictionary with the current node's children.

Implementation: TODO:

Notes: If the attribute att_name is missing, att_value is None and all children with att_name are removed from the matches.

Parameters

token (dict) – dict of (attribute, values) of a single token

consume(rule: List[Dict], vars: Dict, sets: Dict) None

Builds internal representation from list of rules

Implementation: Recursively handles the construction of the internal tree structure. Passes each token to the appropriate BaseNode class/subclass for handling. If an equivalent node already exists, the remaining tokens of the rule are passed to that node. If no such node exists, a new node is added to the children of the current node.

Parameters
  • rule (List(dict)) – list of rule token dictionaries

  • vars (dict) – dict of all varHandle objects created
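The recursive construction described for consume can be pictured with a small prefix tree. This is only a sketch under simplifying assumptions: TrieNode is a hypothetical stand-in for BaseNode, it ignores vars, sets and regex tokens, and it treats a rule token as matching when all of its (attribute, value) pairs occur in the input token.

```python
from typing import Dict, List, Optional, Tuple


class TrieNode:
    """Hypothetical, simplified stand-in for BaseNode."""

    def __init__(self, data: Optional[Dict] = None) -> None:
        self.data = data
        self.children: Dict[frozenset, "TrieNode"] = {}

    def consume(self, rule: List[Dict], data: Dict) -> None:
        if not rule:
            # End of the rule: remember what a match should return.
            self.data = data
            return
        key = frozenset(rule[0].items())
        # Reuse an equivalent child if one exists, otherwise create it.
        child = self.children.setdefault(key, TrieNode())
        child.consume(rule[1:], data)

    def match(self, tokens: List[Dict], depth: int = 0) -> List[Tuple[int, Dict]]:
        found = [(depth, self.data)] if self.data is not None else []
        if not tokens:
            return found
        token_items = set(tokens[0].items())
        for key, child in self.children.items():
            # A child matches when every constraint is present in the token.
            if key <= token_items:
                found.extend(child.match(tokens[1:], depth + 1))
        return found


root = TrieNode()
root.consume([{"lemma": "head"}, {"lemma": "hurt"}], {"name": "headache"})
root.consume([{"lemma": "head"}, {"lemma": "ache"}], {"name": "headache2"})
shared = len(root.children)  # both rules share the first node
hits = root.match(
    [{"lemma": "head", "pos": "NOUN"}, {"lemma": "hurt", "pos": "VERB"}]
)
```

Because both rules start with the same token constraint, the second call to consume reuses the existing child and only adds a new node for the differing second token.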

static get_att(token: Any, att_name: str) Any

Retrieves the value of a token attribute regardless of whether it is a dictionary or a normal object.

Parameters
  • token (Any) – target token

  • att_name (str) – attribute name

Returns

Value of the target attribute

Return type

response (Any)
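A minimal sketch of this kind of attribute lookup is shown below. get_att_sketch is a hypothetical name, and returning None for a missing attribute is an assumption that follows the note under _match (a missing att_name yields an att_value of None); the actual implementation may differ.

```python
from types import SimpleNamespace
from typing import Any


def get_att_sketch(token: Any, att_name: str) -> Any:
    """Read an attribute from either a dict token or a plain object."""
    if isinstance(token, dict):
        # Dictionary tokens: missing keys yield None.
        return token.get(att_name)
    # Object tokens (e.g. spaCy-like tokens): missing attributes yield None.
    return getattr(token, att_name, None)


dict_token = {"lemma": "hurt", "pos": "VERB"}
object_token = SimpleNamespace(lemma="hurt", pos="VERB")
same_value = get_att_sketch(dict_token, "lemma") == get_att_sketch(object_token, "lemma")
```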

optimise_call_order() None
class hmrb.node.FrozenMap(*args: Any, **kwargs: Any)

Bases: collections.abc.Mapping

Based on https://github.com/pcattori/maps. Creates a hashable from any object using frozensets.

_abc_impl = <_abc_data object>
classmethod recurse(obj: Any) Any
class hmrb.node.RegexNode(token: Dict)

Bases: hmrb.node.BaseNode

class hmrb.node.SetNode(rule_set: Dict, data: Dict)

Bases: hmrb.node.BaseNode

Class for Set nodes

SetNode is a subclass of BaseNode designed to efficiently handle the matching of sets.

Parameters
  • rule_set (dict) – global dictionary of sets to check

  • data (dict) – data object that is returned if the SetNode is matched.

Public methods:
  • __call__ – handles the matching of an incoming list of tokens by checking if the token is present in the rule_set.

class hmrb.node.StarNode(data: Optional[Dict] = None)

Bases: hmrb.node.BaseNode

hmrb.node._recurse(obj: Any, map_fn: Callable) Any

Based on https://github.com/pcattori/maps. Handles recursion within FrozenMap.

hmrb.node.make_key(obj: Any) int
Parameters

obj (any) – any type of nested / unnested object

Returns: (int) created from the hash of the FrozenMap object

Notes: Python's hash() is inconsistent across processes/runs.
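The idea of deriving an integer key from a nested object can be sketched as follows. freeze and make_key_sketch are hypothetical names, not the library's FrozenMap or make_key; the sketch only shows how recursive freezing makes equal nested structures hash equally within one process. Across runs, string hashes are randomised by default (see PYTHONHASHSEED), so such keys should not be persisted.

```python
from typing import Any, Hashable


def freeze(obj: Any) -> Hashable:
    """Recursively convert nested containers into hashable equivalents."""
    if isinstance(obj, dict):
        # Order-independent: a frozenset of (key, frozen value) pairs.
        return frozenset((key, freeze(value)) for key, value in obj.items())
    if isinstance(obj, (list, tuple)):
        return tuple(freeze(item) for item in obj)
    if isinstance(obj, set):
        return frozenset(freeze(item) for item in obj)
    return obj  # assumed to be hashable already (str, int, bool, ...)


def make_key_sketch(obj: Any) -> int:
    """Hash of the frozen object; only stable within a single process."""
    return hash(freeze(obj))


a = {"lemma": "hurt", "tags": ["VERB", {"tense": "past"}]}
b = {"tags": ["VERB", {"tense": "past"}], "lemma": "hurt"}
same_key = make_key_sketch(a) == make_key_sketch(b)
```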

class hmrb.node.varNode(var_handle: hmrb.node.BaseNode, data: Dict, min_length: int, min_run: int, max_run: int)

Bases: hmrb.node.BaseNode

Class for var nodes

varNode is a subclass of BaseNode designed to efficiently handle the reuse of the same node structure (macros). The varNode wraps around a BaseNode object (varHandle) to support shared objects and the logical repetition of executions. The remaining parts are passed to its super BaseNode consume. In this way, we have a clear distinction between repeated/separated sections and sections that follow the repeated parts.

Matching is done in a similar two-step process. First, the incoming pattern is passed to the varHandle structure var_handle, which returns the depths of successful matches. Depending on the parameters of the varNode, it tries to match the varHandle multiple times. The remaining unmatched tokens are passed to the children (the “outer” structure) for matching. If min_run is 0, it also passes the original input to the “outer” structure.

Parameters
  • var_handle (BaseNode) – shared BaseNode object that becomes the “inner” structure of the varNode

  • data (dict) – data object that is returned if the varNode is matched.

  • min_length (int) – precomputed minimum length of the inner structure. Used to determine if enough input tokens are left to do another loop.

  • min_run (int) – minimum runs of the inner structure. If set to 0 the inner structure is optional (default 1).

  • max_run (int) – maximum runs of the inner structure (default 1).

Public methods:
  • __call__ – handles the matching of an incoming list of tokens by first recursing through the shared inner varHandle and then by recursing the remaining unmatched tokens through the super BaseNode object (“outer”).
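The interplay of min_run, max_run and min_length can be sketched with a small function that computes the offsets at which the “outer” structure would resume. This is an illustration under assumptions, not the varNode implementation: run_repeated is a hypothetical name, and the inner structure is modelled as a callable that returns the depths (numbers of tokens consumed, each at least 1) of its successful matches.

```python
from typing import Callable, List, Set


def run_repeated(
    inner: Callable[[List[str]], List[int]],
    tokens: List[str],
    min_run: int = 1,
    max_run: int = 1,
    min_length: int = 1,
) -> Set[int]:
    """Return the offsets at which the outer structure may continue."""
    # With min_run == 0 the inner structure is optional, so the
    # original input (offset 0) is also handed to the outer structure.
    resume: Set[int] = {0} if min_run == 0 else set()
    frontier = {0}
    for run in range(1, max_run + 1):
        next_frontier: Set[int] = set()
        for offset in frontier:
            if len(tokens) - offset < min_length:
                continue  # not enough input left for another loop
            for depth in inner(tokens[offset:]):
                next_frontier.add(offset + depth)
        if not next_frontier:
            break  # the inner structure failed on this run
        if run >= min_run:
            resume |= next_frontier
        frontier = next_frontier
    return resume


def very(tokens: List[str]) -> List[int]:
    """Toy inner structure: matches a single "very" token."""
    return [1] if tokens and tokens[0] == "very" else []


words = ["very", "very", "bad"]
optional_twice = run_repeated(very, words, min_run=0, max_run=2)
exactly_twice = run_repeated(very, words, min_run=2, max_run=2)
```

Each returned offset marks a point where the remaining tokens, tokens[offset:], would be passed on for matching against the children.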

hmrb.lang module

class hmrb.lang.Block(members: List, vars: Dict, neg: bool, min_: int, max_: int, label: Optional[str], union: bool = False, is_body: bool = False)

Bases: object

Represents a rule block that may be the body or part of the body of a Law or a Var.

Parameters
  • members (dict) – Block members

  • neg (bool) – negated

  • min_ (int) – minimum number of matches

  • max_ (int) – maximum number of matches

_add_var(children: list, length: int, min_: int, max_: int) Any
_parse_block(block: hmrb.lang.Block) None
_parse_labeled_element(label: str, parent: list) None
_parse_ref(ref: hmrb.lang.Ref) None
_parse_unit(unit: hmrb.lang.Unit) None
_sequence_extend(block: hmrb.lang.Block) None
_union_extend(block: hmrb.lang.Block) None
parse() None
class hmrb.lang.BlockIterator(block_str: str, inf: int = 10000000000, start: int = 1)

Bases: object

Provides an iterator that iterates over the top-level block segments. These segments could be blocks (see also Block), units (see also Unit), and variable references (see also Ref). These blocks are validated but parsed later on. This iterator will produce tuples of the following shape: (content_string, negated, min_match_num, max_match_num)

Parameters
  • block_str (str) – block string

  • inf (int) – infinity value

  • start (int) – start line
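To make the notion of top-level segments concrete, the sketch below splits a block string by bracket depth and collects the text in front of each segment as its operator. It is a deliberately reduced illustration: top_level_segments is a hypothetical function, it counts levels in its own way, and it ignores escaped characters, brackets inside quoted values, labels, unions and variable references, all of which the real iterator has to handle.

```python
from typing import List, Tuple


def top_level_segments(block: str) -> List[Tuple[str, str]]:
    """Return (operator, content) pairs for segments directly inside the block."""
    segments: List[Tuple[str, str]] = []
    level = 0
    operator, buffer = "", ""
    for ch in block:
        if ch == "(":
            level += 1
            if level > 1:
                buffer += ch  # bracket belongs to a segment
        elif ch == ")":
            level -= 1
            if level >= 1:
                buffer += ch
            if level == 1:
                # A top-level segment has just been closed.
                segments.append((operator.strip(), buffer))
                operator, buffer = "", ""
        elif level == 1:
            operator += ch  # text between segments: operator candidates
        elif level > 1:
            buffer += ch  # segment content, parsed later
    return segments


body = '((pos: "DET") optional (pos: "ADJ") one or more (pos: "NOUN"))'
segments = top_level_segments(body)
```

The contents are returned unparsed, which mirrors the description above: segments are identified first and parsed later.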

_check_body_level() None
_close_bracket(ch: str) None

Handles a closing bracket.

  • Level 0: checks for an open variable reference and closes the block

  • Level 1: checks type of segment (block/unit) and adds to iterable

  • Other levels: adds character to the buffer

Parameters

ch – closing bracket character

_consume(block: str) None

Consumes a block string a character at a time. Note that escaped characters are treated differently through the character iterator. The iterator acts on all brackets but it validates only variable references and operators from level 1 (the top content level).

Parameters

block (str) – block string

_open_bracket(ch: str) None

Handles opening bracket.

  • Level 0: checks for no operators and opens the block

  • Level 1: parses operator into operator buffer and adds char to buffer

  • Other levels: adds char to buffer

Parameters

ch – open bracket char

_parse_label() None
_parse_operator() Tuple

Assumes that there is an operator in the buffer and parses it. Spaces are significant for all operators: they are matched exactly as shown below. Number placeholders can be replaced with any valid integer.

Examples

  • not

  • optional

  • zero or more

  • one or more

  • at least {number}

  • at most {number}

  • {number} to {number}

Raises

ValueError – when the buffer is not empty and does not contain a valid operator

Returns

[Tuple] – negated[bool], min # matches, max # matches
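The mapping from operator strings to a (negated, min, max) tuple could look roughly like the sketch below. parse_operator_sketch is a hypothetical function written for illustration; the concrete bounds are assumptions, in particular the lower bound of 0 for "at most", the (False, 1, 1) result for an empty buffer (taken from the defaults of parse_block), and the use of the BlockIterator default inf value as the upper bound.

```python
import re
from typing import Tuple

INF = 10_000_000_000  # mirrors the default ``inf`` of BlockIterator


def parse_operator_sketch(op: str, inf: int = INF) -> Tuple[bool, int, int]:
    """Return (negated, min matches, max matches) for an operator string."""
    fixed = {
        "": (False, 1, 1),  # no operator: match exactly once
        "not": (True, 1, 1),
        "optional": (False, 0, 1),
        "zero or more": (False, 0, inf),
        "one or more": (False, 1, inf),
    }
    if op in fixed:
        return fixed[op]
    found = re.fullmatch(r"at least (\d+)", op)
    if found:
        return (False, int(found.group(1)), inf)
    found = re.fullmatch(r"at most (\d+)", op)
    if found:
        return (False, 0, int(found.group(1)))  # assumption: lower bound 0
    found = re.fullmatch(r"(\d+) to (\d+)", op)
    if found:
        return (False, int(found.group(1)), int(found.group(2)))
    raise ValueError(f"invalid operator: {op!r}")


two_to_five = parse_operator_sketch("2 to 5")
at_least_three = parse_operator_sketch("at least 3")
```

Because the patterns are matched with fullmatch and single spaces, a string such as "2  to 5" (two spaces) is rejected, in line with the remark that spaces matter.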

_parse_var() None

Parses a variable reference name and adds it to the iterable.

property is_union: bool
class hmrb.lang.Grammar(string: str, vars_: Dict)

Bases: object

Represents a Babylonian grammar. It may consist of Var and Law segments.

Parameters

string (str) – grammar string to be parsed

_build(string: str) None
_deploy() None
_map_segments(type_: Any) Dict

Collects all segments of particular type and creates a mapping between their names and the objects themselves.

Returns

[dict] – mapping between variable names and segments

static _parse_segment_type(line: str) Optional[hmrb.lang.Types]

Determines the type of grammar segment: Law or Var.

Parameters

line – segment lines

Returns

[Types] – segment type

_segment(string: str) Generator

Segments the grammar into laws (Law) and variables (Var). Yields the type of the segment as well as all the lines.

Parameters

string – string representation of the segment

static end_var(parent_end: Any) None
parser_map = {<Types.VAR: 'var'>: <class 'hmrb.lang.Var'>, <Types.LAW: 'law'>: <class 'hmrb.lang.Law'>}
class hmrb.lang.Law(lines: List, vars: Dict)

Bases: object

Represents a rule segment of a Babylonian grammar. It consists of an optional name, a list of attributes, and a compulsory body.

Parameters

lines (list) – segment lines

_parse(lines: List) None
static _parse_atts(lines: List) Dict
static _parse_name(first_line: str, start: int) str
static _segment_lines(lines: List) Tuple[List, List]
class hmrb.lang.Ref(ref: str, neg: bool, min_: int, max_: int, label: Optional[str])

Bases: object

Represents a reference to a variable.

Parameters
  • ref (str) – variable reference

  • neg (bool) – negated

  • min_ (int) – minimum number of matches

  • max_ (int) – maximum number of matches

  • label (str) – reference label

class hmrb.lang.Types(value)

Bases: enum.Enum

Types of segments and items

BLOCK = 'block'
LAW = 'law'
UNIT = 'unit'
VAR = 'var'
VAR_REF = 'var_ref'
class hmrb.lang.Unit(atts: Dict, neg: bool, min_: int, max_: int, label: Optional[str])

Bases: object

Represents a group of attribute constraints that form a rule unit. Units are typically members of a block.

Parameters
  • atts (dict) – unit attributes

  • neg (bool) – negated

  • min_ (int) – minimum number of matches

  • max_ (int) – maximum number of matches

  • label (str) – unit label

class hmrb.lang.Var(lines: List, vars: Dict)

Bases: object

Represents a named rule variable segment of a Babylonian grammar.

Parameters

lines (list) – segment lines

_parse(lines: List) None
static _parse_name(first_line: str, start: int) str
hmrb.lang.char_iter(string: str) Generator

Iterate over the characters of a string while preserving escaped chars. The point is to allow escaped characters to be treated differently during char iteration (parsing) and then unescaped inside the final data structure.

Parameters

string (str) – regex string

Returns

[Generator] – generator iterating over the characters of the string
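A minimal sketch of escape-preserving iteration follows, together with a matching unescape step. Both functions are hypothetical illustrations, and the backslash as escape character is an assumption; the point is only that an escaped character stays attached to its escape marker during parsing and is unescaped afterwards.

```python
from typing import Iterator


def char_iter_sketch(string: str) -> Iterator[str]:
    """Yield characters one at a time, keeping escape sequences together."""
    chars = iter(string)
    for ch in chars:
        if ch == "\\":
            # Keep the backslash attached to the character it escapes.
            yield ch + next(chars, "")
        else:
            yield ch


def unescape_sketch(string: str) -> str:
    """Drop the escape marker from every escaped character."""
    pieces = []
    for piece in char_iter_sketch(string):
        # Two-character pieces are escape sequences; keep the second char.
        pieces.append(piece[1] if len(piece) == 2 else piece)
    return "".join(pieces)


escaped = 'say \\"hi\\"'  # the text: say \"hi\"
pieces = list(char_iter_sketch('a\\"b'))
plain = unescape_sketch(escaped)
```

A parser built on such an iterator can treat a bare double quote as a delimiter while passing the escaped one through as ordinary content.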

hmrb.lang.parse_block(string: str, vars: Dict, neg: bool = False, min_: int = 1, max_: int = 1, label: Optional[str] = None, start: int = -1) hmrb.lang.Block

Parses a block string into a Block object. Takes quantifiers and negation modifier parameters. Recursively calls itself or parse_unit to parse nested blocks and units.

Parameters
  • string (str) – block string

  • neg (bool) – True if negated

  • min_ (int) – minimum number of matches

  • max_ (int) – maximum number of matches

  • label (str) – block label

  • start (int) – block start line number

Returns

[Block] – Block object representing the parsed string

hmrb.lang.parse_unit(string: str, neg: bool = False, min_: int = 1, max_: int = 1, label: Optional[str] = None) hmrb.lang.Unit

Parses a unit string into a Unit object. A unit is a list of key-value pairs inside a pair of brackets. Key-value pairs are separated by commas, and a colon separates each key from its value. Values are set inside double quotes, while keys are alphanumeric var-like names. Hyphens and underscores are allowed in key names, but a key must not begin with a number or a hyphen. An arbitrary amount of whitespace is allowed between the components of the Unit (key, value, colon and comma).

Examples

  • (att_name: "attribute value", att-name2: "attribute value")

  • (att_name:"attribute value",att-name2:"attribute value")

Parameters
  • string (str) – unit string

  • neg (bool) – True if negated

  • min_ (int) – minimum number of matches

  • max_ (int) – maximum number of matches

Returns

[Unit] – Unit object representing the parsed string
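The unit syntax described above can be approximated with a regular expression. The sketch below is an assumption-laden illustration, not the library's parser: parse_unit_sketch is a hypothetical function that returns a plain dict of raw string values instead of a Unit, and it does not interpret regexes, booleans or numbers inside values.

```python
import re
from typing import Dict

# key: starts with a letter or underscore, then letters, digits, "_" or "-"
# value: double-quoted, may contain backslash-escaped characters
PAIR = re.compile(r'\s*([A-Za-z_][A-Za-z0-9_-]*)\s*:\s*"((?:[^"\\]|\\.)*)"\s*')


def parse_unit_sketch(string: str) -> Dict[str, str]:
    """Parse '(key: "value", ...)' into a dict of raw attribute strings."""
    string = string.strip()
    if not (string.startswith("(") and string.endswith(")")):
        raise ValueError("a unit must be wrapped in brackets")
    body = string[1:-1]
    atts: Dict[str, str] = {}
    pos = 0
    while pos < len(body):
        found = PAIR.match(body, pos)
        if found is None:
            raise ValueError(f"malformed key-value pair at offset {pos}")
        atts[found.group(1)] = found.group(2)
        pos = found.end()
        if pos < len(body):
            if body[pos] != ",":
                raise ValueError(f"expected ',' at offset {pos}")
            pos += 1  # skip the separating comma
    return atts


spaced = parse_unit_sketch('(att_name: "attribute value", att-name2: "attribute value")')
compact = parse_unit_sketch('(att_name:"attribute value",att-name2:"attribute value")')
```

Both spellings from the examples above produce the same attributes, since whitespace between the components is skipped.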

hmrb.lang.parse_value(string: str) Union[str, dict, bool, int, float]

Unescapes a Unit attribute value and determines whether it is a regular string or a Regex.

Parameters

string (str) – attribute value

Returns

[Union[str, Regex, bool, int, float]] – parsed value

hmrb.lang.unescape(string: str) str

Unescaping escaped characters typically inside attribute values.

Parameters

string (str) – string to be unescaped

Returns

[str] – unescaped string

hmrb.lang.unique(sequence: list) Iterator

hmrb.protobuffer module

class hmrb.protobuffer.Labels(labels: set, depth: int, length: int = 1)

Bases: object

Class wrapper handling Labels message protobuffer

Class for creating, holding and merging Labels type protobuffer messages with other defined types of messages. Initialization of the class creates a new Labels message. Addition of new Labels is handled through the += (__iadd__) magic method.

Protobuffer definition (proto3):
    message Labels {
        map<string, Span> items = 1;
    }

    message Span {
        string start = 1;
        string end = 2;
    }

Parameters
  • labels (list) –

  • depth (int) – of the Match span)

  • length (int) – (defaults to 1)

Public methods:

  • += – handles the addition of a new protobuffer to the object

  • get_depth – returns the maximum depth reached

Notes

Span start and end integers are stored as strings, since protobuffers are not able to distinguish between set 0 and unset (default) 0. See Google's protobuffer documentation on default values.

class hmrb.protobuffer.Match(attributes: Dict, depth: int)

Bases: object

Class wrapper handling Match message protobuffers

Class for creating, holding and merging Match type protobuffer messages with other defined types of messages. Initialization of the class creates a new Match message. The Match is considered Active if it contains any valid attributes. Inactive Matches are later ignored in merging objects. An inactive Match with depth_reached transfers its depth_reached to the new object. Addition of data is handled through the += (__iadd__) magic method.

Protobuffer definition (proto3):
    message Match {
        Span span = 1;
        map<string, string> attributes = 2;
        map<string, google.protobuf.Any> underscore = 3;
    }

    message Span {
        string start = 1;
        string end = 2;
    }

Parameters
  • attributes (dict) – (except reserved attributes that are added to underscore)

  • depth (int) – of the Match span)

Public methods:

  • += – handles the addition of a new protobuffer to the object

  • set_depth – sets depth reached

  • get_depth – returns the maximum depth reached

Notes

Span start and end integers are stored as strings, since protobuffers are not able to distinguish between set 0 and unset (default) 0. See Google's protobuffer documentation on default values.

get_depth() int
set_depth(depth: int) None
set_start(start: int) None
class hmrb.protobuffer.Responses

Bases: object

Class wrapper handling Responses message protobuffers

Class for creating, holding and merging Response type protobuffer messages with other defined types of messages (see response.proto for protocol buffer definitions). Initializing the class creates an empty Responses protobuffer. Addition of data is handled through the += (__iadd__) magic method.

Protobuffer definition (proto3):
    message Responses {
        repeated Match items = 1;
    }

Public methods:

  • += – handles the addition of a new protobuffer to the object

  • set_start – sets the start of all (not set) span messages

  • get_depth – returns the maximum depth reached

format(sort_length: bool = False) Union[List[Tuple[Tuple[int, int], List[Dict]]], ItemsView[Tuple[int, int], List[Dict]]]
get_depth() int
set_depth(depth: int) None
set_start(start: int) None
hmrb.protobuffer.mirror_depth(left: Union[hmrb.protobuffer.Match, hmrb.protobuffer.Responses], right: Union[hmrb.protobuffer.Match, hmrb.protobuffer.Responses]) None
hmrb.protobuffer.mirror_labels(left: Any, right: Any) None