hmrb package
hmrb.core module
- class hmrb.core.Core(callbacks: Optional[Dict] = None, sets: Optional[Dict] = None, sort_length: bool = False)
Bases:
object
Class handling the main functions surrounding the rule engine
- Parameters
callbacks (dict) – dictionary of callback functions to execute following a successful call.
sort_length (bool) – sort match results by span length in ascending order (this also affects the order of callback execution).
- Public methods:
load : add list of rules to engine
__call__ : match list of input dicts with internal rules
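As a rough illustration of how callbacks and sort_length interact, the sketch below dispatches match results to callbacks that follow the (input_, span, data) signature of default_callback. The run_callbacks helper and the "callback" key inside the match data are assumptions made for this example; they are not taken from the hmrb source.

```python
from typing import Callable, Dict, List, Tuple

Span = Tuple[int, int]
Response = Tuple[Span, List[Dict]]


def run_callbacks(
    responses: List[Response],
    input_: list,
    callbacks: Dict[str, Callable],
    sort_length: bool = False,
) -> None:
    """Hypothetical stand-in for the dispatch step, not hmrb's own code."""
    if sort_length:
        # shortest spans first (ascending span length)
        responses = sorted(responses, key=lambda r: r[0][1] - r[0][0])
    for (start, end), items in responses:
        for data in items:
            # assumption: the callback name is stored under a "callback" key
            fn = callbacks.get(data.get("callback"))
            if fn is not None:
                fn(input_, slice(start, end), data)


seen = []


def record(input_: list, span: slice, data: Dict) -> None:
    # same argument order as default_callback(input_, span, data)
    seen.append(input_[span])


tokens = ["the", "quick", "fox"]
responses = [
    ((0, 3), [{"callback": "record"}]),
    ((1, 2), [{"callback": "record"}]),
]
run_callbacks(responses, tokens, {"record": record}, sort_length=True)
```

With sort_length=True the shorter span (1, 2) is dispatched before (0, 3), so the order in which callbacks fire changes along with the order of the results.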
- _execute(responses: Union[List[Tuple[Tuple[int, int], List[Dict]]], ItemsView[Tuple[int, int], List[Dict]]], input_: Any) -> None
- _load(rules: List[List[Dict]], vars: List[List[Dict]]) -> None
Adds list of rules to the engine
- Implementation: passes rules to the root BaseNode of the class
sequentially
- Parameters
rules (list) – list of rules to add to root node
vars (list) – list of shared varHandle objects to use
- _match(spans: List[Tuple[int, list]]) -> Union[List[Tuple[Tuple[int, int], List[Dict]]], ItemsView[Tuple[int, int], List[Dict]]]
Takes a list of spans and executes matching by passing each to the root node.
- Parameters
spans (list) – list of spans to match
- Returns
list of tuples containing match results
- Return type
(list)
- static default_callback(input_: list, span: slice, data: Dict) -> None
- load(inputs: str) -> None
Adds rules to the engine.
- Parameters
inputs (str) – rules written in the grammar dialect
- class hmrb.core.SpacyCore(callbacks: typing.Optional[typing.Dict] = None, sets: typing.Optional[typing.Dict] = None, map_doc: typing.Callable = <function _default_map>, sort_length: bool = False)
Bases:
hmrb.core.Core
Class wrapping the Core object into a spaCy component.
- Parameters
callbacks (dict) – dictionary of callback functions to execute following a successful call.
sort_length (bool) – sort match results by span length in ascending order (this also affects the order of callback execution).
- Public methods:
load : add list of rules in the engine
__call__ : match a spaCy Document or Span against the rule set
- name = 'hmrb'
- hmrb.core._default_map(doc: Any) -> Any
hmrb.node module
- class hmrb.node.BaseNode(data: Optional[Dict] = None)
Bases:
object
Class for handling nodes
BaseNode is an atomic element of our data structure. Each token is handled by a separate BaseNode (or one of its subclasses). The BaseNode is designed to build itself recursively, through the consume method, from a list of dict rules. It handles the matching of a list of incoming tokens through the BaseNode __call__ method.
- Parameters
data (dict) – data associated with the node (optional: None)
- Public methods:
consume : handles the building of the data structure from a rule
__call__ : handles the matching of incoming data
- _build_child(child_key: Tuple[frozenset, int], child: hmrb.node.BaseNode) -> None
Adds new child to the children of BaseNode. Updates call order and attribute index with the new child.
- Parameters
child_key (frozenset) – a hashable identifier for new child
child (BaseNode) – new child BaseNode object
- _consume_child(next_rule_token: Dict, rule: List[Dict], vars: Dict, sets: Dict) -> None
- _consume_regex(next_rule_token: Dict, rule: List[Dict], vars: Dict, sets: Dict) -> None
- _consume_set(next_rule_token: Dict, rule: List[Dict], vars: Dict, sets: Dict) -> None
- _consume_var(next_rule_token: Dict, rule: List[Dict], vars: Dict, sets: Dict) -> None
- static _make_node_key(token: Dict) -> Tuple[frozenset, int]
Creates a hashable dictionary key from a dict token
- Parameters
token (dict) – a token dictionary
- Returns: (frozenset, int) created from the list of (key, value) tuples
(sorted by default) and hash of data items
- _match(token: Dict) -> set
(private) Handles the matching of a single token dictionary with the current node's children.
Implementation: TODO:
- Notes: In the case of missing attribute att_name, att_value is None
and all children with att_name are removed from the matches
- Parameters
token (dict) – dict of (attribute, values) of a single token
- consume(rule: List[Dict], vars: Dict, sets: Dict) -> None
Builds internal representation from list of rules
- Implementation: Recursively handles the construction of the internal
tree structure. Passes token to the appropriate BaseNode class/subclass for handling. If an equivalent node already exists, the remaining tokens of the rules are passed to that node. If no such node exists a new node is added to the children of the current node.
- Parameters
rule (List(dict)) – list of rule token dictionaries
vars (dict) – dict of all varHandle objects created
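The recursive construction described for consume resembles building a trie keyed by rule tokens. A minimal sketch of that idea follows; TrieNode is a hypothetical class that only handles plain token dicts with hashable values and ignores vars, sets and regex tokens.

```python
from typing import Dict, List, Optional


class TrieNode:
    """Hypothetical, simplified stand-in for BaseNode (plain tokens only)."""

    def __init__(self, data: Optional[Dict] = None) -> None:
        self.data = data
        self.children: Dict[frozenset, "TrieNode"] = {}

    def consume(self, rule: List[Dict]) -> None:
        if not rule:
            return
        head, rest = rule[0], rule[1:]
        # hashable key built from the token's (attribute, value) pairs
        key = frozenset(head.items())
        child = self.children.get(key)
        if child is None:
            # no equivalent node yet: add a new child
            child = TrieNode()
            self.children[key] = child
        # the remaining rule tokens are handled by the (new or existing) child
        child.consume(rest)


root = TrieNode()
root.consume([{"pos": "det"}, {"pos": "noun"}])
root.consume([{"pos": "det"}, {"pos": "adj"}, {"pos": "noun"}])
```

Both rules share the leading {"pos": "det"} token, so the root ends up with a single child that branches into two subtrees.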
- static get_att(token: Any, att_name: str) -> Any
Retrieves the value of a token attribute regardless of whether it is a dictionary or a normal object.
- Parameters
token (Any) – target token
att_name (str) – attribute name
- Returns
Value of the target attribute
- Return type
response (Any)
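A minimal sketch of such an accessor is shown below. The name get_att_sketch is hypothetical, and returning None for a missing attribute is an assumption based on the note under _match, not a confirmed detail of get_att.

```python
from types import SimpleNamespace
from typing import Any


def get_att_sketch(token: Any, att_name: str) -> Any:
    """Read an attribute from either a dict token or a regular object."""
    if isinstance(token, dict):
        return token.get(att_name)
    # assumption: missing attributes resolve to None instead of raising
    return getattr(token, att_name, None)


dict_token = {"pos": "noun"}
obj_token = SimpleNamespace(pos="noun")
```

Both get_att_sketch(dict_token, "pos") and get_att_sketch(obj_token, "pos") return "noun", which lets the same matching code run on plain dicts and on objects such as spaCy tokens.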
- optimise_call_order() -> None
- class hmrb.node.FrozenMap(*args: Any, **kwargs: Any)
Bases:
collections.abc.Mapping
Based on https://github.com/pcattori/maps. Creates a hashable object from any object using frozensets.
- _abc_impl = <_abc_data object>
- classmethod recurse(obj: Any) -> Any
- class hmrb.node.RegexNode(token: Dict)
Bases:
hmrb.node.BaseNode
- class hmrb.node.SetNode(rule_set: Dict, data: Dict)
Bases:
hmrb.node.BaseNode
Class for Set nodes
SetNode is a subclass of BaseNode designed to efficiently handle the matching of sets.
- Parameters
rule_set (dict) – global dictionary of sets to check
data (dict) – data object that is returned if the SetNode is matched.
- Public methods:
__call__ : handles the matching of an incoming list of tokens by checking if the token is present in the rule_set.
- class hmrb.node.StarNode(data: Optional[Dict] = None)
Bases:
hmrb.node.BaseNode
- hmrb.node._recurse(obj: Any, map_fn: Callable) -> Any
Based on https://github.com/pcattori/maps. Handles recursion within FrozenMap.
- hmrb.node.make_key(obj: Any) -> int
- Parameters
obj (any) – any type of nested / unnested object
Returns: (int) created from the hash of the FrozenMap object
Notes: Python's hash() is inconsistent across processes/runs.
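The idea of hashing a nested object by first freezing it can be sketched with the standard library alone. The freeze and make_key_sketch helpers below are hypothetical and do not reproduce FrozenMap; they only show why two dicts with the same content yield the same key regardless of insertion order.

```python
from typing import Any


def freeze(obj: Any) -> Any:
    """Recursively convert dicts, lists and sets into hashable equivalents."""
    if isinstance(obj, dict):
        return frozenset((key, freeze(value)) for key, value in obj.items())
    if isinstance(obj, (list, tuple)):
        return tuple(freeze(value) for value in obj)
    if isinstance(obj, set):
        return frozenset(freeze(value) for value in obj)
    return obj


def make_key_sketch(obj: Any) -> int:
    # the integer is only stable within one interpreter run, because string
    # hashing is randomised per process unless PYTHONHASHSEED is fixed
    return hash(freeze(obj))


a = {"pos": "noun", "meta": {"tags": ["x", "y"]}}
b = {"meta": {"tags": ["x", "y"]}, "pos": "noun"}
```

Here make_key_sketch(a) equals make_key_sketch(b) within a single run. As the note above says, such keys should not be persisted or compared across processes.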
- class hmrb.node.varNode(var_handle: hmrb.node.BaseNode, data: Dict, min_length: int, min_run: int, max_run: int)
Bases:
hmrb.node.BaseNode
Class for var nodes
varNode is a subclass of BaseNode designed to efficiently handle the reuse of the same node structure (macros). The varNode wraps around a BaseNode object (varHandle) to support shared objects and the logical repetition of executions. The remaining parts are passed to its super BaseNode consume. In this way, we have a clear distinction between repeated/separated sections and sections that follow the repeated parts.
Matching is done in a similar two-step process. First, the incoming pattern is passed to the varHandle structure var_handle, which returns the depths of successful matches. Depending on the parameters of the varNode, it tries to match the varHandle multiple times. The remaining unmatched tokens are passed to the children ("outer" structure) for matching. If min_run is 0, it also passes the original input to the "outer" structure.
- Parameters
var_handle (BaseNode) – shared BaseNode object that becomes the "inner" structure of the varNode
data (dict) – data object that is returned if the varNode is matched.
min_length (int) – precomputed minimum length of the inner structure. Used to determine if enough input tokens are left to do another loop.
min_run (int) – minimum runs of the inner structure. If set to 0 the inner structure is optional (default 1).
max_run (int) – maximum runs of the inner structure (default 1).
- Public methods:
__call__ : handles the matching of an incoming list of tokens by first recursing through the shared inner varHandle and then by recursing the remaining unmatched tokens through the super BaseNode object ("outer").
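The two-step matching with min_run and max_run can be approximated as follows. This is a simplified sketch under stated assumptions: repeat_match and inner are hypothetical names, the inner structure is modelled as a function returning the depths (token counts) of its successful matches, and the min_length shortcut is left out.

```python
from typing import Callable, List, Set


def repeat_match(
    tokens: List[str],
    inner: Callable[[List[str]], List[int]],
    min_run: int = 1,
    max_run: int = 1,
) -> Set[int]:
    """Return the offsets at which the "outer" structure may continue."""
    # with min_run == 0 the inner structure is optional, so the original
    # input (offset 0) is also handed to the outer structure
    ends: Set[int] = {0} if min_run == 0 else set()
    frontier = {0}
    for run in range(1, max_run + 1):
        reached = set()
        for start in frontier:
            for depth in inner(tokens[start:]):
                reached.add(start + depth)
        if not reached:
            break  # the inner structure cannot be matched another time
        if run >= min_run:
            ends |= reached
        frontier = reached
    return ends


def inner(tokens: List[str]) -> List[int]:
    # hypothetical inner structure: matches exactly one "a" token
    return [1] if tokens and tokens[0] == "a" else []
```

For the input ["a", "a", "b"] with min_run=1 and max_run=3, the outer structure would be tried at offsets 1 and 2; with min_run=0 it would additionally be tried at offset 0.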
hmrb.lang module
- class hmrb.lang.Block(members: List, vars: Dict, neg: bool, min_: int, max_: int, label: Optional[str], union: bool = False, is_body: bool = False)
Bases:
object
Represents a rule block that may be the body or part of the body of a Law or a Var.
- Parameters
members (dict) – Block members
neg (bool) – negated
min (int) – minimum number of matches
max (int) – maximum number of matches
- _add_var(children: list, length: int, min_: int, max_: int) -> Any
- _parse_block(block: hmrb.lang.Block) -> None
- _parse_labeled_element(label: str, parent: list) -> None
- _parse_ref(ref: hmrb.lang.Ref) -> None
- _parse_unit(unit: hmrb.lang.Unit) -> None
- _sequence_extend(block: hmrb.lang.Block) -> None
- _union_extend(block: hmrb.lang.Block) -> None
- parse() -> None
- class hmrb.lang.BlockIterator(block_str: str, inf: int = 10000000000, start: int = 1)
Bases:
object
Provides an iterator that iterates over the top-level block segments. These segments could be blocks (see also Block), units (see also Unit), and variable references (see also Ref). These blocks are validated but parsed later on. This iterator will produce tuples of the following shape: (content_string, negated, min_match_num, max_match_num)
- Parameters
block_str (str) – block string
inf (int) – infinity value
start (int) – start line
- _check_body_level() -> None
- _close_bracket(ch: str) -> None
Handles a closing bracket.
Level 0: checks for an open variable reference and closes the block
Level 1: checks type of segment (block/unit) and adds to iterable
Other levels: adds character to the buffer
- Parameters
ch – closing bracket character
- _consume(block: str) -> None
Consumes a block string a character at a time. Note that escaped characters are treated differently through the character iterator. The iterator acts on all brackets but it validates only variable references and operators from level 1 (the top content level).
- Parameters
block (str) – block string
- _open_bracket(ch: str) -> None
Handles opening bracket.
Level 0: checks for no operators and opens the block
Level 1: parses operator into operator buffer and adds char to buffer
Other levels: adds char to buffer
- Parameters
ch – open bracket char
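The level handling described for _close_bracket and _open_bracket can be sketched as a depth counter over the block string. The function below is a hypothetical simplification: it ignores operators, labels, variable references, escaped characters and brackets inside quoted values, and only collects the level-1 segments.

```python
from typing import List


def top_level_segments(block: str) -> List[str]:
    """Collect the bracketed segments directly inside the outer block."""
    level = 0
    buf: List[str] = []
    segments: List[str] = []
    for ch in block:
        if ch == "(":
            if level >= 1:
                # level 1 starts a new segment, deeper levels just buffer
                buf.append(ch)
            level += 1
        elif ch == ")":
            level -= 1
            if level < 0:
                raise ValueError("unbalanced closing bracket")
            if level >= 1:
                buf.append(ch)
            if level == 1:
                # back at the top content level: the segment is complete
                segments.append("".join(buf))
                buf = []
        elif level >= 2:
            buf.append(ch)
    if level != 0:
        raise ValueError("unbalanced opening bracket")
    return segments


example = '((pos: "det") ((pos: "adj") (pos: "noun")))'
```

Calling top_level_segments(example) yields two segments: the unit (pos: "det") and the nested block ((pos: "adj") (pos: "noun")), which would then be parsed separately.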
- _parse_label() -> None
- _parse_operator() -> Tuple
Assumes that there is an operator in the buffer and parses it. Spaces are important for all operators. They are matched as shown below. Number placeholders can be replaced with any valid integer.
Examples
not
optional
zero or more
one or more
at least {number}
at most {number}
{number} to {number}
- Raises
ValueError – when the buffer doesn't contain a valid operator and is not empty
- Returns
[Tuple] – negated [bool], min # matches, max # matches
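One possible mapping from the operator strings listed above to a (negated, min, max) tuple is sketched below. The concrete bounds chosen for each operator (for example, a minimum of 0 for "at most {number}") and the (False, 1, 1) result for an empty buffer are assumptions read off the operator names and the parse_block defaults; parse_operator_sketch is a hypothetical name.

```python
import re
from typing import Tuple

INF = 10_000_000_000  # same value as BlockIterator's default `inf`


def parse_operator_sketch(op: str, inf: int = INF) -> Tuple[bool, int, int]:
    op = op.strip()
    fixed = {
        "": (False, 1, 1),  # assumption: no operator means exactly one match
        "not": (True, 1, 1),
        "optional": (False, 0, 1),
        "zero or more": (False, 0, inf),
        "one or more": (False, 1, inf),
    }
    if op in fixed:
        return fixed[op]
    match = re.fullmatch(r"at least (\d+)", op)
    if match:
        return (False, int(match.group(1)), inf)
    match = re.fullmatch(r"at most (\d+)", op)
    if match:
        return (False, 0, int(match.group(1)))
    match = re.fullmatch(r"(\d+) to (\d+)", op)
    if match:
        return (False, int(match.group(1)), int(match.group(2)))
    # mirrors the documented behaviour: non-empty but invalid raises
    raise ValueError(f"invalid operator: {op!r}")
```

Because the patterns use single literal spaces, a string such as "at  least 2" (two spaces) is rejected, in line with the remark that spaces matter.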
- _parse_var() -> None
Parses a variable reference name and adds it to the iterable.
- property is_union: bool
- class hmrb.lang.Grammar(string: str, vars_: Dict)
Bases:
object
Represents a Babylonian grammar. It may consist of Var and Law segments.
- Parameters
string (str) – grammar string to be parsed
- _build(string: str) -> None
- _deploy() -> None
- _map_segments(type_: Any) -> Dict
Collects all segments of particular type and creates a mapping between their names and the objects themselves.
- Returns
[dict] – mapping between variable names and segments
- static _parse_segment_type(line: str) -> Optional[hmrb.lang.Types]
Determines the type of grammar segment: Law or Var.
- Parameters
line – segment lines
- Returns
[Types] – segment type
- _segment(string: str) -> Generator
Segments the grammar into laws (Law) and variables (Var). Yields the type of the segment as well as all the lines.
- Parameters
string – string representation of the segment
- static end_var(parent_end: Any) -> None
- parser_map = {<Types.VAR: 'var'>: <class 'hmrb.lang.Var'>, <Types.LAW: 'law'>: <class 'hmrb.lang.Law'>}
- class hmrb.lang.Law(lines: List, vars: Dict)
Bases:
object
Represents a rule segment of a Babylonian grammar. It consists of an optional name, a list of attributes, and a compulsory body.
- Parameters
lines (list) – segment lines
- _parse(lines: List) -> None
- static _parse_atts(lines: List) -> Dict
- static _parse_name(first_line: str, start: int) -> str
- static _segment_lines(lines: List) -> Tuple[List, List]
- class hmrb.lang.Ref(ref: str, neg: bool, min_: int, max_: int, label: Optional[str])
Bases:
object
Represents a reference to a variable.
- Parameters
ref (str) – variable reference
neg (bool) – negated
min (int) – minimum number of matches
max (int) – maximum number of matches
label (str) – reference label
- class hmrb.lang.Types(value)
Bases:
enum.Enum
Types of segments and items
- BLOCK = 'block'
- LAW = 'law'
- UNIT = 'unit'
- VAR = 'var'
- VAR_REF = 'var_ref'
- class hmrb.lang.Unit(atts: Dict, neg: bool, min_: int, max_: int, label: Optional[str])
Bases:
object
Represents a group of attribute constraints that form a rule unit. Units are typically members of a block.
- Parameters
atts (dict) – unit attributes
neg (bool) – negated
min (int) – minimum number of matches
max (int) – maximum number of matches
label (str) – unit label
- class hmrb.lang.Var(lines: List, vars: Dict)
Bases:
object
Represents a named rule variable segment of a Babylonian grammar.
- Parameters
lines (list) – segment lines
- _parse(lines: List) -> None
- static _parse_name(first_line: str, start: int) -> str
- hmrb.lang.char_iter(string: str) -> Generator
Iterate over the characters of a string while preserving escaped chars. The point is to allow escaped characters to be treated differently during char iteration (parsing) and then unescaped inside the final data structure.
- Parameters
string (str) – regex string
- Returns
[Generator] – generator iterating over the characters of the string
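Iterating while keeping escaped characters intact can be sketched as follows. The sketch assumes that a backslash is the escape character, which is not stated in this reference, and char_iter_sketch is a hypothetical name.

```python
from typing import Iterator


def char_iter_sketch(string: str) -> Iterator[str]:
    """Yield characters one at a time, keeping escape pairs together."""
    chars = iter(string)
    for ch in chars:
        if ch == "\\":
            # keep the backslash attached to the character it escapes
            yield ch + next(chars, "")
        else:
            yield ch


pieces = list(char_iter_sketch('a\\"b'))
```

The escaped quote comes out as a single two-character item, so a parser can tell it apart from a bare quote that would end an attribute value.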
- hmrb.lang.parse_block(string: str, vars: Dict, neg: bool = False, min_: int = 1, max_: int = 1, label: Optional[str] = None, start: int = -1) -> hmrb.lang.Block
Parses a block string into a Block object. Takes quantifiers and negation modifier parameters. Recursively calls itself or parse_unit to parse nested blocks and units.
- Parameters
string (str) – block string
neg (bool) – True if negated
min (int) – minimum number of matches
max (int) – maximum number of matches
label (str) – block label
start (int) – block start line number
- Returns
[Block] – Block object representing the parsed string
- hmrb.lang.parse_unit(string: str, neg: bool = False, min_: int = 1, max_: int = 1, label: Optional[str] = None) -> hmrb.lang.Unit
Parses a unit string into a Unit object. Unit is a list of key-value pairs inside a pair of brackets. Key-value pairs are separated by a comma. There are colons between keys and values. Values are set inside double quotes, while keys are alphanumeric var-like names. Hyphens and underscores are allowed in the key names, but numbers and hyphens are not allowed in the beginning. An arbitrary amount of space separators is allowed between each of the components of the Unit (key, value, colon and comma).
Examples
(att_name: "attribute value", att-name2: "attribute value")
(att_name:"attribute value",att-name2:"attribute value")
- Parameters
string (str) – unit string
neg (bool) – True if negated
min (int) – minimum number of matches
max (int) – maximum number of matches
- Returns
[Unit] – Unit object representing the parsed string
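The unit syntax described above can be approximated with a regular expression. The sketch below is a hypothetical stand-in that returns a plain dict rather than a Unit object; it keeps values exactly as written (no unescaping, no Regex detection) and, as a simplification, tolerates a trailing comma.

```python
import re
from typing import Dict

# key: alphanumeric name with hyphens/underscores, not starting with a digit
# or hyphen; value: double-quoted string that may contain escaped characters
PAIR = re.compile(r'\s*([A-Za-z_][A-Za-z0-9_-]*)\s*:\s*"((?:[^"\\]|\\.)*)"\s*')


def parse_unit_sketch(string: str) -> Dict[str, str]:
    text = string.strip()
    if not (text.startswith("(") and text.endswith(")")):
        raise ValueError("a unit must be enclosed in brackets")
    body = text[1:-1]
    atts: Dict[str, str] = {}
    pos = 0
    while pos < len(body):
        match = PAIR.match(body, pos)
        if match is None:
            raise ValueError(f"invalid key-value pair at offset {pos}")
        atts[match.group(1)] = match.group(2)
        pos = match.end()
        if pos < len(body):
            if body[pos] != ",":
                raise ValueError(f"expected a comma at offset {pos}")
            pos += 1  # skip the separating comma
    return atts


spaced = parse_unit_sketch('(att_name: "attribute value", att-name2: "attribute value")')
compact = parse_unit_sketch('(att_name:"attribute value",att-name2:"attribute value")')
```

Both example strings from the docstring produce the same mapping, which reflects the rule that spacing between the components is arbitrary.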
- hmrb.lang.parse_value(string: str) -> Union[str, dict, bool, int, float]
Unescapes a Unit attribute value and determines whether it is a regular string or a Regex.
- Parameters
string (str) – attribute value
- Returns
[Union[str, Regex, bool, int, float]] – parsed value
- hmrb.lang.unescape(string: str) -> str
Unescapes escaped characters, typically inside attribute values.
- Parameters
string (str) – string to be unescaped
- Returns
[str] – unescaped string
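A minimal sketch of unescaping, assuming that a backslash marks an escaped character (the escape character is not named in this reference, and unescape_sketch is a hypothetical name):

```python
import re


def unescape_sketch(string: str) -> str:
    """Drop the backslash in front of every escaped character."""
    # DOTALL lets an escaped newline be handled like any other character
    return re.sub(r"\\(.)", r"\1", string, flags=re.DOTALL)


cleaned = unescape_sketch(r'say \"hi\" \\ bye')
```

Each backslash-character pair collapses to the character itself, so an escaped quote becomes a plain quote and a doubled backslash becomes a single one.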
- hmrb.lang.unique(sequence: list) -> Iterator
hmrb.protobuffer module
- class hmrb.protobuffer.Labels(labels: set, depth: int, length: int = 1)
Bases:
object
Class wrapper handling Labels message protobuffer
Class for creating, holding and merging Labels type protobuffer messages with other defined types of messages. Initialization of the class creates a new Labels message. Addition of new Labels is handled through the += (__iadd__) magic method.
- Protobuffer definition (proto3):
message Labels {
    map<string, Span> items = 1;
}
message Span {
    string start = 1;
    string end = 2;
}
- Parameters
labels (list) –
depth (int) – of the Match span)
length (int) – (defaults to 1)
- Public methods:
+= – handles the addition of a new protobuffer to the object
get_depth – returns the maximum depth reached
Notes
Span start and end integers are stored as strings, since protobuffers are not able to distinguish between set 0 and unset (default) 0. See Google's protobuffer documentation on default values.
- class hmrb.protobuffer.Match(attributes: Dict, depth: int)
Bases:
object
Class wrapper handling Match message protobuffers
Class for creating, holding and merging Match type protobuffer messages with other defined types of messages. Initialization of the class creates a new Match message. The Match is considered Active if it contains any valid attributes. Inactive Matches are later ignored in merging objects. An inactive Match with depth_reached transfers its depth_reached to the new object. Addition of data is handled through the += (__iadd__) magic method.
- Protobuffer definition (proto3):
message Match {
    Span span = 1;
    map<string, string> attributes = 2;
    map<string, google.protobuf.Any> underscore = 3;
}
message Span {
    string start = 1;
    string end = 2;
}
- Parameters
attributes (dict) – (except reserved attributes that are added to underscore)
depth (int) – of the Match span)
- Public methods:
+= – handles the addition of a new protobuffer to the object
set_depth – sets depth reached
get_depth – returns the maximum depth reached
Notes
Span start and end integers are stored as strings, since protobuffers are not able to distinguish between set 0 and unset (default) 0. See Google's protobuffer documentation on default values.
- get_depth() -> int
- set_depth(depth: int) -> None
- set_start(start: int) -> None
- class hmrb.protobuffer.Responses
Bases:
object
Class wrapper handling Responses message protobuffers
Class for creating, holding and merging Response type protobuffer messages with other defined types of messages (see response.proto for protocol buffer definitions). Initializing the class creates an empty Responses protobuffer. Addition of data is handled through the += (__iadd__) magic method.
- Protobuffer definition (proto3):
message Responses {
    repeated Match items = 1;
}
- Public methods:
+= – handles the addition of a new protobuffer to the object
set_start – sets the start of all (not set) span messages
get_depth – returns the maximum depth reached
- format(sort_length: bool = False) -> Union[List[Tuple[Tuple[int, int], List[Dict]]], ItemsView[Tuple[int, int], List[Dict]]]
- get_depth() -> int
- set_depth(depth: int) -> None
- set_start(start: int) -> None
- hmrb.protobuffer.mirror_depth(left: Union[hmrb.protobuffer.Match, hmrb.protobuffer.Responses], right: Union[hmrb.protobuffer.Match, hmrb.protobuffer.Responses]) -> None
- hmrb.protobuffer.mirror_labels(left: Any, right: Any) -> None