Hammurabi

Introduction
Hammurabi [hmrb] is a system designed to efficiently execute rules on sequences of data. Its rule syntax is simple and human-readable, yet very expressive.
As input, the system takes a sequence of hash tables (Python dicts) and can match any combination of key-value pairs in order. It was designed as a task-agnostic framework applicable to a variety of tasks, for instance intent recognition, text annotation and log monitoring.
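To make the input format concrete, here is a minimal sketch of the kind of sequence the engine consumes; the attribute names are purely illustrative, not required by the library:
# One dict per sequence element, with arbitrary key-value pairs.
sequence = [
    {"text": "I", "pos": "PRON"},
    {"text": "love", "pos": "VERB"},
    {"text": "gorillas", "pos": "NOUN"},
]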
Features
Attribute-level rule definitions using key-value pairs
Efficient matching of sequences using hash tables, with no limit on sequence length
Support for nested boolean expressions and wildcard operators similar to regular expressions
Variables can be side-loaded and reused throughout different rule sets
User-defined rule-level callback functions triggered by a match
Labels to tag and retrieve matched sequence segments
Rationale
Rules and heuristics are often used to kick start a project which has insufficient data for a machine learning solution. Hammurabi was built to abstract away these rules and heuristics and make them simple, reliable and explainable. This reduces the effort of building, testing and maintaining early-stage products.
Release History
v1.2.1 (25.01.2022)
v1.2.0 (14.05.2021)
v1.1.1 (25.02.2021)
v1.1.0 (02.02.2021)
v1.0.0 (29.04.2020)
Quick Start
Hammurabi is a generic rule engine library that allows the user to match sequences of objects with an arbitrary set of attributes against a grammar of rules (for more details see Writing Rules).
Installation
To begin, simply install the package from a supported repository (PyPI, Gemfury, Artifactory):
$ pip install hmrb
Input
A great way to illustrate the use of Hammurabi is processing annotated text against a rule grammar. Here we will implement a toy relation-extraction grammar that looks for people who love gorillas. Annotated text is a sequence of tokens with annotation attributes, which can serve as the input of the system. For example, we can run a few sentences through spaCy and serialise the output as JSON like this:
import json
import spacy

nlp = spacy.load('en_core_web_sm')
sentences = 'I love gorillas. Peter loves gorillas. Jane loves Tarzan.'
input_ = []
for sent in nlp(sentences).sents:
    sent_lst = []
    for token in sent:
        token_dict = {
            'text': token.orth_,
            'lemma': token.lemma_,
            'pos': token.pos_
        }
        sent_lst.append(token_dict)
    input_.append(sent_lst)
with open('my-input.json', 'w') as fh:
    json.dump(input_, fh, indent=2)
Content of my-input.json:
[
  [
    {"text": "I", "lemma": "-PRON-", "pos": "PRON"},
    {"text": "love", "lemma": "love", "pos": "VERB"},
    {"text": "gorillas", "lemma": "gorilla", "pos": "NOUN"},
    {"text": ".", "lemma": ".", "pos": "PUNCT"}
  ],
  [
    {"text": "Peter", "lemma": "Peter", "pos": "PROPN"},
    {"text": "loves", "lemma": "love", "pos": "VERB"},
    {"text": "gorillas", "lemma": "gorilla", "pos": "NOUN"},
    {"text": ".", "lemma": ".", "pos": "PUNCT"}
  ],
  [
    {"text": "Jane", "lemma": "Jane", "pos": "PROPN"},
    {"text": "loves", "lemma": "love", "pos": "VERB"},
    {"text": "Tarzan", "lemma": "Tarzan", "pos": "PROPN"},
    {"text": ".", "lemma": ".", "pos": "PUNCT"}
  ]
]
Rules
In order to capture the right sequences, we need to write a grammar with rules that detect the sentences containing people who like gorillas. For more details on the grammar see Writing Rules. Referencing the Babylonian king, rules in Hammurabi are denoted with the keyword Law. Below we write a simple subject-verb-object rule that aims to detect all people who love gorillas:
Law:
(
    (pos: "PROPN")
    (text: "loves")
    (text: "gorillas")
)
This rule is very specific and will match only one of our input sentences, so we may want to relax it a little. We can include pronouns as well as proper names for the subject, and abstract away the grammatical number of both subject and object by using lemma requirements instead of text:
Law:
- callback: "gorilla people"
(
    ((pos: "PROPN") or (pos: "PRON"))
    (lemma: "love")
    (lemma: "gorilla")
)
Now that we've relaxed our rule, we may want to detect other things in our input, such as love interests. We can write another rule that identifies a person who loves another person, but this time keep it specific:
Law:
- callback: "lover"
(
    (pos: "PROPN")
    (text: "loves")
    (pos: "PROPN")
)
Callbacks
Hammurabi supports passing a callback function using the reserved callback attribute. The name provided as the value is looked up in the dictionary passed to the callbacks parameter of the Core constructor. The functions associated with matched rules are executed after the matching process is complete. Each is passed three positional parameters that it needs to handle: the original object sequence seq, the matched part of the sequence as a slice span, and all the associated rule attributes from the grammar as data.
All rules (Laws) can take an arbitrary number of attributes that will be part of the data structure passed along with a matched span. This way the user can identify the rule that was fired and, if necessary, take action or access some specific data through this mechanism.
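As an illustrative sketch (assuming the rule declared a hypothetical package attribute and that rule attributes are exposed at the top level of data, with reserved fields under the _ key as in the examples further below), a callback might look like this:
def my_callback(seq: list, span: slice, data: dict) -> None:
    # seq is the original sequence, span selects the matched segment,
    # and data carries the attributes declared in the matched rule.
    matched_tokens = [tok["text"] for tok in seq[span]]
    print(data.get("package"), matched_tokens)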
A Complete Example
import json
from typing import Dict

from hmrb.core import Core

with open("examples/my-input.json", "r") as fh:
    input_ = json.load(fh)

def conj_be(subj: str) -> str:
    if subj == "I":
        return "am"
    elif subj == "you":
        return "are"
    else:
        return "is"

def gorilla_clb(seq: list, span: slice, data: Dict) -> None:
    subj = seq[span.start]["text"]
    be = conj_be(subj)
    print(f"{subj} {be} a gorilla person.")

def lover_clb(seq: list, span: slice, data: Dict) -> None:
    print(
        f'{seq[span][-1]["text"]} is a love interest of '
        f'{seq[span.start]["text"]}.'
    )

clbs = {"gorilla people": gorilla_clb, "lover": lover_clb}
grammar = """
Law:
- callback: "gorilla people"
(
    ((pos: "PROPN") or (pos: "PRON"))
    (lemma: "love")
    (lemma: "gorilla")
)
Law:
- callback: "lover"
(
    (pos: "PROPN")
    (text: "loves")
    (pos: "PROPN")
)
"""

hmb_ext = Core(callbacks=clbs)
hmb_ext.load(grammar)
print("Loaded grammar...")

# process sentences one by one
for i, sent in enumerate(input_, start=1):
    print(f"Processing sent {i}")
    hmb_ext(sent)
# Loaded grammar...
# Processing sent 1
# I am a gorilla person.
# Processing sent 2
# Peter is a gorilla person.
# Processing sent 3
# Tarzan is a love interest of Jane.
Writing Rules
Adding rules to Hammurabi is straightforward: the syntax is simple and human-readable, yet capable of defining complex rules.
This section walks you through the steps of defining rules for Hammurabi. Each subsection introduces a new feature that can be added to better express your rule. Naturally, you can combine them as you see fit.
Basic rule syntax
The following code snippet shows the structural framework of a simple rule.
You define a rule as a Law within Hammurabi.
Law <name>:
- <return key>: "<return value>"
- <return key>: "<return value>"
- callback: "<callback name>"
(
    (attribute: "value")
    ...
    (attribute: "value")
)
Following the Law keyword, you can optionally define a name for the rule. This allows the immediate re-use of the Law in a subsequent rule. The initial head part of the Law lists key-value pairs that are returned if the rule is matched. The return key can be any string value except the reserved keys callback and _. Finally, the body part contains the definition of the token sequence which the rule is intended to match.
Attributes matched by the rule engine are defined as key-value pairs.
Keys come from your problem setting's vocabulary. For instance, in time series this could be any attribute of a time-step, or in Natural Language Processing this could be metadata on your word tokens.
Values define the actual token needed to pass the rule. The supported types are string, bool, int and float. Most importantly, the values should align with your key vocabulary.
Note
Escaping is required for special characters in string values, i.e. " should be entered as \" and \ as \\.
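For example, a hypothetical rule matching a token whose text contains double quotes, followed by one containing a backslash, could be written as:
Law
- package: "found quoted icecream"
(
    (text: "\"icecream\"")
    (text: "C:\\icecream")
)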
Law
- package: "found number of icecreams needed"
(
    (written_number: True)  # could match a number like "one"
    (text: "icecream")
)
You can also define multiple attribute conditions that must be true for an element (we consider this an and relationship between attributes). For example, in the rule below both the icecream and yell attributes need to match the second sequence element.
Law
- package: "found angry person demanding lots of icecream"
(
    (text: "much")
    (text: "icecream", yell: True)
    (today: True)
)

Union (OR)
You can also define rules that require a union logic between tokens. Unions are defined with the or keyword.
Note that unions must be wrapped in brackets (the indentation is optional).
Law
- package: "found person with small icecream appetite"
(
    (
        (text: "small")
        or
        (text: "little")
    )
    (text: "icecream")
)
Multiple unions can be nested, allowing Hammurabi to express complex rules in a simple manner.
Law
- package: "handling lots of icecream"
(
    (
        (
            (text: "much")
            or
            (text: "little")
        )
        or
        (
            (type: "vanilla")
            or
            (type: "chocolate")
            or
            (type: "strawberry")
        )
    )
    (text: "icecream")
)
Note
and syntax: by definition, an intersection logic exists between sequential tokens. As mentioned earlier, an and logic exists between attribute key-value pairs.
Optionals and multiples
To allow compact rules, Hammurabi supports defining optionals and multiples. Each section or element can be marked with the number of times it should be matched. The table below summarises the available logical syntax.
| Syntax | Min | Max |
|---|---|---|
| optional | 0 | 1 |
| one or more | 1 | inf |
| zero or more | 0 | inf |
| X to Y | X | Y |
| (default) | 1 | 1 |
Law
- package: "found person who might be willing to pay for icecream"
(
    optional (text: "free")
    (text: "icecream")
)

Law
- package: "found person only looking for (very) big icecream"
(
    zero or more (text: "very")
    (text: "big")
    1 to 2 (text: "icecream")
)
Naturally, this functionality can be combined with any other syntax on any level.
Law
- package: "found person only looking for bright icecream of any flavor"
(
    (text: "bright")
    optional (
        (type: "vanilla")
        or
        (type: "chocolate")
        or
        (type: "strawberry")
    )
    (text: "icecream")
)
Regular Expression
Hammurabi also supports defining attribute values as regular expressions (see the Python re library). The full syntax is (attribute: regex("<regex expression>")) and it can be used on any string value.
Law
- package: "found person only looking for some quantity of icecream"
(
    optional (text: "around")
    (text: regex("([0-9])\w+"))
    (text: "icecream")
)
Note
Escaping inside a regex needs to be doubled for special characters, i.e. \. should be entered as \\. and \\ as \\\\.
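For instance, a hypothetical rule matching a decimal quantity such as 2.5 escapes the literal dot as \\. inside the regex:
Law
- package: "found decimal quantity of icecream"
(
    (text: regex("[0-9]+\\.[0-9]+"))
    (text: "icecream")
)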
Variables
Variables allow the reuse of rules, which makes the grammar more readable as well as more efficient.
There are two types of variables supported in Hammurabi: Var and named Law.
Definitions:
Var <name>:
To reuse a sequence of token rules, simply define it as a variable. The variable definition uses a similar syntax to defining Laws, with the addition of naming the variable. This allows us to refer to it in subsequent code. Note that variable definitions are not actually rules: they are elements to be used in Laws and will not be matched on their own. For this same reason, they consist solely of the body (i.e. no head part). To support functionality where you want to not only define a rule but also reuse it in other rules, we added named Laws (see below).
Law <name>:
To reuse a Law as a variable, add a name to its definition. You can refer to it in exactly the same way as a variable, $name.
References:
$<name>
Use references to add a sequence defined in a variable to your rule (or to another variable). A reference is defined as the name of a defined variable preceded by $. Variable references can be used in conjunction with other features of the language such as optionals and labels.
Var flavored_icecream:
(
    (
        (type: "vanilla")
        or
        (type: "chocolate")
        or
        (type: "strawberry")
    )
    (text: "icecream")
)

Law
- package: "found person only looking for some quantity of icecream"
(
    (text: "we")
    (text: "want")
    $flavored_icecream
)
When redefining the same rule as a named Law (as shown in the example below), you will receive matches for both sections.
Law flavored_icecream:
(
    (
        (type: "vanilla")
        or
        (type: "chocolate")
        or
        (type: "strawberry")
    )
    (text: "icecream")
)

Law
- package: "found person only looking for some quantity of icecream"
(
    (text: "we")
    (text: "want")
    $flavored_icecream
)
Callbacks and Labels
Hammurabi also makes it easy to work with the actual matches. We support both retrieval of data through labels and defining a custom action to be executed on match.
Definitions:
<label> ->
is the syntax that defines a label. It can be added to any element of the rule. Hammurabi will return the (start, end) offsets of the label within the original sequence in the match object.
- callback: "<callback_name>"
is the syntax used to attach a callback to a Law, named Law or Var. The <callback_name> string needs to match a key in the (key, function) dictionary that is passed in during the construction of the engine.
Law flavored_icecream:
(
    flavour -> (
        (type: "vanilla")
        or
        (type: "chocolate")
        or
        (type: "strawberry")
    )
    (text: "icecream")
)

Law
- package: "found person only looking for some quantity of icecream"
- callback: "handle_icecream_van"
(
    (text: "we")
    (text: "want")
    $flavored_icecream
)
spaCy and callbacks
Hammurabi in spaCy 2.X pipelines
We provide native support for spaCy through the SpacyCore object.
The SpacyCore object can simply be integrated into your existing spaCy 2.X pipelines.
from hmrb.core import SpacyCore

core = SpacyCore(callbacks=CALLBACKS,
                 map_doc=convert_to_json_fn,
                 sort_length=True)
core.load(rules)
nlp.add_pipe(core)
SpacyCore takes a dict of callbacks, an optional function that converts the spaCy Doc type (to_json) to a representation that corresponds to your rules, and a bool indicating whether to sort matches and execute callbacks in ascending order according to match length.
Once the object is instantiated, you can load rules using the .load method.
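A possible implementation of the mapping function (the convert_to_json_fn placeholder above is an assumed name) could look like the following sketch, turning a Doc into the list of attribute dicts that the rules are written against:
def convert_to_json_fn(doc):
    # One dict per token; the attribute names must match the keys used
    # in your grammar (here: text, lemma, pos).
    return [
        {"text": token.orth_, "lemma": token.lemma_, "pos": token.pos_}
        for token in doc
    ]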
Hammurabi in spaCy 3.X pipelines
We also provide native support for spaCy 3.0+. You still have to import the SpacyCore object to run the component registration, but the configuration syntax is slightly different from 2.X.
We follow the new custom pipeline component API under spacy.language:
First, we have to register both our augmenter function map_doc and any callback functions we want to call in spaCy's registry.
Second, we have to create a configuration dictionary that contains the rules and references the callbacks and mapping functions, as shown in the example below.
Finally, we can add the "hmrb" pipeline component to the spaCy pipeline using our configuration.
import spacy

from hmrb.core import SpacyCore

@spacy.registry.augmenters("jsonify_span")
def jsonify_span(span):
    return [
        {"lemma": token.lemma_, "pos": token.pos_, "lower": token.lower_}
        for token in span
    ]

@spacy.registry.callbacks("dummy_callback")
def dummy_callback(seq: list, span: slice, data: dict) -> None:
    print("OK")

conf = {
    "rules": GRAMMAR,
    "callbacks": {"my_callback": "callbacks.dummy_callback"},
    "map_doc": "augmenters.jsonify_span",
}
nlp.add_pipe(SpacyCore.name, config=conf)
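After the component is added, processing text through the pipeline runs the rules and fires the registered callbacks; for example (assuming the grammar in GRAMMAR matches the sentence):
doc = nlp("We want chocolate icecream")  # callbacks fire during this call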
Handling Callbacks
Callbacks allow defining a custom action to be executed upon matching. There are no restrictions on how callbacks can be used, but we provide a few handy patterns below.
Validation
Callbacks can be used to validate likely matches and thereby programmatically extend your rule matching capacity beyond the limits of the grammar.
Var cardinal:
(
    (text: regex("^[1-9]+$"))
)

Var particle:
(
    (text: "st")
    or
    (text: "nd")
    or
    (text: "rd")
    or
    (text: "th")
)
Law I_want_an_Nth_icecream:
- callback: "validate_Nth_icecream"
(
    (text: "I")
    (text: "want")
    (text: regex("an?"))
    cardinal -> $cardinal
    particle -> $particle
    (text: "icecream")
)
The above rule successfully matches "I want a 2nd icecream". It also incorrectly matches "I want a 2th icecream", because we didn't spell out all valid English ordinal abbreviations explicitly. Instead of writing an exhaustive list, callbacks can be used to filter out false positives post-match. The following callback definition provides an example of post-match validation:
ORDINALS = {
    '1': 'st',
    '2': 'nd',
    '3': 'rd'
}

def validate_Nth_icecream(doc, span_range, match_data):
    cardinal_offsets = match_data['_']['labels']['cardinal']
    particle_offsets = match_data['_']['labels']['particle']
    cardinal = doc[slice(*cardinal_offsets)].text
    particle = doc[slice(*particle_offsets)].text
    if ORDINALS.get(cardinal, 'th') != particle:
        print('No ice cream for you!')
    else:
        print(f'This is your {cardinal}{particle} ice cream!')
Note how the labels cardinal and particle are used to easily identify relevant tokens in the match.
Modularity
When working with large, nested rule bases, callbacks can quickly become complex. This can be prevented by applying a modular pattern within your rule base and your callback codebase:
Var cardinal:
(
    (text: regex("^[1-9]+$"))
)

Var particle:
(
    (text: "st")
    or
    (text: "nd")
    or
    (text: "rd")
    or
    (text: "th")
)

Law abbreviated_ordinal:
- callback: "validate_ordinal"
(
    cardinal -> $cardinal
    particle -> $particle
)

Law Do_you_want_the_Nth_or_Nth_icecream:
- callback: "validate_Nth_or_Nth_icecream"
(
    (text: "Do")
    (text: "you")
    (text: "want")
    (text: "the")
    ordinal1 -> $abbreviated_ordinal
    (text: "or")
    ordinal2 -> $abbreviated_ordinal
    (text: "icecream")
)
This example shows how you can delegate validation complexity to a sub-rule. The ordinal validation behaviour is logically separated from the sentence validation behaviour. This allows you to maintain a more readable grammar and to keep a cleaner 1-to-1 relationship between logical units, rules and callbacks:
ORDINALS = {
    '1': 'st',
    '2': 'nd',
    '3': 'rd'
}

def validate_ordinal(doc, span_range, match_data):
    cardinal_offsets = match_data['_']['labels']['cardinal']
    particle_offsets = match_data['_']['labels']['particle']
    cardinal = doc[slice(*cardinal_offsets)].text
    particle = doc[slice(*particle_offsets)].text
    if ORDINALS.get(cardinal, 'th') == particle:
        # persist the validated ordinal on the span (assumes an 'ordinal'
        # Span extension has been registered)
        doc[cardinal_offsets[0]:particle_offsets[1]]._.ordinal = cardinal + particle

def validate_Nth_or_Nth_icecream(doc, span_range, match_data):
    ordinal1_offsets = match_data['_']['labels']['ordinal1']
    ordinal2_offsets = match_data['_']['labels']['ordinal2']
    ordinal1 = doc[slice(*ordinal1_offsets)]._.ordinal
    ordinal2 = doc[slice(*ordinal2_offsets)]._.ordinal
    if ordinal1 and ordinal2 and ordinal1 == ordinal2:
        print('You mentioned the same ice cream twice! I want more choice!')
    else:
        print('These are both valid options! How can I choose?!')
Note that validate_ordinal is only responsible for validating the abbreviated ordinal. If successful, it persists its result in the doc object. That result is picked up by validate_Nth_or_Nth_icecream, which does not perform any additional validation of the ordinal syntax; instead, it checks that the two compared ordinals are different. This example shows how callbacks can be used to achieve a better segregation of responsibilities.
hmrb package
hmrb.core module
- class hmrb.core.Core(callbacks: Optional[Dict] = None, sets: Optional[Dict] = None, sort_length: bool = False)
Bases: object
Class handling the main functions surrounding the rule engine.
- Parameters
callbacks (dict) – dictionary of callback functions to execute following a successful call.
sort_length (bool) – sort match results according to span length in ascending order (affects callback execution as well).
- Public methods:
load – add a list of rules to the engine
__call__ – match a list of input dicts against the internal rules
- _execute(responses: Union[List[Tuple[Tuple[int, int], List[Dict]]], ItemsView[Tuple[int, int], List[Dict]]], input_: Any) → None
- _load(rules: List[List[Dict]], vars: List[List[Dict]]) → None
Adds a list of rules to the engine.
- Implementation: passes the rules to the root BaseNode of the class sequentially
- Parameters
rules (list) – list of rules to add to the root node
vars (list) – list of shared varHandle objects to use
- _match(spans: List[Tuple[int, list]]) → Union[List[Tuple[Tuple[int, int], List[Dict]]], ItemsView[Tuple[int, int], List[Dict]]]
Takes a list of spans and executes matching by passing each to the root node.
- Parameters
spans (list) – list of spans to match
- Returns
list of tuples containing match results
- Return type
(list)
- static default_callback(input_: list, span: slice, data: Dict) → None
- load(inputs: str) → None
Adds rules to the engine.
- Parameters
inputs (str) – rules written in the rule dialect
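A minimal usage sketch of the public interface described above, mirroring the Quick Start (the grammar and callback here are purely illustrative):
from hmrb.core import Core

grammar = """
Law:
- callback: "greet"
(
    (text: "hello")
)
"""
engine = Core(callbacks={"greet": lambda seq, span, data: print("hi")})
engine.load(grammar)
engine([{"text": "hello"}, {"text": "world"}])  # fires the "greet" callback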
- class hmrb.core.SpacyCore(callbacks: typing.Optional[typing.Dict] = None, sets: typing.Optional[typing.Dict] = None, map_doc: typing.Callable = <function _default_map>, sort_length: bool = False)
Bases: hmrb.core.Core
Class wrapping the Core object into a spaCy component.
- Parameters
callbacks (dict) – dictionary of callback functions to execute following a successful call.
sort_length (bool) – sort match results according to span length in ascending order (affects callback execution as well).
- Public methods:
load – add a list of rules to the engine
__call__ – match a spaCy Document or Span against the rule set
- name = 'hmrb'
- hmrb.core._default_map(doc: Any) → Any
hmrb.node module
- class hmrb.node.BaseNode(data: Optional[Dict] = None)
Bases: object
Class for handling nodes
A BaseNode is an atomic element of our data structure. Each token is handled by a separate BaseNode (or one of its subclasses). The BaseNode is designed to build itself in a recursive manner through the consume method from a list of dict rules. It handles the matching of a list of incoming tokens through the BaseNode call method.
- Parameters
data (dict) – data associated with the node (optional: None)
- Public methods:
consume – handles the building of the data structure from a rule
__call__ – handles the matching of incoming data
- _build_child(child_key: Tuple[frozenset, int], child: hmrb.node.BaseNode) → None
Adds a new child to the children of the BaseNode. Updates the call order and attribute index with the new child.
- Parameters
child_key (frozenset) – a hashable identifier for the new child
child (BaseNode) – new child BaseNode object
- _consume_child(next_rule_token: Dict, rule: List[Dict], vars: Dict, sets: Dict) → None
- _consume_regex(next_rule_token: Dict, rule: List[Dict], vars: Dict, sets: Dict) → None
- _consume_set(next_rule_token: Dict, rule: List[Dict], vars: Dict, sets: Dict) → None
- _consume_var(next_rule_token: Dict, rule: List[Dict], vars: Dict, sets: Dict) → None
- static _make_node_key(token: Dict) → Tuple[frozenset, int]
Creates a hashable dictionary key from a dict token.
- Parameters
token (dict) – a token dictionary
- Returns: (frozenset, int) created from the list of (key, value) tuples (sorted by default) and the hash of the data items
- _match(token: Dict) → set
(private) Handles the matching of a single token dictionary with the current node's children.
Implementation: TODO
- Notes: in the case of a missing attribute att_name, att_value is None and all children with att_name are removed from the matches
- Parameters
token (dict) – dict of (attribute, value) pairs of a single token
- consume(rule: List[Dict], vars: Dict, sets: Dict) → None
Builds the internal representation from a list of rules.
- Implementation: recursively handles the construction of the internal tree structure. Passes each token to the appropriate BaseNode class/subclass for handling. If an equivalent node already exists, the remaining tokens of the rule are passed to that node. If no such node exists, a new node is added to the children of the current node.
- Parameters
rule (List[dict]) – list of rule token dictionaries
vars (dict) – dict of all varHandle objects created
- static get_att(token: Any, att_name: str) → Any
Retrieves the value of a token attribute regardless of whether it is a dictionary or a normal object.
- Parameters
token (Any) – target token
att_name (str) – attribute name
- Returns
Value of the target attribute
- Return type
response (Any)
- optimise_call_order() → None
- class hmrb.node.FrozenMap(*args: Any, **kwargs: Any)
Bases: collections.abc.Mapping
Based on https://github.com/pcattori/maps. Creates a hashable from any object using frozensets.
- _abc_impl = <_abc_data object>
- classmethod recurse(obj: Any) → Any
- class hmrb.node.RegexNode(token: Dict)
Bases: hmrb.node.BaseNode
- class hmrb.node.SetNode(rule_set: Dict, data: Dict)
Bases: hmrb.node.BaseNode
Class for Set nodes
SetNode is a subclass of BaseNode designed to efficiently handle the matching of sets.
- Parameters
rule_set (dict) – global dictionary of sets to check
data (dict) – data object that is returned if the SetNode is matched.
- Public methods:
__call__ – handles the matching of an incoming list of tokens by checking whether the token is present in the rule_set.
- class hmrb.node.StarNode(data: Optional[Dict] = None)
Bases: hmrb.node.BaseNode
- hmrb.node._recurse(obj: Any, map_fn: Callable) → Any
Based on https://github.com/pcattori/maps. Handles recursion within FrozenMap.
- hmrb.node.make_key(obj: Any) → int
- Parameters
obj (any) – any type of nested / unnested object
Returns: (int) created from the hash of the FrozenMap object
Notes: Python's hash() is inconsistent across processes/runs.
- class hmrb.node.varNode(var_handle: hmrb.node.BaseNode, data: Dict, min_length: int, min_run: int, max_run: int)
Bases: hmrb.node.BaseNode
Class for var nodes
varNode is a subclass of BaseNode designed to efficiently handle the reuse of the same node structure (macros). The varNode wraps around a BaseNode object (varHandle) to support shared objects and the logical repetition of executions. The remaining parts are passed to its super BaseNode consume. In this way, we have a clear distinction between repeated/separated sections and sections that follow the repeated parts.
Matching is done in a similar two-step process. First, the incoming pattern is passed to the varHandle structure var_handle, returning the depths of successful matches. Depending on the parameters of the varNode, it tries to match the varHandle multiple times. The remaining unmatched tokens are passed to the children "outer" structure for matching. In case min_run is 0, it also passes the original input to the "outer" structure.
- Parameters
var_handle (BaseNode) – shared BaseNode object that becomes the "inner" structure of the varNode
data (dict) – data object that is returned if the varNode is matched.
min_length (int) – precomputed minimum length of the inner structure. Used to determine if enough input tokens are left to do another loop.
min_run (int) – minimum runs of the inner structure. If set to 0 the inner structure is optional (default 1).
max_run (int) – maximum runs of the inner structure (default 1).
- Public methods:
__call__ – handles the matching of an incoming list of tokens by first recursing through the shared inner varHandle and then by recursing the remaining unmatched tokens through the super BaseNode object ("outer").
hmrb.lang module
- class hmrb.lang.Block(members: List, vars: Dict, neg: bool, min_: int, max_: int, label: Optional[str], union: bool = False, is_body: bool = False)
Bases: object
Represents a rule block that may be the body or part of the body of a Law or a Var.
- Parameters
members (list) – Block members
neg (bool) – negated
min_ (int) – minimum number of matches
max_ (int) – maximum number of matches
- _add_var(children: list, length: int, min_: int, max_: int) → Any
- _parse_block(block: hmrb.lang.Block) → None
- _parse_labeled_element(label: str, parent: list) → None
- _parse_ref(ref: hmrb.lang.Ref) → None
- _parse_unit(unit: hmrb.lang.Unit) → None
- _sequence_extend(block: hmrb.lang.Block) → None
- _union_extend(block: hmrb.lang.Block) → None
- parse() → None
- class hmrb.lang.BlockIterator(block_str: str, inf: int = 10000000000, start: int = 1)
Bases: object
Provides an iterator that iterates over the top-level block segments. These segments could be blocks (see also Block), units (see also Unit), and variable references (see also Ref). These blocks are validated but parsed later on. This iterator will produce tuples of the following shape: (content_string, negated, min_match_num, max_match_num)
- Parameters
block_str (str) – block string
inf (int) – infinity value
start (int) – start line
- _check_body_level() → None
- _close_bracket(ch: str) → None
Handles a closing bracket.
Level 0: checks for an open variable reference and closes the block
Level 1: checks the type of segment (block/unit) and adds it to the iterable
Other levels: adds the character to the buffer
- Parameters
ch – closing bracket character
- _consume(block: str) → None
Consumes a block string one character at a time. Note that escaped characters are treated differently through the character iterator. The iterator acts on all brackets, but it validates only variable references and operators from level 1 (the top content level).
- Parameters
block (str) – block string
- _open_bracket(ch: str) → None
Handles an opening bracket.
Level 0: checks for no operators and opens the block
Level 1: parses the operator into the operator buffer and adds the char to the buffer
Other levels: adds the char to the buffer
- Parameters
ch – opening bracket character
- _parse_label() → None
- _parse_operator() → Tuple
Assumes that there is an operator in the buffer and parses it. Spaces are important for all operators. They are matched as shown below. Number placeholders can be replaced with any valid integer.
Examples
not
optional
zero or more
one or more
at least {number}
at most {number}
{number} to {number}
- Raises
ValueError – when the buffer doesn't contain a valid operator and is not empty
- Returns
[Tuple] – negated [bool], min # matches, max # matches
- _parse_var() → None
Parses a variable reference name and adds it to the iterable.
- property is_union: bool
- class hmrb.lang.Grammar(string: str, vars_: Dict)
Bases: object
Represents a Babylonian grammar. It may consist of Var and Law segments.
- Parameters
string (str) – grammar string to be parsed
- _build(string: str) → None
- _deploy() → None
- _map_segments(type_: Any) → Dict
Collects all segments of a particular type and creates a mapping between their names and the objects themselves.
- Returns
[dict] – mapping between variable names and segments
- static _parse_segment_type(line: str) → Optional[hmrb.lang.Types]
Determines the type of grammar segment: Law or Var.
- Parameters
line – segment lines
- Returns
[Types] – segment type
- _segment(string: str) → Generator
Segments the grammar into laws (Law) and variables (Var). Yields the type of the segment as well as all of its lines.
- Parameters
string – string representation of the segment
- static end_var(parent_end: Any) → None
- parser_map = {<Types.VAR: 'var'>: <class 'hmrb.lang.Var'>, <Types.LAW: 'law'>: <class 'hmrb.lang.Law'>}
- class hmrb.lang.Law(lines: List, vars: Dict)
Bases: object
Represents a rule segment of a Babylonian grammar. It consists of an optional name, a list of attributes, and a compulsory body.
- Parameters
lines (list) – segment lines
- _parse(lines: List) → None
- static _parse_atts(lines: List) → Dict
- static _parse_name(first_line: str, start: int) → str
- static _segment_lines(lines: List) → Tuple[List, List]
- class hmrb.lang.Ref(ref: str, neg: bool, min_: int, max_: int, label: Optional[str])
Bases: object
Represents a reference to a variable.
- Parameters
ref (str) – variable reference
neg (bool) – negated
min_ (int) – minimum number of matches
max_ (int) – maximum number of matches
label (str) – reference label
- class hmrb.lang.Types(value)
Bases: enum.Enum
Types of segments and items
- BLOCK = 'block'
- LAW = 'law'
- UNIT = 'unit'
- VAR = 'var'
- VAR_REF = 'var_ref'
- class hmrb.lang.Unit(atts: Dict, neg: bool, min_: int, max_: int, label: Optional[str])
Bases: object
Represents a group of attribute constraints that form a rule unit. Units are typically members of a block.
- Parameters
atts (dict) – unit attributes
neg (bool) – negated
min_ (int) – minimum number of matches
max_ (int) – maximum number of matches
label (str) – unit label
- class hmrb.lang.Var(lines: List, vars: Dict)
Bases: object
Represents a named rule variable segment of a Babylonian grammar.
- Parameters
lines (list) – segment lines
- _parse(lines: List) → None
- static _parse_name(first_line: str, start: int) → str
- hmrb.lang.char_iter(string: str) → Generator
Iterates over the characters of a string while preserving escaped characters. The point is to allow escaped characters to be treated differently during character iteration (parsing) and then unescaped inside the final data structure.
- Parameters
string (str) – regex string
- Returns
[Generator] – generator iterating over the characters of the string
- hmrb.lang.parse_block(string: str, vars: Dict, neg: bool = False, min_: int = 1, max_: int = 1, label: Optional[str] = None, start: int = -1) → hmrb.lang.Block
Parses a block string into a Block object. Takes quantifier and negation modifier parameters. Recursively calls itself or parse_unit to parse nested blocks and units.
- Parameters
string (str) – block string
neg (bool) – True if negated
min_ (int) – minimum number of matches
max_ (int) – maximum number of matches
label (str) – block label
start (int) – block start line number
- Returns
[Block] – Block object representing the parsed string
- hmrb.lang.parse_unit(string: str, neg: bool = False, min_: int = 1, max_: int = 1, label: Optional[str] = None) → hmrb.lang.Unit
Parses a unit string into a Unit object. A Unit is a list of key-value pairs inside a pair of brackets. Key-value pairs are separated by a comma. There are colons between keys and values. Values are set inside double quotes, while keys are alphanumeric var-like names. Hyphens and underscores are allowed in key names, but numbers and hyphens are not allowed at the beginning. An arbitrary amount of space separators is allowed between each of the components of the Unit (key, value, colon and comma).
Examples
(att_name: "attribute value", att-name2: "attribute value")
(att_name:"attribute value",att-name2:"attribute value")
- Parameters
string (str) – unit string
neg (bool) – True if negated
min_ (int) – minimum number of matches
max_ (int) – maximum number of matches
- Returns
[Unit] – Unit object representing the parsed string
- hmrb.lang.parse_value(string: str) → Union[str, dict, bool, int, float]
Unescapes a Unit attribute value and determines whether it is a regular string or a Regex.
- Parameters
string (str) – attribute value
- Returns
[Union[str, Regex, bool, int, float]] – parsed value
- hmrb.lang.unescape(string: str) → str
Unescapes escaped characters, typically inside attribute values.
- Parameters
string (str) – string to be unescaped
- Returns
[str] – unescaped string
- hmrb.lang.unique(sequence: list) → Iterator
hmrb.protobuffer module
- class hmrb.protobuffer.Labels(labels: set, depth: int, length: int = 1)
Bases: object
Class wrapper handling Labels message protobuffers
Class for creating, holding and merging Labels type protobuffer messages with other defined types of messages. Initialization of the class creates a new Labels message. Addition of new Labels is handled through the += (__iadd__) magic method.
- Protobuffer definition (proto3):
message Labels { map<string, Span> items = 1; }
message Span { string start = 1; string end = 2; }
- Parameters
labels (set) – set of labels
depth (int) – depth (of the Match span)
length (int) – length (defaults to 1)
- Public methods:
+= – handles the addition of a new protobuffer to the object
get_depth – returns the maximum depth reached
Notes
Span start and end integers are stored as strings, since protobuffers are not able to distinguish between a set 0 and an unset (default) 0. See Google's protobuffer documentation on default values.
- class hmrb.protobuffer.Match(attributes: Dict, depth: int)
Bases: object
Class wrapper handling Match message protobuffers
Class for creating, holding and merging Match type protobuffer messages with other defined types of messages. Initialization of the class creates a new Match message. The Match is considered active if it contains any valid attributes. Inactive Matches are later ignored when merging objects. An inactive Match with depth_reached transfers its depth_reached to the new object. Addition of data is handled through the += (__iadd__) magic method.
- Protobuffer definition (proto3):
message Match { Span span = 1; map<string, string> attributes = 2; map<string, google.protobuf.Any> underscore = 3; }
message Span { string start = 1; string end = 2; }
- Parameters
attributes (dict) – attributes (except reserved attributes, which are added to underscore)
depth (int) – depth (of the Match span)
- Public methods:
+= – handles the addition of a new protobuffer to the object
set_depth – sets the depth reached
get_depth – returns the maximum depth reached
Notes
Span start and end integers are stored as strings, since protobuffers are not able to distinguish between a set 0 and an unset (default) 0. See Google's protobuffer documentation on default values.
- get_depth() → int
- set_depth(depth: int) → None
- set_start(start: int) → None
- class hmrb.protobuffer.Responses
Bases: object
Class wrapper handling Responses message protobuffers
Class for creating, holding and merging Response type protobuffer messages with other defined types of messages (see response.proto for protocol buffer definitions). Initializing the class creates an empty Responses protobuffer. Addition of data is handled through the += (__iadd__) magic method.
- Protobuffer definition (proto3):
message Responses { repeated Match items = 1; }
- Public methods:
+= – handles the addition of a new protobuffer to the object
set_start – sets the start of all (not set) span messages
get_depth – returns the maximum depth reached
- format(sort_length: bool = False) → Union[List[Tuple[Tuple[int, int], List[Dict]]], ItemsView[Tuple[int, int], List[Dict]]]
- get_depth() → int
- set_depth(depth: int) → None
- set_start(start: int) → None
- hmrb.protobuffer.mirror_depth(left: Union[hmrb.protobuffer.Match, hmrb.protobuffer.Responses], right: Union[hmrb.protobuffer.Match, hmrb.protobuffer.Responses]) → None
- hmrb.protobuffer.mirror_labels(left: Any, right: Any) → None