Parsetron Advanced Usage
Call-back Functions
In the last section we defined color and times as:
color = Regex(r'(red|yellow|blue|orange|purple|...)')
times = Set(['once', 'twice', 'three times']) | Regex(r'\d+ times')
A parse result would look something like:
{ 'GOAL': [['blink', 'top', 'red', 'twice']],
  'one_parse': [ { 'action': 'blink',
                   'one_parse': ['blink', 'top', 'red', 'twice'],
                   'color': 'red',
                   'light': 'top',
                   'times': 'twice'}]}
But we would prefer something more convenient, like:
{ 'GOAL': [['blink', 'top', 'red', 'twice']],
  'one_parse': [ { 'action': 'blink',
                   'one_parse': ['blink', 'top', 'red', 'twice'],
                   'color': (255, 0, 0),
                   'light': 'top',
                   'times': 2}]}
This can be achieved by registering a call-back function with
parsetron.GrammarElement.set_result_action(), for instance:
def color2rgb(result):
    r = result.get().lower()
    # r now holds the color lexicon
    mapper = {
        "red": (255, 0, 0),
        "yellow": (255, 255, 0),
        "blue": (0, 0, 255),
        "orange": (255, 165, 0),
        "purple": (128, 0, 128)
    }
    color = mapper[r]
    result.set(color)

color = Regex(r'(red|yellow|blue|orange|purple|...)').set_result_action(color2rgb)
The color2rgb function first retrieves the lexicon of color by calling
result.get() (parsetron.ParseResult.get()), then maps it to an RGB tuple, and
finally replaces the result with result.set() (parsetron.ParseResult.set()).
Note
The return value of parsetron.GrammarElement.set_result_action() is the object
itself (return self). Thus in the above example color is still assigned the
Regex() object.
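The get()/set() flow and the effect of returning self can be sketched without parsetron installed. StubResult and StubElement below are hypothetical stand-ins written for this illustration only; they mimic the documented get(), set() and set_result_action() calls and are not the library's classes.

```python
class StubResult(object):
    """Hypothetical stand-in for parsetron.ParseResult: holds one value."""

    def __init__(self, value):
        self._value = value

    def get(self):
        return self._value

    def set(self, value):
        self._value = value


class StubElement(object):
    """Hypothetical stand-in for a grammar element such as Regex()."""

    def __init__(self):
        self.action = None

    def set_result_action(self, action):
        self.action = action
        return self  # returning self keeps the one-line assignment working


def color2rgb(result):
    # Shortened mapper; the tutorial's version lists more colors.
    mapper = {"red": (255, 0, 0), "yellow": (255, 255, 0)}
    result.set(mapper[result.get().lower()])


# `color` is the element itself, not the return value of the call-back.
color = StubElement().set_result_action(color2rgb)

result = StubResult("Red")
color.action(result)  # assumption: the parser invokes the stored call-back
print(result.get())   # (255, 0, 0)
```

Because set_result_action() hands back the element, the call can be chained directly onto the constructor, as the tutorial does.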
The times part is only slightly more complicated as it parses numbers in both
digits and words. We define two functions here:
def regex2int(result):
    # result holds the Regex(r'\d+ times') lexicon
    num = int(result.get().split()[0])
    result.set(num)

def times2int(result):
    r = result.get().lower()
    mapper = {"once": 1, "twice": 2, "three times": 3}
    num = mapper[r]
    result.set(num)

times = Set(['once', 'twice', 'three times']).set_result_action(times2int) | \
    Regex(r'\d+ times').set_result_action(regex2int)
Here each grammar element (Set() and Regex()) has its own call-back function.
Together they define the times variable. The result is that the times field in
the parse result is always converted into an integer, no matter whether the
input is twice or 20 times.
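As a quick check of the two call-backs in isolation, the sketch below feeds them sample lexicons through a hypothetical StubResult object (a stand-in for parsetron.ParseResult, assumed here only to offer get() and set()). The convert() helper is also made up for this example; it imitates the choice between the two alternatives that the grammar makes during parsing.

```python
class StubResult(object):
    """Hypothetical stand-in for parsetron.ParseResult."""

    def __init__(self, value):
        self._value = value

    def get(self):
        return self._value

    def set(self, value):
        self._value = value


def regex2int(result):
    # "20 times" -> 20
    result.set(int(result.get().split()[0]))


def times2int(result):
    mapper = {"once": 1, "twice": 2, "three times": 3}
    result.set(mapper[result.get().lower()])


def convert(text):
    """Hypothetical helper: pick the call-back as the alternation would."""
    result = StubResult(text)
    if text.lower() in ("once", "twice", "three times"):
        times2int(result)   # matched by the Set() branch
    else:
        regex2int(result)   # matched by the Regex() branch
    return result.get()


values = [convert(t) for t in ["once", "Twice", "three times", "20 times"]]
print(values)  # [1, 2, 3, 20]
```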
Next we test whether these call-back functions work as expected!
Test Your Grammar
The parsetron.Grammar class defines a static parsetron.Grammar.test() function
for testing your grammar. This function is also called by parsetron’s pytest
routine for both bug spotting and test coverage reports. You should make free
and full use of the assert statement in test() after defining a grammar.
The following is the full grammar with a simple test function:
class LightAdvancedGrammar(Grammar):
    action = Set(['change', 'flash', 'set', 'blink'])
    light = Set(['top', 'middle', 'bottom'])
    color = Regex(r'(red|yellow|blue|orange|purple|...)').\
        set_result_action(color2rgb)
    times = Set(['once', 'twice', 'three times']).\
        set_result_action(times2int) | \
        Regex(r'\d+ times').set_result_action(regex2int)
    one_parse = action + light + Optional(times) + color
    GOAL = OneOrMore(one_parse)

    @staticmethod
    def test():
        parser = RobustParser(LightAdvancedGrammar())
        tree, result = parser.parse("flash my top light twice in red and "
                                    "blink middle light 20 times in yellow")
        print tree
        print result
        # first command: "flash my top light twice in red"
        assert result.one_parse[0].color == (255, 0, 0)
        assert result.one_parse[0].times == 2
        # second command: "blink middle light 20 times in yellow"
        assert result.one_parse[1].color == (255, 255, 0)
        assert result.one_parse[1].times == 20
        print
Here we make sure that the first one_parse structure has its color as the RGB
value of red (result.one_parse[0].color == (255, 0, 0)) and its times parameter
as an integer (result.one_parse[0].times == 2). The same checks are applied to
the second one_parse structure.
The corresponding code for this tutorial is hosted on GitHub.
Note
When is the call-back function called?
The call-back function is called when we convert the (usually best) parse tree
(parsetron.TreeNode) to a parse result (parsetron.ParseResult). It is literally
a post-processing function after parsing. We cannot call it during parsing
because a CFG can potentially output many trees, and each of these trees might
output a different parse result.
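The ordering described in this note can be illustrated with a toy model. Everything below is hypothetical: trees are plain lists of (name, lexicon) pairs, the scores are made up, and tree_to_result() is an illustrative function rather than parsetron's conversion code. The point is only that call-backs run once, on the selected tree, after parsing has finished.

```python
def to_int(lexicon):
    # "20 times" -> 20
    return int(lexicon.split()[0])


# Hypothetical registry: element name -> call-back.
ACTIONS = {"times": to_int}
calls = []  # records every call-back invocation


def tree_to_result(tree):
    """Convert a list of (name, lexicon) leaves to a dict, running call-backs."""
    result = {}
    for name, lexicon in tree:
        action = ACTIONS.get(name)
        if action is not None:
            calls.append(name)
            lexicon = action(lexicon)
        result[name] = lexicon
    return result


# Two candidate trees for the same sentence, with made-up scores.
candidates = [
    (0.9, [("action", "blink"), ("light", "top"), ("times", "20 times")]),
    (0.4, [("action", "blink"), ("times", "20 times")]),
]

# Parsing is done; only now do we pick one tree and post-process it.
score, best = max(candidates, key=lambda c: c[0])
result = tree_to_result(best)

print(result["times"])  # 20
print(len(calls))       # 1
```

Running the call-backs during parsing would instead invoke them for every candidate tree, including the ones that are later discarded.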
Modularized Grammars
So far we have seen how to convert both colors and numbers into more computer-friendly values. However, the example code above is too simple to be used in the real world. As a matter of fact, both color and number parsing deserve their own grammars. Thus we introduce the notion of a modularized grammar: each grammar class defines a minimal but fully functional CFG with the desired call-back functions, and these grammar classes are then reused as building blocks in bigger and more complex grammars.
We have provided a few examples in the parsetron/grammars folder. For instance,
the parsetron.grammars.NumbersGrammar in numbers.py parses not only
one/two/three but even 1 hundred thousand five hundred 61 (100561). The
parsetron.grammars.ColorsGrammar in colors.py defines over 100 different kinds
of colors. All of these definitions can be accessed via their Grammar.GOAL
variable. Then in our lights grammar, we can simply do:
from parsetron.grammars.times import TimesGrammar
from parsetron.grammars.colors import ColorsGrammar

class ColoredLightGrammar(Grammar):
    color = ColorsGrammar.GOAL
    times = TimesGrammar.GOAL
    ...
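To give a feel for the kind of conversion a numbers module has to perform on input such as 1 hundred thousand five hundred 61, here is a minimal sketch in plain Python. It is not the NumbersGrammar implementation: words2int, UNITS and SCALES are names invented for this example, and only a small vocabulary is covered.

```python
# Hypothetical, deliberately small vocabulary.
UNITS = {"zero": 0, "one": 1, "two": 2, "three": 3, "four": 4,
         "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9}
SCALES = {"thousand": 1000, "million": 1000000}


def words2int(text):
    """Convert mixed digit/word numbers to an int (illustrative only)."""
    total = 0    # value of completed thousand/million groups
    current = 0  # value of the group currently being built
    for token in text.lower().split():
        if token.isdigit():
            current += int(token)
        elif token in UNITS:
            current += UNITS[token]
        elif token == "hundred":
            # a bare "hundred" counts as one hundred
            current = max(current, 1) * 100
        elif token in SCALES:
            total += max(current, 1) * SCALES[token]
            current = 0
        else:
            raise ValueError("unknown number word: %r" % token)
    return total + current


print(words2int("1 hundred thousand five hundred 61"))  # 100561
```

In a grammar, such a function would typically sit inside a call-back of the form result.set(words2int(result.get())), following the same pattern as regex2int above.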
In the future we will be adding more grammars as we find them useful. If you’d
like to contribute your own grammar, send us a pull request! And don’t forget
to test your grammar (by implementing parsetron.Grammar.test())!