API References¶

Class Summary¶

`ParseException`	Exception thrown when we can’t parse the whole string.
`GrammarException`	Exception thrown when we can’t construct the grammar.
`MetaGrammar`	A meta grammar used to extract symbol names (expressed as variables) during grammar construction time.
`Grammar`	Grammar user interface.
`GrammarElement`	Basic grammar symbols (terminal or non-terminal).
`StringCs`	Case-sensitive string (usually a terminal) symbol that can be a word or phrase.
`String`	Case-insensitive version of `StringCs`.
`SetCs`	Case-sensitive strings in which matching any will lead to parsing success.
`Set`	Case-insensitive version of `SetCs`.
`RegexCs`	Case-sensitive string matching with regular expressions.
`Regex`	Case-insensitive version of `RegexCs`.
`GrammarExpression`	An expression usually involving a binary combination of two `GrammarElement`‘s.
`And`	An “+” expression that requires matching a sequence.
`Or`	An “\|” expression that requires matching any one.
`GrammarElementEnhance`	Enhanced grammar symbols for `Optional`/`OneOrMore` etc.
`Optional`	Optional matching (0 or 1 time).
`OneOrMore`	OneOrMore matching (1 or more times).
`ZeroOrMore`	ZeroOrMore matching (0 or more times).
`NULL`	Null state, used internally
`GrammarImpl`	Actual grammar implementation that is returned by a `Grammar` construction.
`Production`	Abstract class for a grammar production in the form:
`ExpressionProduction`	Wrapper of `GrammarExpression`.
`ElementProduction`	Wrapper of `GrammarElement`.
`ElementEnhanceProduction`	Wrapper of `GrammarElementEnhance`.
`TreeNode`	A tree structure to represent parser output.
`Edge`	An edge in the chart with the following fields:
`Agenda`	An agenda for ordering edges that will enter the chart.
`ParseResult`	Parse result converted from `TreeNode` output, providing easy
`Chart`	A 2D chart (list) to store graph edges.
`IncrementalChart`	A 2D chart (list of list) that expands its size as having more edges added.
`ChartRule`	Rules applied in parsing, such as scan/predict/fundamental.
`TopDownInitRule`	Initialize the chart when we get started by inserting the goal.
`BottomUpScanRule`	Rules used in bottom up scanning.
`TopDownPredictRule`	Predict edge if it’s not complete and add it to chart
`LeftCornerPredictScanRule`	Left corner rules: only add productions whose left corner non-terminal can parse the lexicon.
`BottomUpPredictRule`	In bottom up parsing, predict edge if it’s not complete and add it to chart
`TopDownScanRule`	Scan lexicon from top down
`CompleteRule`	Complete an incomplete edge form the agenda by merging with a matching completed edge from the chart, or complete an incomplete edge from the chart by merging with a matching completed edge from the agenda.
`ParsingStrategy`	Parsing strategy used in TopDown, BottomUp, LeftCorner parsing.
`TopDownStrategy`	Parsing strategy used in TopDown, BottomUp, LeftCorner parsing.
`BottomUpStrategy`	Parsing strategy used in TopDown, BottomUp, LeftCorner parsing.
`LeftCornerStrategy`	Parsing strategy used in TopDown, BottomUp, LeftCorner parsing.
`RobustParser`	A robust, incremental chart parser.

Class API Details¶

parsetron.py, a semantic parser written in pure Python.

class parsetron.Agenda(*args, **kwargs)[source]¶

Bases: object

An agenda for ordering edges that will enter the chart. Current implementation is a wrapper around collections.deque.

collections.deque supports both FILO (collections.deque.pop()) and FIFO (collections.deque.popleft()). FILO functions like a stack: edges get immediately popped out after they are pushed in. This has merit of finishing the parse sooner, esp. when new edges are just completed, then we can pop them for prediction.

__weakref__¶: list of weak references to the object (if defined)

append(edge)[source]¶

Add a single edge to agenda. edge can be either complete or not to be appended to agenda.

Parameters:	edge (Edge) –

extend(edges)[source]¶

Add a sequence of edges to agenda.

Parameters:	edges –
Type:	list(:class`Edge`)

pop()[source]¶

Pop an edge from agenda (stack).

Returns:	an edge
Return type:	`Edge`

class parsetron.And(exprs)[source]¶

Bases: parsetron.GrammarExpression

An “+” expression that requires matching a sequence.

class parsetron.BottomUpPredictRule[source]¶

Bases: parsetron.ChartRule

In bottom up parsing, predict edge if it’s not complete and add it to chart

class parsetron.BottomUpScanRule[source]¶

Bases: parsetron.ChartRule

Rules used in bottom up scanning.

class parsetron.Chart(size)[source]¶

Bases: object

A 2D chart (list) to store graph edges. Edges can be accessed via: Chart.edges[start][end] and return value is a set of edges.

Parameters:	size (int) – chart size, normally `len(tokens) + 1`.

__weakref__¶: list of weak references to the object (if defined)

_most_compact_trees(parent_edge, tokens=None)[source]¶: Try to eliminate spurious ambiguities by getting the most compact/flat tree. This mainly deals with removing Optional/ZeroOrMore nodes

add_edge(edge, prev_edge, child_edge, lexicon=u'')[source]¶

Add edge to the chart with backpointers being previous edge and child edge

Parameters:	edge (Edge) – newly formed edge prev_edge (Edge) – the left (previous) edge where edge is coming from child_edge (Edge) – the right (child) edge that the completion of which moved the dot ot prev_edge
Return bool:	Whether this edge is newly inserted (not already exists)

best_tree_with_parse_result(trees)[source]¶

Return a tuple of the smallest tree among trees and its parse result.

Parameters:	trees (list) – a list of `TreeNode`
Returns:	a tuple of (best tree, its parse result)
Return type:	tuple(`TreeNode`, `ParseResult`)

filter_completed_edges(start, lhs)[source]¶

Find all edges with matching start position and LHS with lhs. directly after the dot as rhs_after_dot. For instance, both edges:

[1, 1] NP ->  * NNS CC NNS
[1, 3] NP ->  * NNS

match start=1 and lhs=NP.

Returns:	a list of edges
Return type:	list(`Edge`)

filter_edges_for_completion(end, rhs_after_dot)[source]¶

Find all edges with matching end position and RHS nonterminal directly after the dot as rhs_after_dot. For instance, both edges:

[1, 1] NNS ->  * NNS CC NNS
[1, 1] NP ->  * NNS

match end=1 and rhs_after_dot=NNS

filter_edges_for_prediction(end)[source]¶

Return a list of edges ending at end.

Parameters:	end (int) – end position
Returns:	list(`Edge`)

get_edge_lexical_span(edge)[source]¶

syntactic sugar for calling self.get_lexical_span(edge.start, edge.end)

Parameters:	edge (Edge) –
Returns:	(int, int)

get_lexical_span(start, end=None)[source]¶

Get the lexical span chart covers from start to end. For instance, for the following sentence:

please [turn off] the [lights]

with parsed items in [], then the lexical span will look like:

when end = None, function assumes end=start+1
if start = 0 and end = None, then return (1, 3)
if start = 1 and end = None, then return (4, 5)
if start = 0 and end = 1, then return (1, 5)

Parameters:	start (int) – end (int) –
Returns:	(int, int)

print_backpointers()[source]¶: Return a string representing the current state of all backpointers.

set_lexical_span(start, end, i=None)[source]¶

set the lexical span at i in chart to (start, end). if i is None, then default to the last slot in chart (self.chart_i-1). lexical span here means the span chart at i points to in the original sentence. For instance, for the following sentence:

please [turn off] the [lights]

with parsed items in [], then the lexical span will look like:

start = 1, end = 3, i = 0
start = 4, end = 5, i = 1

trees(tokens=None, all_trees=False, goal=None)[source]¶

Yield all possible trees this chart covers. If all_trees is False, then only the most compact trees for each goal are yielded. Otherwise yield all trees (warning: can be thousands).

Parameters:	tokens (list) – a list of lexicon tokens all_trees (bool) – if False, then only print the smallest tree. goal – the root of this tree (usually Grammar.GOAL)
Type:	GrammarElement, None
Returns:	a tuple of (tree index, TreeNode)
Return type:	tuple(int, `TreeNode`)

class parsetron.ChartRule[source]¶

Bases: object

Rules applied in parsing, such as scan/predict/fundamental. New rules need to implement the apply() method.

__weakref__¶: list of weak references to the object (if defined)

class parsetron.CompleteRule[source]¶

Bases: parsetron.ChartRule

Complete an incomplete edge form the agenda by merging with a matching completed edge from the chart, or complete an incomplete edge from the chart by merging with a matching completed edge from the agenda.

class parsetron.Edge(start, end, production, dot)[source]¶

Bases: object

An edge in the chart with the following fields:

Parameters:	start (int) – the starting position end (int) – the end position, so span = end - start production (Production) – the grammar production dot (int) – the dot position on the RHS. Any thing before the dot has been consumed and after is waiting to complete

get_rhs_after_dot()[source]¶

Returns the RHS symbol after dot. For instance, for edge:

[1, 1] NP ->  * NNS

it returns NNS.

If no symbol is after dot, then return None.

Returns:	RHS after dot
Return type:	`GrammarElement`

is_complete()[source]¶

Whether this edge is completed.

Return type:	bool

merge_and_forward_dot(edge)[source]¶

Move the dot of self forward by one position and change the end position of self edge to end position of edge. Then return a new merged Edge. For instance:

self: [1, 2] NNS ->  * NNS CC NNS
edge: [2, 3] NNS -> men *

Returns a new edge:

[1, 3] NNS -> NNS * CC NNS

Requires that edge.start == self.end

Returns:	a new edge
Return type:	`Edge`

scan_after_dot(phrase)[source]¶

Scan phrase with RHS after the dot. Returns a tuple of (lexical_progress, rhs_progress) in booleans.

Returns:	a tuple
Return type:	tuple(bool, bool)

span()[source]¶

The span this edge covers, alias of end - start. For instance, for edge:

[1, 3] NP ->  * NNS

it returns 2

Returns:	an int
Return type:	int

class parsetron.ElementEnhanceProduction(element, rhs=None)[source]¶

Bases: parsetron.ElementProduction

Wrapper of GrammarElementEnhance. An ElementEnhanceProduction has the following assertion true:

LHS == RHS[0]

class parsetron.ElementProduction(element)[source]¶

Bases: parsetron.Production

Wrapper of GrammarElement. An ElementProduction has the following assertion true:

LHS == RHS[0]

class parsetron.ExpressionProduction(lhs, rhs)[source]¶

Bases: parsetron.Production

Wrapper of GrammarExpression.

class parsetron.Grammar[source]¶

Bases: object

Grammar user interface. Users should inherit this grammar and define a final grammar GOAL as class variable.

It’s a wrapper around GrammarImpl but does not expose any internal functions. So users can freely define their grammar without worrying about name pollution. However, when a Grammar is constructed, a GrammarImpl is actually returned:

>>> g = Grammar()

now g is the real grammar (GrammarImpl)

Warning

Grammar elements have to be defined as class variables instead of instance variables for the Grammar object to extract variable names in string

Warning

Users have to define a GOAL variable in Grammar (similar to start variable S conventionally used in grammar definition)

__metaclass__¶: alias of MetaGrammar

__weakref__¶: list of weak references to the object (if defined)

static test()[source]¶: A method to be batch called by pytest (through test_grammars.py). Users should give examples of what this Grammar parses and use these examples for testing.

class parsetron.GrammarElement[source]¶

Bases: object

Basic grammar symbols (terminal or non-terminal).

Developers inheriting this class should implement the following functions:

_parse()
default_name()

A grammar element carries the following attributes:

is_terminal: whether this element is terminal or non-terminal. A general rule of thumb is:
- if it’s GrammarElement, then terminal;
- if it’s GrammarExpression, then non-terminal;
- if it’s GrammarElementEnhance, then non-terminal.
name: the name of this element, usually set by the the set_name() function or implicitly __call__() function.
variable_name: automatically extracted variable name in string through the Grammar construction.
canonical_name: if neither name nor variable_name is set, then a canonical name is assigned trying to be as expressive as possible.
as_list: whether saves result in a hierarchy as a list, or just flat
ignore: whether to be ignored in ParseResult

__add__(other)[source]¶: Implement the + operator. Returns And.

__call__(name)[source]¶: Shortcut for set_name()

__mul__(other)[source]¶

Implements the * operator, followed by an integer or a tuple/list:

e * m: m repetitions of e (m > 0)

e * (m, n) or e * [m, n]: m to n repetitions of e (all inclusive)

m or n in (m,n)/[m,n] can be None

for instance (=> stands for “is equivalent to”):

e * (m, None) or e * (m,) => m or more instances of e => e * m + ZeroOrMore (e)

e * (None, n) or e * (0, n) => 0 to n instances of e

e * (None, None) => ZeroOrMore (e)

e * (1, None) => OneOrMore (e)

e * (None, 1) => Optional (e)

__or__(other)[source]¶: Implement the | operator. Returns Or.

__radd__(other)[source]¶: Implement the + operator. Returns And.

__ror__(other)[source]¶: Implement the | operator. Returns Or.

__weakref__¶: list of weak references to the object (if defined)

_parse(instring)[source]¶

Main parsing method to be implemented by developers. Raises ParseException when there is no parse.

Parameters:	instring (str) – input string to be parsed
Returns:	True if full parse else False
Return type:	bool
Raises:	`ParseException`

default_name()[source]¶

default canonical name.

Returns:	a string
Return type:	str

ignore()[source]¶: Call this function to make this grammar element not appear in parse result. :return: self

parse(instring)[source]¶

Main parsing method to be called by users. Raises ParseException when there is no parse. Returns True if the whole string is parsed and False if input string is not parsed but no exception is thrown either (e.g., parsing with Null element)

Parameters:	instring (str) – input string
Return bool:	True if the whole string is parsed else False

production()[source]¶: converts this GrammarElement (used by User) to a GrammarProduction (used by Parser)

replace_result_with(value)[source]¶

replace the result lexicon with value. This is a shortcut to:

self.set_result_action(lambda r: r.set(value))

Parameters:	value – any object
Returns:	self

run_post_funcs(result)[source]¶

Run functions set by set_result_action() after getting parsing result.

Parameters:	result (ParseResult) – parsing result
Returns:	None

set_name(name)[source]¶

Set the name of a grammar symbol. Usually the name of a GrammarElement is set by its variable name, for instance:

>>> light = String("light")

but in on-the-fly construction, one can call set_name():

>>> Optional(light).set_name("optional_light")

or shorten it to a function call like name setting:

>>> Optional(light)("optional_light")

The function returns a new shallow copied GrammarElement object. This allows reuse of common grammar elements in complex grammars without name collision.

Parameters:	name (str) – name of this grammar symbol
Returns:	a self copy (with different id and hash)
Return type:	`GrammarElement`

set_result_action(*functions)[source]¶

Set functions to call after parsing. For instance:

>>> number = Regex(r"\d+").set_result_action(lambda x: int(x))

It can be a list of functions too:

>>> def f1(): pass  # do something
>>> def f2(): pass  # do something
>>> number = Regex(r"\d+").set_result_action(f1, f2)

Parameters:	functions – a list of functions
Returns:	self

yield_productions()[source]¶: Yield how this element/expression produces grammar productions

class parsetron.GrammarElementEnhance(expr)[source]¶

Bases: parsetron.GrammarElement

Enhanced grammar symbols for Optional/OneOrMore etc.

yield_productions()[source]¶: Yield how this expression produces grammar productions. A GrammarElementEnhance class should implement its own.

exception parsetron.GrammarException[source]¶

Bases: exceptions.Exception

Exception thrown when we can’t construct the grammar.

__weakref__¶: list of weak references to the object (if defined)

class parsetron.GrammarExpression(exprs)[source]¶

Bases: parsetron.GrammarElement

An expression usually involving a binary combination of two GrammarElement‘s. The resulting GrammarExpression is a non-terminal and does not implement the parsing function _parse().

yield_productions()[source]¶: Yield how this expression produces grammar productions. A GrammarExpression class should implement its own.

class parsetron.GrammarImpl(name, dct)[source]¶

Bases: object

Actual grammar implementation that is returned by a Grammar construction.

__init__(name, dct)[source]¶

This __init__() function should only be called from MetaGrammar but never explicitly.

Parameters:	name (str) – name of this grammar class dct (dict) – __class__.__dict__ field

__weakref__¶: list of weak references to the object (if defined)

_build_grammar_recursively(element, productions)[source]¶

Build a grammar from element. This mainly includes recursively extracting AND/OR GrammarExpression‘s from element.

Parameters:	element (GrammarExpression) – a `GrammarExpression` productions (set) – assign to set() when calling the first time
Returns:	a set of `Production`
Return type:	set(`Production`)

_eliminate_null_and_expand()[source]¶

Eliminate the Null elements in grammar by introducing more productions without Null elements. For each production with Null, add a new production without. For instance:

S => Optional(A) B Optional(C)
Optional(A) => NULL    --> remove
Optional(A) => A
Optional(C) => NULL
Optional(C) => C       --> remove

becomes:

S => Optional(A) B Optional(C)
Optional(A) => A
Optional(C) => C
S => B Optional(C)     --> added
S => Optional(A) B     --> added
S => B                 --> added

The rational behind this is that NULL elements call for a lot of extra computation and are highly ambiguous. This function increases the size of grammar but helps gain extra parsing speed. In reality comparison of a parsing task:

without eliminating: 1.6s, _fundamental_rule() was called 38K times, taking 50% of all computing time. 2c52b18d5fcfb901b55ff0506d75c3f41073871c

with eliminating: 0.6s, _fundamental_rule() was called 23K times, taking 36% of all computing time. 33a1f3f541657ddf0204d02338d94a7e89473d86

_extract_var_names(dct)[source]¶

Given a dictionary, extract all variable names. For instance, given:

light_general_name = Regex(r"(lights|light|lamp)")

extract the mapping from id(light_general_name) to “light_general_name”

Parameters:	dct (dict) – a grammar dictionary
Returns:	a dictionary mapping from `id(variable)` to variable name.

_set_element_canonical_name(element)[source]¶

Set the canonical name field of element, if not set yet

Parameters:	element (GrammarElement) – a grammar element
Returns:	None

_set_element_variable_name(element)[source]¶

Set the variable_name field of element

Parameters:	element (GrammarElement) – a grammar element
Return bool:	True if found else False

build_leftcorner_table()[source]¶

For each grammar production, build two mappings from the production to:

its left corner RHS element (which is a pre-terminal);

the terminal element that does the actual parsing job.

filter_productions_for_prediction_by_lhs(lhs)[source]¶

Yield all productions whose LHS is lhs.

Parameters:	lhs (GrammarElement) – a grammar element
Returns:	a production generator
Return type:	generator(`Production`)

filter_productions_for_prediction_by_rhs(rhs_starts_with)[source]¶

Yield all productions whose RHS[0] is rhs_starts_with.

Parameters:	rhs_starts_with (GrammarElement) – a grammar element
Returns:	a production generator
Return type:	generator(`Production`)

filter_terminals_for_scan(lexicon)[source]¶

Yield all terminal productions that parses lexicon.

Parameters:	lexicon (str) – a string to be parsed
Returns:	a production generator
Return type:	generator(`Production`)

get_left_corner_nonterminals(prod)[source]¶

Given a grammar production, return a set with its left-corner non-terminal productions, or a set with prod itself if not found, .e.g.:

S => A B
A => C D
A => e f
B => b
C => c

passing S as prod will return the set of productions for A and C.

Parameters:	prod (Production) – a grammar production
Returns:	set(`Production`)

get_left_corner_terminals(prod)[source]¶

Given a grammar production, return a set with its left-corner terminal productions, or an empty set if not found, .e.g.:

S => A B
A => C D
A => e f
B => b
C => c

passing S as prod will return the set of productions for e and c.

Parameters:	prod (Production) – a grammar production
Returns:	set(`Production`)

class parsetron.IncrementalChart(init_size=10, inc_size=10)[source]¶

Bases: parsetron.Chart

A 2D chart (list of list) that expands its size as having more edges added.

Parameters:	size (int) – current size of chart max_size (int) – total capacity of chart, if exceeded, then need to increase by `inc_size`. inc_size (int) – size to increase when max_size is filled

__init__(init_size=10, inc_size=10)[source]¶

Parameters:	init_size – the initial size inc_size – extra size to span when the chart is filled up

increase_capacity()[source]¶: Increase the capacity of the current chart by self.inc_size

class parsetron.LeftCornerPredictScanRule[source]¶

Bases: parsetron.ChartRule

Left corner rules: only add productions whose left corner non-terminal can parse the lexicon.

class parsetron.MetaGrammar[source]¶

Bases: type

A meta grammar used to extract symbol names (expressed as variables) during grammar construction time. This provides a cleaner way than using obj.__class__.__dict__, whose __dict__ has to be accessed via an extra and explicit function call.

class parsetron.Null[source]¶

Bases: parsetron.GrammarElement

Null state, used internally

_parse(instring)[source]¶

Always returns False, no exceptions.

Parameters:	instring (str) – input string to be parsed
Returns:	False

class parsetron.OneOrMore(expr)[source]¶

Bases: parsetron.GrammarElementEnhance

OneOrMore matching (1 or more times).

yield_productions()[source]¶

Yield how this expression produces grammar productions. If A = OneOrMore(B), then this yields:

A => B
A => B A

class parsetron.Optional(expr)[source]¶

Bases: parsetron.GrammarElementEnhance

Optional matching (0 or 1 time).

yield_productions()[source]¶

Yield how this expression produces grammar productions. If A = Optional(B), then this yields:

A => NULL
A => B

class parsetron.Or(exprs)[source]¶

Bases: parsetron.GrammarExpression

An “|” expression that requires matching any one.

exception parsetron.ParseException[source]¶

Bases: exceptions.Exception

Exception thrown when we can’t parse the whole string.

__weakref__¶: list of weak references to the object (if defined)

class parsetron.ParseResult(name, lexicon, as_flat=True, lex_span=(None, None))[source]¶

Bases: object

Parse result converted from TreeNode output, providing easy access by list or attribute style, for instance:

result['color']
result.color

Results are flattened as much as possible, meaning: deep children are elevated to the top as much as possible as long as there are no name conflicts. For instance, given the following parse tree:

(GOAL
  (And(action_verb, OneOrMore(one_parse))
    (action_verb "flash")
    (OneOrMore(one_parse)
      (one_parse
        (light_name
          (Optional(light_quantifiers)
            (light_quantifiers "both")
          )
          (ZeroOrMore(light_specific_name)
            (light_specific_name "top")
            (light_specific_name "bottom")
          )
          (Optional(light_general_name)
            (light_general_name "light")
          )
        )
        (ZeroOrMore(color)
          (color "red")
        )
      )
      (one_parse
        (light_name
          (ZeroOrMore(light_specific_name)
            (light_specific_name "middle")
          )
          (Optional(light_general_name)
            (light_general_name "light")
          )
        )
        (ZeroOrMore(color)
          (color "green")
        )
      )
      (one_parse
        (light_name
          (ZeroOrMore(light_specific_name)
            (light_specific_name "bottom")
          )
        )
        (ZeroOrMore(color)
          (color "purple")
        )
      )
    )
  )
)

The parse result looks like:

{
  "action_verb": "flash",
  "one_parse": [
    {
      "one_parse": "both top bottom light red",
      "light_name": "both top bottom light",
      "light_quantifiers": "both",
      "ZeroOrMore(color)": "red",
      "color": "red",
      "ZeroOrMore(light_specific_name)": "top bottom",
      "Optional(light_general_name)": "light",
      "light_general_name": "light",
      "Optional(light_quantifiers)": "both",
      "light_specific_name": [
        "top",
        "bottom"
      ]
    },
    {
      "one_parse": "middle light green",
      "light_name": "middle light",
      "ZeroOrMore(color)": "green",
      "color": "green",
      "ZeroOrMore(light_specific_name)": "middle",
      "Optional(light_general_name)": "light",
      "light_general_name": "light",
      "light_specific_name": "middle"
    },
    {
      "one_parse": "bottom purple",
      "light_name": "bottom",
      "ZeroOrMore(color)": "purple",
      "color": "purple",
      "ZeroOrMore(light_specific_name)": "bottom",
      "light_specific_name": "bottom"
    }
  ],
  "And(action_verb, OneOrMore(one_parse))": "flash both top bottom
  light red middle light green bottom purple",
  "GOAL": "flash both top bottom light red middle light green bottom
  purple",
  "OneOrMore(one_parse)": "both top bottom light red middle light green
  bottom purple"
}

The following holds true given the above result:

assert result.action_verb == "flash"
assert result['action_verb'] == "flash"
assert type(result.one_parse) is list
assert result.one_parse[0].color == 'red'
assert result.one_parse[0].light_specific_name == ['top', 'bottom']
assert result.one_parse[1].light_specific_name == 'middle'

Note how the parse result is flattened w.r.t. the tree. Basic principles of flattening are:

value of result access is either a string or another ParseResult object
If a node has >= 1 children with the same name, make the name hold a list
Else make the name hold a string value.

__weakref__¶: list of weak references to the object (if defined)

add_item(k, v)[source]¶: Add a k => v pair to result

add_result(result, as_flat)[source]¶

Add another result to the current result.

Parameters:	result (ParseResult) – another result as_flat (bool) – whether to flatten result.

get(item=None, default=None)[source]¶: Get the value of item, if not found, return default. If item is not set, then get the main value of ParseResult. The usual value is a lexicon string. But it can be different if the ParseResult.set() function is called.

items()[source]¶: Return the dictionary of items in result

keys()[source]¶: Return the set of names in result

lex_span(name=None)[source]¶

Return the lexical span of this result (if name=None) or the result name. For instance:

_, result = parser.parse('blink lights')
assert result.lex_span() == (1,3)
assert result.lex_span("action") == (1,2)

Parameters:	name (str) – when None, return self span, otherwise, return child result’s span
Returns:	(int, int)

name()[source]¶

Return the result name

Returns:	a string
Return type:	str

names()[source]¶: Return the set of names in result

set(value)[source]¶: Set the value of ParseResult. value is not necessarily a string though: post functions from GrammarElement.set_result_action() can pass a different value to value.

values()[source]¶: Return the set of values in result

class parsetron.ParsingStrategy(rule_list)[source]¶

Bases: object

Parsing strategy used in TopDown, BottomUp, LeftCorner parsing. Each strategy consists of a list of various ChartRules‘s.

__weakref__¶: list of weak references to the object (if defined)

class parsetron.Production(lhs, rhs)[source]¶

Bases: object

Abstract class for a grammar production in the form:

LHS –> RHS (RHS is a list)

A grammar production is used by the parser while a grammar element by the user.

Parameters:	lhs – a single LHS of `GrammarElement` rhs (list) – a list of RHS element, each of which is of `GrammarElement`

__weakref__¶: list of weak references to the object (if defined)

static factory(lhs, rhs=None)[source]¶

a Production factory that constructs new productions according to the type of lhs. Users can either call this function, or directly call Production constructors.

Parameters:	lhs – a single LHS of `GrammarElement` rhs (list, `GrammarElement`) – RHS elements (single or a list), each of which is of `GrammarElement`

class parsetron.Regex(pattern, flags=2, match_whole=True)[source]¶

Bases: parsetron.RegexCs

Case-insensitive version of RegexCs.

class parsetron.RegexCs(pattern, flags=0, match_whole=True)[source]¶

Bases: parsetron.GrammarElement

Case-sensitive string matching with regular expressions. e.g.:

>>> color = RegexCs(r"(red|blue|orange)")
>>> digits = RegexCs(r"\d+")

Or pass a compile regex:

>>> import re
>>> color = RegexCs(re.compile(r"(red|blue|orange|a long list)"))

Parameters:	flags (int) – standard `re` flags match_whole (bool) – whether matching the whole string (default: True).

Warning

if match_whole=False, then r"(week|weeks)" will throw a ParseException when parsing “weeks”, but r"(weeks|week)" will succeed to parse “weeks”

RegexType¶: alias of SRE_Pattern

class parsetron.RobustParser(grammar, strategy=<parsetron.ParsingStrategy object>)[source]¶

Bases: object

A robust, incremental chart parser.

Parameters:	grammar – user defined grammar, a `GrammarImpl` type. strategy (ParsingStrategy) – top-down or bottom-up parsing

__weakref__¶: list of weak references to the object (if defined)

_parse_multi_token(sent_or_tokens, chart=None, lex_start=None)[source]¶

Parse sentences while being able to tokenize multiple tokens, for instance:

kill lights -> “kill” “lights” turn off lights -> “turn off” “lights”

Each quotes-enclosed (multi-)token is recognized as a phrase.

This function doesn’t parse unrecognizable tokens.

clear_cache()[source]¶: Clear all history when the parser is to parse another sentence. Mainly used in server mode for incremental parsing

incremental_parse(single_token, is_final, only_goal=True, is_first=False)[source]¶

Incremental parsing one token each time. Returns the best parsing tree and parse result.

Parameters:	single_token (str) – a single word is_final (bool) – whether the current single_token is the last one in sentence. only_goal (bool) – only output trees with GOAL as root node is_first (bool) – whether single_token is the first token
Returns:	(best parse tree, parse result)
Return type:	tuple(`TreeNode`, `ParseResult`) or (None, None)

incremental_parse_to_chart(single_token, chart)[source]¶

Incremental parsing one token each time. Returns (chart, is_token_accepted).

Parameters:	single_token (str) – a single word chart (RobustChart) – the previous returned chart. On first call, set it to None.
Returns:	a tuple of (chart, parsed_tokens)

parse(string)[source]¶

Parse an input sentence in string and return the best (tree, result).

Parameters:	string – tokenized input
Returns:	(best tree, best parse)
Return type:	tuple(`TreeNode`, `ParseResult`)

parse_multi_token_skip_reuse_chart(sent)[source]¶

Parse sentence with capabilities to:

multi_token: recognize multiple tokens as one phrase

(e.g., “turn off”)

skip: throw away tokens not licensed by grammar (e.g.,

speech fillers “um...”)

reuse_chart: doesn’t waste computation by reusing the chart

from last time. This makes the function call up to 25% faster than without reusing the chart.

Parameters:	sent (str) – a sentence in string
Returns:	the chart and the newly parsed tokens
Return type:	tuple(`Chart`, list(str))

parse_string(string)[source]¶: alias of parse().

parse_to_chart(string)[source]¶

Parse a whole sentence into a chart and parsed tokens. This gives a raw chart where all trees or the single best tree can be drawn from.

Parameters:	string (str) – input sentence that’s already tokenized.
Returns:	parsing chart and newly parsed tokens
Return type:	tuple(`Chart`, list(str))

print_parse(string, all_trees=False, only_goal=True, best_parse=True, print_json=False, strict_match=False)[source]¶

Print the parse tree given input string.

Parameters:	string (str) – input string all_trees (bool) – whether to print all trees (warning: massive output) only_goal (bool) – only print the tree licensed by final goal best_parse (bool) – print the best one tree ranked by the smallest size strict_match (bool) – strictly matching input with parse output (for test purposes)
Returns:	True if there is a parse else False

class parsetron.Set(strings)[source]¶

Bases: parsetron.SetCs

Case-insensitive version of SetCs.

class parsetron.SetCs(strings, caseless=False)[source]¶

Bases: parsetron.GrammarElement

Case-sensitive strings in which matching any will lead to parsing success. This is a short cut for disjunction of StringCs s (|), or Regex (r'(a\|b\|c\|...)').

Input can be one of the following forms:

a string with elements separated by spaces (defined by regex r"\s+")
otherwise an iterable

For instance, the following input is equivalent:

>>> "aa bb cc"
>>> ["aa", "bb", "cc"]
>>> ("aa", "bb", "cc")
>>> {"aa", "bb", "cc"}

The following is also equivalent:

>>> "0123...9"
>>> "0 1 2 3 .. 9"
>>> ["0", "1", "2", "3", ..., "9"]
>>> ("0", "1", "2", "3", ..., "9")
>>> {"0", "1", "2", "3", ..., "9"}

class parsetron.String(string)[source]¶

Bases: parsetron.StringCs

Case-insensitive version of StringCs.

class parsetron.StringCs(string)[source]¶

Bases: parsetron.GrammarElement

Case-sensitive string (usually a terminal) symbol that can be a word or phrase.

class parsetron.TopDownInitRule[source]¶

Bases: parsetron.ChartRule

Initialize the chart when we get started by inserting the goal.

class parsetron.TopDownPredictRule[source]¶

Bases: parsetron.ChartRule

Predict edge if it’s not complete and add it to chart

class parsetron.TopDownScanRule[source]¶

Bases: parsetron.ChartRule

Scan lexicon from top down

class parsetron.TreeNode(parent, children, lexicon, lex_span)[source]¶

Bases: object

A tree structure to represent parser output. parent should be a chart Edge while children should be a TreeNode. lexicon is the matched string if this node is a leaf node.

Parameters:	parent (Edge) – an edge in Chart children (list) – a list of `TreeNode` lexicon (str) – matched string when this node is a leaf node. lex_span ((int,int)) – (start, end) of lexical token offset.

__weakref__¶: list of weak references to the object (if defined)

dict_for_js()[source]¶

represents this tree in dict so a json format can be extracted by:

json.dumps(node.dict_for_js())

Returns:	a `dict`

size()[source]¶

size is the total number of non-terminals and terminals in the tree

Returns:	int
Return type:	int

to_parse_result()[source]¶

Convert this TreeNode to a ParseResult. The result is flattened as much as possible following:

if the parent node has as_list=True (ZeroOrMore and OneOrMore), then its children are not flattened;
children are flattened (meaning: they are elevated to the same level as their parents) in the following cases:
- child is a leaf node
- parent has as_list=False and all children have no name conflicts (e.g., in p -> {c1 -> {n -> "lexicon1"}, c2 -> {n -> "lexicon2"}}, n will be elevated to the same levels of c1 and c2 separately, but not to the same level of p).

Returns:	`ParseResult`

class parsetron.ZeroOrMore(expr)[source]¶

Bases: parsetron.GrammarElementEnhance

ZeroOrMore matching (0 or more times).

yield_productions()[source]¶

Yield how this expression produces grammar productions. If A = ZeroOrMore(B), then this yields:

A => NULL
A => B
A => B A

or (semantically equivalent):

A => NULL
A => OneOrMore(B)

parsetron._ustr(obj)[source]¶: Drop-in replacement for str(obj) that tries to be Unicode friendly. It first tries str(obj). If that fails with a UnicodeEncodeError, then it tries unicode(obj). It then < returns the unicode object | encodes it with the default encoding | ... >.

parsetron.find_word_boundaries(string)[source]¶

Given a string, such as “my lights are off”, return a tuple:

0: a list containing all word boundaries in tuples
(start(inclusive), end(exclusive)):
[(0, 2), (3, 9), (10, 13), (14, 17)]
1: a set of all start positions: set[(0, 3, 10, 14)]
2: a set of all end positions: set[(2, 9, 13, 17)]

parsetron.strip_string(string)[source]¶: Merge spaces into single space