∂RE
===

Brzozowski's Derivatives of Regular Expressions
-----------------------------------------------

Legend:

::

    ∧ intersection
    ∨ union
    ∘ concatenation (see below)
    ¬ complement
    ϕ empty set (aka ∅)
    λ singleton set containing just the empty string
    I set of all strings over the alphabet

Derivative of a set ``R`` of strings with respect to a string ``a``:

::

    ∂a(R)

    ∂a(a) → λ
    ∂a(λ) → ϕ
    ∂a(ϕ) → ϕ
    ∂a(¬a) → ϕ
    ∂a(R*) → ∂a(R)∘R*
    ∂a(¬R) → ¬∂a(R)
    ∂a(R∘S) → ∂a(R)∘S ∨ δ(R)∘∂a(S)
    ∂a(R ∧ S) → ∂a(R) ∧ ∂a(S)
    ∂a(R ∨ S) → ∂a(R) ∨ ∂a(S)

    ∂ab(R) = ∂b(∂a(R))
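
For a *finite* set of strings the derivative can be computed directly, which makes the definition concrete. This is a minimal sketch in Python 3, not part of the implementation below; the name ``derivative`` is made up for illustration.

```python
def derivative(prefix, strings):
    """Return every suffix s such that prefix + s is in strings."""
    return frozenset(
        s[len(prefix):] for s in strings if s.startswith(prefix)
    )


R = frozenset({'ab', 'abc', 'b', 'a'})

# Strings that do not start with 'a' drop out; the rest lose their 'a'.
assert derivative('a', R) == frozenset({'b', 'bc', ''})

# Deriving by a longer string is the same as deriving symbol by symbol:
# ∂ab(R) = ∂b(∂a(R))
assert derivative('ab', R) == derivative('b', derivative('a', R))
```

The rules above do the same job symbolically, so they also work for infinite sets such as ``R*``.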

Auxiliary predicate function ``δ`` (I call it ``nully``) returns either
``λ`` if ``λ ⊆ R`` or ``ϕ`` otherwise:

::

    δ(a) → ϕ
    δ(λ) → λ
    δ(ϕ) → ϕ
    δ(R*) → λ
    δ(¬R) δ(R)≟ϕ → λ
    δ(¬R) δ(R)≟λ → ϕ
    δ(R∘S) → δ(R) ∧ δ(S)
    δ(R ∧ S) → δ(R) ∧ δ(S)
    δ(R ∨ S) → δ(R) ∨ δ(S)

Some rules we will use later for "compaction":

::

    R ∧ ϕ = ϕ ∧ R = ϕ

    R ∧ I = I ∧ R = R

    R ∨ ϕ = ϕ ∨ R = R

    R ∨ I = I ∨ R = I

    R∘ϕ = ϕ∘R = ϕ

    R∘λ = λ∘R = R

Concatenation of sets: for two sets A and B the set A∘B is defined as:

{a∘b for a in A for b in B}

E.g.:

{'a', 'b'}∘{'c', 'd'} → {'ac', 'ad', 'bc', 'bd'}
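
The definition translates almost word for word into a set comprehension. A small sketch in Python 3 (the helper name ``concat`` and the name ``lam`` are illustrative only), which also checks the two concatenation "compaction" rules on concrete sets:

```python
def concat(A, B):
    """A∘B: every string of A followed by every string of B."""
    return frozenset(a + b for a in A for b in B)


phi = frozenset()        # ϕ, the empty set
lam = frozenset({''})    # λ, just the empty string

R = frozenset({'a', 'b'})
S = frozenset({'c', 'd'})

assert concat(R, S) == frozenset({'ac', 'ad', 'bc', 'bd'})

# R∘ϕ = ϕ∘R = ϕ
assert concat(R, phi) == phi and concat(phi, R) == phi

# R∘λ = λ∘R = R
assert concat(R, lam) == R and concat(lam, R) == R
```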

Implementation
--------------

.. code:: ipython2

    from functools import partial as curry
    from itertools import product

``ϕ`` and ``λ``
~~~~~~~~~~~~~~~

The empty set and the set of just the empty string.

.. code:: ipython2

    phi = frozenset()   # ϕ
    y = frozenset({''}) # λ

Two-letter Alphabet
~~~~~~~~~~~~~~~~~~~

I'm only going to use two symbols (at first) because this is enough to
illustrate the algorithm and because you can represent any other
alphabet with two symbols (if you had to).

I chose the names ``O`` and ``l`` (uppercase "o" and lowercase "L") to
look like ``0`` and ``1`` (zero and one) respectively.

.. code:: ipython2

    syms = O, l = frozenset({'0'}), frozenset({'1'})

Representing Regular Expressions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To represent REs in Python I'm going to use tagged tuples. A *regular
expression* is one of:

::

    O
    l
    (KSTAR, R)
    (NOT, R)
    (AND, R, S)
    (CONS, R, S)
    (OR, R, S)

Where ``R`` and ``S`` stand for *regular expressions*.

.. code:: ipython2

    AND, CONS, KSTAR, NOT, OR = 'and cons * not or'.split()  # Tags are just strings.

Because they are formed of ``frozenset``, ``tuple`` and ``str`` objects
only, these datastructures are immutable.

String Representation of RE Datastructures
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code:: ipython2

    def stringy(re):
        '''
        Return a nice string repr for a regular expression datastructure.
        '''
        if re == I: return '.'
        if re in syms: return next(iter(re))
        if re == y: return '^'
        if re == phi: return 'X'

        assert isinstance(re, tuple), repr(re)
        tag = re[0]

        if tag == KSTAR:
            body = stringy(re[1])
            if not body: return body
            if len(body) > 1: return '(' + body + ")*"
            return body + '*'

        if tag == NOT:
            body = stringy(re[1])
            if not body: return body
            if len(body) > 1: return '(' + body + ")'"
            return body + "'"

        r, s = stringy(re[1]), stringy(re[2])
        if tag == CONS: return r + s
        if tag == OR: return '%s | %s' % (r, s)
        if tag == AND: return '(%s) & (%s)' % (r, s)

        raise ValueError

``I``
~~~~~

Match anything. Often spelled "."

::

    I = (0|1)*

.. code:: ipython2

    I = (KSTAR, (OR, O, l))

.. code:: ipython2

    print stringy(I)

.. parsed-literal::

    .

``(.111.) & (.01 + 11*)'``
~~~~~~~~~~~~~~~~~~~~~~~~~~

The example expression from Brzozowski:

::

    (.111.) & (.01 + 11*)'
    a & (b + c)'

Note that it contains one of everything.

.. code:: ipython2

    a = (CONS, I, (CONS, l, (CONS, l, (CONS, l, I))))
    b = (CONS, I, (CONS, O, l))
    c = (CONS, l, (KSTAR, l))
    it = (AND, a, (NOT, (OR, b, c)))

.. code:: ipython2

    print stringy(it)

.. parsed-literal::

    (.111.) & ((.01 | 11*)')

``nully()``
~~~~~~~~~~~

Let's get that auxiliary predicate function ``δ`` out of the way.

.. code:: ipython2

    def nully(R):
        '''
        δ - Return λ if λ ⊆ R otherwise ϕ.
        '''

        # δ(a) → ϕ
        # δ(ϕ) → ϕ
        if R in syms or R == phi:
            return phi

        # δ(λ) → λ
        if R == y:
            return y

        tag = R[0]

        # δ(R*) → λ
        if tag == KSTAR:
            return y

        # δ(¬R) δ(R)≟ϕ → λ
        # δ(¬R) δ(R)≟λ → ϕ
        if tag == NOT:
            return phi if nully(R[1]) else y

        # δ(R∘S) → δ(R) ∧ δ(S)
        # δ(R ∧ S) → δ(R) ∧ δ(S)
        # δ(R ∨ S) → δ(R) ∨ δ(S)
        r, s = nully(R[1]), nully(R[2])
        return r & s if tag in {AND, CONS} else r | s

No "Compaction"
~~~~~~~~~~~~~~~

This is the straightforward version with no "compaction". It works fine,
but does waaaay too much work because the expressions grow with each
derivation.

.. code:: ipython2

    def D(symbol):

        def derv(R):

            # ∂a(a) → λ
            if R == {symbol}:
                return y

            # ∂a(λ) → ϕ
            # ∂a(ϕ) → ϕ
            # ∂a(¬a) → ϕ
            if R == y or R == phi or R in syms:
                return phi

            tag = R[0]

            # ∂a(R*) → ∂a(R)∘R*
            if tag == KSTAR:
                return (CONS, derv(R[1]), R)

            # ∂a(¬R) → ¬∂a(R)
            if tag == NOT:
                return (NOT, derv(R[1]))

            r, s = R[1:]

            # ∂a(R∘S) → ∂a(R)∘S ∨ δ(R)∘∂a(S)
            if tag == CONS:
                A = (CONS, derv(r), s)  # A = ∂a(R)∘S
                # A ∨ δ(R) ∘ ∂a(S)
                # A ∨ λ ∘ ∂a(S) → A ∨ ∂a(S)
                # A ∨ ϕ ∘ ∂a(S) → A ∨ ϕ → A
                return (OR, A, derv(s)) if nully(r) else A

            # ∂a(R ∧ S) → ∂a(R) ∧ ∂a(S)
            # ∂a(R ∨ S) → ∂a(R) ∨ ∂a(S)
            return (tag, derv(r), derv(s))

        return derv

Compaction Rules
~~~~~~~~~~~~~~~~

.. code:: ipython2

    def _compaction_rule(relation, one, zero, a, b):
        return (
            b if a == one else  # R*1 = 1*R = R
            a if b == one else
            zero if a == zero or b == zero else  # R*0 = 0*R = 0
            (relation, a, b)
            )

An elegant symmetry.

.. code:: ipython2

    # R ∧ I = I ∧ R = R
    # R ∧ ϕ = ϕ ∧ R = ϕ
    _and = curry(_compaction_rule, AND, I, phi)

    # R ∨ ϕ = ϕ ∨ R = R
    # R ∨ I = I ∨ R = I
    _or = curry(_compaction_rule, OR, phi, I)

    # R∘λ = λ∘R = R
    # R∘ϕ = ϕ∘R = ϕ
    _cons = curry(_compaction_rule, CONS, y, phi)

Memoizing
~~~~~~~~~

We can save re-processing by remembering results we have already
computed. RE datastructures are immutable and the ``derv()`` functions
are *pure* so this is fine.

.. code:: ipython2

    class Memo(object):

        def __init__(self, f):
            self.f = f
            self.calls = self.hits = 0
            self.mem = {}

        def __call__(self, key):
            self.calls += 1
            try:
                result = self.mem[key]
                self.hits += 1
            except KeyError:
                result = self.mem[key] = self.f(key)
            return result
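
A quick way to watch the counters move is to wrap a trivial function. This sketch repeats the ``Memo`` class so it runs on its own under Python 3; ``square`` is just a stand-in for ``derv()``.

```python
class Memo(object):

    def __init__(self, f):
        self.f = f
        self.calls = self.hits = 0
        self.mem = {}

    def __call__(self, key):
        self.calls += 1
        try:
            result = self.mem[key]
            self.hits += 1
        except KeyError:
            result = self.mem[key] = self.f(key)
        return result


@Memo
def square(n):
    return n * n


square(3)   # miss: computed and stored
square(3)   # hit: looked up
square(4)   # miss

assert (square.hits, square.calls) == (1, 3)
assert square.mem == {3: 9, 4: 16}
```

The keys must be hashable, which the RE datastructures are because they contain only ``frozenset``, ``tuple`` and ``str`` objects.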

With "Compaction"
~~~~~~~~~~~~~~~~~

This version uses the rules above to perform compaction. It keeps the
expressions from growing too large.

.. code:: ipython2

    def D_compaction(symbol):

        @Memo
        def derv(R):

            # ∂a(a) → λ
            if R == {symbol}:
                return y

            # ∂a(λ) → ϕ
            # ∂a(ϕ) → ϕ
            # ∂a(¬a) → ϕ
            if R == y or R == phi or R in syms:
                return phi

            tag = R[0]

            # ∂a(R*) → ∂a(R)∘R*
            if tag == KSTAR:
                return _cons(derv(R[1]), R)

            # ∂a(¬R) → ¬∂a(R)
            if tag == NOT:
                return (NOT, derv(R[1]))

            r, s = R[1:]

            # ∂a(R∘S) → ∂a(R)∘S ∨ δ(R)∘∂a(S)
            if tag == CONS:
                A = _cons(derv(r), s)  # A = ∂a(r)∘s
                # A ∨ δ(R) ∘ ∂a(S)
                # A ∨ λ ∘ ∂a(S) → A ∨ ∂a(S)
                # A ∨ ϕ ∘ ∂a(S) → A ∨ ϕ → A
                return _or(A, derv(s)) if nully(r) else A

            # ∂a(R ∧ S) → ∂a(R) ∧ ∂a(S)
            # ∂a(R ∨ S) → ∂a(R) ∨ ∂a(S)
            dr, ds = derv(r), derv(s)
            return _and(dr, ds) if tag == AND else _or(dr, ds)

        return derv
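
Derivatives already give us a matcher: take the derivative with respect to each input symbol in turn, then ask ``nully()`` whether the empty string is in what remains. The sketch below is a self-contained Python 3 rendering of that idea. It restates the *uncompacted* rules with the symbol passed as an argument, and the names ``derive`` and ``matches`` are made up for this example.

```python
AND, CONS, KSTAR, NOT, OR = 'and cons * not or'.split()
phi, y = frozenset(), frozenset({''})          # ϕ and λ
syms = O, l = frozenset({'0'}), frozenset({'1'})
I = (KSTAR, (OR, O, l))


def nully(R):
    if R in syms or R == phi: return phi
    if R == y: return y
    tag = R[0]
    if tag == KSTAR: return y
    if tag == NOT: return phi if nully(R[1]) else y
    r, s = nully(R[1]), nully(R[2])
    return r & s if tag in {AND, CONS} else r | s


def derive(symbol, R):
    if R == frozenset({symbol}): return y
    if R == y or R == phi or R in syms: return phi
    tag = R[0]
    if tag == KSTAR: return (CONS, derive(symbol, R[1]), R)
    if tag == NOT: return (NOT, derive(symbol, R[1]))
    r, s = R[1:]
    if tag == CONS:
        A = (CONS, derive(symbol, r), s)
        return (OR, A, derive(symbol, s)) if nully(r) else A
    return (tag, derive(symbol, r), derive(symbol, s))


def matches(R, string):
    '''True if string is in the set described by R.'''
    for symbol in string:
        R = derive(symbol, R)
    return bool(nully(R))


# The example expression: (.111.) & (.01 + 11*)'
a = (CONS, I, (CONS, l, (CONS, l, (CONS, l, I))))
b = (CONS, I, (CONS, O, l))
c = (CONS, l, (KSTAR, l))
it = (AND, a, (NOT, (OR, b, c)))

assert matches(it, '1110')        # has 111, doesn't end in 01, not all 1's
assert not matches(it, '1111')    # all 1's
assert not matches(it, '11101')   # ends in 01
```

Without compaction the intermediate expressions grow quickly, so this is only practical for short inputs.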

Let's try it out...
-------------------

(FIXME: redo.)

.. code:: ipython2

    o, z = D_compaction('0'), D_compaction('1')
    REs = set()
    N = 5
    names = list(product(*(N * [(0, 1)])))
    dervs = list(product(*(N * [(o, z)])))
    for name, ds in zip(names, dervs):
        R = it
        ds = list(ds)
        while ds:
            R = ds.pop()(R)
            if R == phi or R == I:
                break
            REs.add(R)

    print stringy(it) ; print
    print o.hits, '/', o.calls
    print z.hits, '/', z.calls
    print
    for s in sorted(map(stringy, REs), key=lambda n: (len(n), n)):
        print s

.. parsed-literal::

    (.111.) & ((.01 | 11*)')

    92 / 122
    92 / 122

    (.01)'
    (.01 | 1)'
    (.01 | ^)'
    (.01 | 1*)'
    (.111.) & ((.01 | 1)')
    (.111. | 11.) & ((.01 | ^)')
    (.111. | 11. | 1.) & ((.01)')
    (.111. | 11.) & ((.01 | 1*)')
    (.111. | 11. | 1.) & ((.01 | 1*)')

Should match:

::

    (.111.) & ((.01 | 11*)')

    92 / 122
    92 / 122

    (.01 )'
    (.01 | 1 )'
    (.01 | ^ )'
    (.01 | 1*)'
    (.111.) & ((.01 | 1 )')
    (.111. | 11.) & ((.01 | ^ )')
    (.111. | 11.) & ((.01 | 1*)')
    (.111. | 11. | 1.) & ((.01 )')
    (.111. | 11. | 1.) & ((.01 | 1*)')

Larger Alphabets
----------------

We could parse larger alphabets by defining patterns for e.g. each byte
of the ASCII code. Or we can generalize this code. If you study the code
above you'll see that we never use the "set-ness" of the symbols ``O``
and ``l``. The only time Python set operators (``&`` and ``|``) appear
is in the ``nully()`` function, and there they operate on (recursively
computed) outputs of that function, never ``O`` and ``l``.

What if we try:

::

    (OR, O, l)

    ∂1((OR, O, l))
                                ∂a(R ∨ S) → ∂a(R) ∨ ∂a(S)
    ∂1(O) ∨ ∂1(l)
                                ∂a(¬a) → ϕ
    ϕ ∨ ∂1(l)
                                ∂a(a) → λ
    ϕ ∨ λ
                                ϕ ∨ R = R
    λ

And compare it to:

::

    {'0', '1'}

    ∂1({'0', '1'})
                                ∂a(R ∨ S) → ∂a(R) ∨ ∂a(S)
    ∂1({'0'}) ∨ ∂1({'1'})
                                ∂a(¬a) → ϕ
    ϕ ∨ ∂1({'1'})
                                ∂a(a) → λ
    ϕ ∨ λ
                                ϕ ∨ R = R
    λ

This suggests that we should be able to alter the functions above to
detect sets and deal with them appropriately. Exercise for the Reader
for now.
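
One possible starting point for that exercise, offered as a hedged sketch rather than a worked-out solution: if every leaf is a ``frozenset`` of strings (``ϕ``, ``λ``, or a set of one-character symbols), then the leaf cases of both functions reduce to a membership test. The names ``derive_leaf`` and ``nully_leaf`` are hypothetical, the code is Python 3, and it assumes ``symbol`` is always a single non-empty character.

```python
phi, y = frozenset(), frozenset({''})   # ϕ and λ


def derive_leaf(symbol, R):
    '''∂symbol(R) where R is ϕ, λ, or a set of one-character symbols.'''
    assert isinstance(R, frozenset), repr(R)
    # ∂a(a) → λ when a is in the set; ∂a(λ), ∂a(ϕ), ∂a(¬a) → ϕ otherwise.
    return y if symbol in R else phi


def nully_leaf(R):
    '''δ(R) for the same kind of leaf: λ only if R contains the empty string.'''
    assert isinstance(R, frozenset), repr(R)
    return y if '' in R else phi


digits = frozenset('0123456789')

assert derive_leaf('7', digits) == y
assert derive_leaf('x', digits) == phi
assert derive_leaf('7', y) == phi      # ∂a(λ) → ϕ
assert derive_leaf('7', phi) == phi    # ∂a(ϕ) → ϕ
assert nully_leaf(digits) == phi
assert nully_leaf(y) == y
```

The remaining work would be in the places that compare against ``syms`` or ``I`` by equality (``stringy()``, ``nully()``, and the ``derv()`` functions), which this sketch does not touch.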

State Machine
-------------

We can drive the regular expressions to flesh out the underlying state
machine transition table.

::

    .111. & (.01 + 11*)'

Says, "Contains three 1's in a row, and neither ends in 01 nor consists
entirely of 1's."

.. figure:: attachment:omg.svg
   :alt: omg.svg

   omg.svg

Start at ``a`` and follow the transition arrows according to their
labels. Accepting states have a double outline. (Graphic generated with
`Dot from Graphviz <http://www.graphviz.org/>`__.) You'll see that only
paths that lead to one of the accepting states will match the regular
expression. All other paths will terminate at one of the non-accepting
states.

There's a happy path to ``g`` along 111:

::

    a→c→e→g

After you reach ``g`` you're stuck there eating 1's until you see a 0,
which takes you to the ``i→j→i|i→j→h→i`` "trap". You can't reach any
other states from those two loops.

If you see a 0 before you see 111 you will reach ``b``, which forms
another "trap" with ``d`` and ``f``. The only way out is another happy
path along 111 to ``h``:

::

    b→d→f→h

Once you have reached ``h`` you can see any number of 1's or 0's in a
row and either stay at ``h`` (for 1's) or move to ``i`` (for 0's). If
you find yourself at ``i`` you can see as many 0's, or repetitions of
10, as there are, but if you see just a 1 you move to ``j``.

RE to FSM
~~~~~~~~~

So how do we get the state machine from the regular expression?

It turns out that each RE is effectively a state, and each arrow points
to the derivative RE with respect to the arrow's symbol.

If we label the initial RE ``a``, we can say:

::

    a --0--> ∂0(a)
    a --1--> ∂1(a)

And so on: each new unique RE is a new state in the FSM table.

Here are the derived REs at each state:

::

    a = (.111.) & ((.01 | 11*)')
    b = (.111.) & ((.01 | 1)')
    c = (.111. | 11.) & ((.01 | 1*)')
    d = (.111. | 11.) & ((.01 | ^)')
    e = (.111. | 11. | 1.) & ((.01 | 1*)')
    f = (.111. | 11. | 1.) & ((.01)')
    g = (.01 | 1*)'
    h = (.01)'
    i = (.01 | 1)'
    j = (.01 | ^)'

You can see the one-way nature of the ``g`` state and the ``hij`` "trap"
in the way that the ``.111.`` on the left-hand side of the ``&``
disappears once it has been matched.

.. code:: ipython2

    from collections import defaultdict
    from pprint import pprint
    from string import ascii_lowercase

.. code:: ipython2

    d0, d1 = D_compaction('0'), D_compaction('1')

``explore()``
~~~~~~~~~~~~~

.. code:: ipython2

    def explore(re):

        # Don't have more than 26 states...
        names = defaultdict(iter(ascii_lowercase).next)

        table, accepting = dict(), set()

        to_check = {re}
        while to_check:

            re = to_check.pop()
            state_name = names[re]

            if (state_name, 0) in table:
                continue

            if nully(re):
                accepting.add(state_name)

            o, i = d0(re), d1(re)
            table[state_name, 0] = names[o] ; to_check.add(o)
            table[state_name, 1] = names[i] ; to_check.add(i)

        return table, accepting

.. code:: ipython2

    table, accepting = explore(it)
    table

.. parsed-literal::

    {('a', 0): 'b',
     ('a', 1): 'c',
     ('b', 0): 'b',
     ('b', 1): 'd',
     ('c', 0): 'b',
     ('c', 1): 'e',
     ('d', 0): 'b',
     ('d', 1): 'f',
     ('e', 0): 'b',
     ('e', 1): 'g',
     ('f', 0): 'b',
     ('f', 1): 'h',
     ('g', 0): 'i',
     ('g', 1): 'g',
     ('h', 0): 'i',
     ('h', 1): 'h',
     ('i', 0): 'i',
     ('i', 1): 'j',
     ('j', 0): 'i',
     ('j', 1): 'h'}

.. code:: ipython2

    accepting

.. parsed-literal::

    {'h', 'i'}

Generate Diagram
~~~~~~~~~~~~~~~~

Once we have the FSM table and the set of accepting states we can
generate the diagram above.

.. code:: ipython2

    _template = '''\
    digraph finite_state_machine {
    rankdir=LR;
    size="8,5"
    node [shape = doublecircle]; %s;
    node [shape = circle];
    %s
    }
    '''

    def link(fr, nm, label):
        return ' %s -> %s [ label = "%s" ];' % (fr, nm, label)


    def make_graph(table, accepting):
        return _template % (
            ' '.join(accepting),
            '\n'.join(
                link(from_, to, char)
                for (from_, char), (to) in sorted(table.iteritems())
                )
            )

.. code:: ipython2

    print make_graph(table, accepting)

.. parsed-literal::

    digraph finite_state_machine {
    rankdir=LR;
    size="8,5"
    node [shape = doublecircle]; i h;
    node [shape = circle];
     a -> b [ label = "0" ];
     a -> c [ label = "1" ];
     b -> b [ label = "0" ];
     b -> d [ label = "1" ];
     c -> b [ label = "0" ];
     c -> e [ label = "1" ];
     d -> b [ label = "0" ];
     d -> f [ label = "1" ];
     e -> b [ label = "0" ];
     e -> g [ label = "1" ];
     f -> b [ label = "0" ];
     f -> h [ label = "1" ];
     g -> i [ label = "0" ];
     g -> g [ label = "1" ];
     h -> i [ label = "0" ];
     h -> h [ label = "1" ];
     i -> i [ label = "0" ];
     i -> j [ label = "1" ];
     j -> i [ label = "0" ];
     j -> h [ label = "1" ];
    }

Drive a FSM
~~~~~~~~~~~

There are *lots* of FSM libraries already. Once you have the state
transition table they should all be straightforward to use. State
Machine code is very simple. Just for fun, here is an implementation in
Python that imitates what "compiled" FSM code might look like in an
"unrolled" form. Most FSM code uses a little driver loop and a table
datastructure; the code below instead acts like JMP instructions
("jump", or GOTO in higher-level-but-still-low-level languages) to
hard-code the information in the table into a little patch of branches.
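
For comparison, the conventional "driver loop plus table" style mentioned above might look like the following minimal Python 3 sketch. The table and accepting set are copied from the ``explore()`` output earlier; the names ``TABLE``, ``ACCEPTING`` and ``run`` are invented for this example.

```python
# Transition table for .111. & (.01 + 11*)' as produced by explore(it).
TABLE = {
    ('a', 0): 'b', ('a', 1): 'c',
    ('b', 0): 'b', ('b', 1): 'd',
    ('c', 0): 'b', ('c', 1): 'e',
    ('d', 0): 'b', ('d', 1): 'f',
    ('e', 0): 'b', ('e', 1): 'g',
    ('f', 0): 'b', ('f', 1): 'h',
    ('g', 0): 'i', ('g', 1): 'g',
    ('h', 0): 'i', ('h', 1): 'h',
    ('i', 0): 'i', ('i', 1): 'j',
    ('j', 0): 'i', ('j', 1): 'h',
}
ACCEPTING = {'h', 'i'}


def run(table, accepting, input_, start='a'):
    '''Feed a string of "0"/"1" characters through the table.'''
    state = start
    for ch in input_:
        state = table[state, int(ch)]
    return state in accepting


assert run(TABLE, ACCEPTING, '1110')        # a→c→e→g→i
assert run(TABLE, ACCEPTING, '10111')       # a→c→b→d→f→h
assert not run(TABLE, ACCEPTING, '1111')    # stuck at g
assert not run(TABLE, ACCEPTING, '11101')   # ends at j
```

All of the machine's behavior lives in the data here, whereas the unrolled version moves it into the control flow.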

Trampoline Function
^^^^^^^^^^^^^^^^^^^

Python has no GOTO statement but we can fake it with a "trampoline"
function.

.. code:: ipython2

    def trampoline(input_, jump_from, accepting):
        I = iter(input_)
        while True:
            try:
                bounce_to = jump_from(I)
            except StopIteration:
                return jump_from in accepting
            jump_from = bounce_to

Stream Functions
^^^^^^^^^^^^^^^^

Little helpers to process the iterator of our data (a "stream" of "1"
and "0" characters, not bits.)

.. code:: ipython2

    getch = lambda I: int(next(I))


    def _1(I):
        '''Loop on ones.'''
        while getch(I): pass


    def _0(I):
        '''Loop on zeros.'''
        while not getch(I): pass

A Finite State Machine
^^^^^^^^^^^^^^^^^^^^^^

With those preliminaries out of the way, from the state table of
``.111. & (.01 + 11*)'`` we can immediately write down state machine
code. (You have to imagine that these are GOTO statements in C or
branches in assembly and that the state names are branch destination
labels.)

.. code:: ipython2

    a = lambda I: c if getch(I) else b
    b = lambda I: _0(I) or d
    c = lambda I: e if getch(I) else b
    d = lambda I: f if getch(I) else b
    e = lambda I: g if getch(I) else b
    f = lambda I: h if getch(I) else b
    g = lambda I: _1(I) or i
    h = lambda I: _1(I) or i
    i = lambda I: _0(I) or j
    j = lambda I: h if getch(I) else i

Note that the implementations of ``h`` and ``g`` are identical, ergo
``h = g``, and we could eliminate one in the code, but ``h`` is an
accepting state and ``g`` isn't.

.. code:: ipython2

    def acceptable(input_):
        return trampoline(input_, a, {h, i})

.. code:: ipython2

    for n in range(2**5):
        s = bin(n)[2:]
        print '%05s' % s, acceptable(s)

.. parsed-literal::

        0 False
        1 False
       10 False
       11 False
      100 False
      101 False
      110 False
      111 False
     1000 False
     1001 False
     1010 False
     1011 False
     1100 False
     1101 False
     1110 True
     1111 False
    10000 False
    10001 False
    10010 False
    10011 False
    10100 False
    10101 False
    10110 False
    10111 True
    11000 False
    11001 False
    11010 False
    11011 False
    11100 True
    11101 False
    11110 True
    11111 False

Reversing the Derivatives to Generate Matching Strings
------------------------------------------------------

(UNFINISHED) Brzozowski also shewed how to go from the state machine to
strings and expressions...

Each of these states is just a name for a Brzozowskian RE, and so, other
than the initial state ``a``, they can be described in terms of the
derivative-with-respect-to-N of some other state/RE:

::

    c = d1(a)
    b = d0(a)
    b = d0(c)
    ...
    i = d0(j)
    j = d1(i)

Consider:

::

    c = d1(a)
    b = d0(c)

Substituting:

::

    b = d0(d1(a))

Unwrapping:

::

    b = d10(a)

'''

::

    j = d1(d0(j))

Unwrapping:

::

    j = d1(d0(j)) = d01(j)

We have a loop or "fixed point".

::

    j = d01(j) = d0101(j) = d010101(j) = ...

hmm...

::

    j = (01)*
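
Until that is worked out, a cruder way to get matching strings out of the state machine is to walk the transition table breadth-first and emit the input that leads to each accepting state. This is *not* Brzozowski's construction, only a brute-force sketch in Python 3; the table is copied from the ``explore()`` output earlier, and the name ``accepted_strings`` is made up.

```python
TABLE = {
    ('a', 0): 'b', ('a', 1): 'c',
    ('b', 0): 'b', ('b', 1): 'd',
    ('c', 0): 'b', ('c', 1): 'e',
    ('d', 0): 'b', ('d', 1): 'f',
    ('e', 0): 'b', ('e', 1): 'g',
    ('f', 0): 'b', ('f', 1): 'h',
    ('g', 0): 'i', ('g', 1): 'g',
    ('h', 0): 'i', ('h', 1): 'h',
    ('i', 0): 'i', ('i', 1): 'j',
    ('j', 0): 'i', ('j', 1): 'h',
}
ACCEPTING = {'h', 'i'}


def accepted_strings(table, accepting, max_len, start='a'):
    '''Yield every accepted string of length <= max_len, shortest first.'''
    frontier = [(start, '')]          # (state reached, input that got there)
    for _ in range(max_len + 1):
        for state, string in frontier:
            if state in accepting:
                yield string
        # Extend every path by one symbol for the next round.
        frontier = [
            (table[state, bit], string + str(bit))
            for state, string in frontier
            for bit in (0, 1)
        ]


# The shortest matches have four symbols: 111 plus one 0 on either side.
assert list(accepted_strings(TABLE, ACCEPTING, 4)) == ['0111', '1110']
```

The frontier doubles at every step, so this only scales to short strings; it is meant as a way to spot-check the table, not as a generator for real use.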