Thun/docs/fun_with_scan.ipynb

437 lines
12 KiB
Plaintext
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"from notebook_preamble import D, DefinitionWrapper, J, V, define"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# On \"Two Exercises Found in a Book on Algorithmics\"\n",
"\n",
"Bird & Meertens\n",
"\n",
"[PDF paper available here](https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.694.2614)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Define `scan` in terms of a reduction.\n",
"\n",
"> Problem I. The reduction operator `/` of APL takes some binary operator `⨁` on its left and a vector `x` of values on its right. The meaning of `⨁/x` for `x = [a b ... z]` is the value `a⨁b⨁...⨁z`. For this to be well-defined in the absence of brackets, the operation `⨁` has to be associative. Now there is another operator `\\` of APL called `scan`. Its effect is closely related to reduction in that we have:\n",
"\n",
" ⨁\\x = [a a⨁b a⨁b⨁c ... a⨁b⨁...⨁z]\n",
"\n",
"> The problem is to find some definition of `scan` as a reduction. In other words, we have to find some function `f` and an operator `⨂` so that\n",
"\n",
" ⨁\\x = f(a)⨂f(b)⨂...⨂f(z)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Designing the Recursive Function\n",
"Ignoring the exact requirements (finding `f` and `⨂`) can we implement `scan` as a hylomorphism in Joy?\n",
"\n",
"Looking at the forms of hylomorphism, `H3` is the one to use:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### `H3`\n",
"If the combiner and the generator both need to work on the current value then `dup` must be used, and the generator must produce one item instead of two (the b is instead the duplicate of a.)\n",
"\n",
"\n",
" H3 == [P] [pop c] [[G] dupdip] [dip F] genrec\n",
"\n",
" ... a [G] dupdip [H3] dip F\n",
" ... a G a [H3] dip F\n",
" ... a a [H3] dip F\n",
" ... a H3 a F\n",
" ... a [G] dupdip [H3] dip F a F\n",
" ... a G a [H3] dip F a F\n",
" ... a″ a [H3] dip F a F\n",
" ... a″ H3 a F a F\n",
" ... a″ [G] dupdip [H3] dip F a F a F\n",
" ... a″ G a″ [H3] dip F a F a F\n",
" ... a‴ a″ [H3] dip F a F a F\n",
" ... a‴ H3 a″ F a F a F\n",
" ... a‴ pop c a″ F a F a F\n",
" ... c a″ F a F a F\n",
" ... d a F a F\n",
" ... d a F\n",
" ... d″"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Initial Definition\n",
"We're building a list of values so this is an \"anamorphism\". (An anamorphism uses `[]` for `c` and `swons` for `F`.)\n",
"\n",
" scan == [P] [pop []] [[G] dupdip] [dip swons] genrec\n",
"\n",
"Convert to `ifte`:\n",
"\n",
" scan == [P] [pop []] [[G] dupdip [scan] dip swons] ifte"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"On the recursive branch `[G] dupdip` doesn't cut it:\n",
"\n",
" [1 2 3] [G] dupdip [scan] dip swons\n",
" [1 2 3] G [1 2 3] [scan] dip swons"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Use `first`\n",
"At this point, we want the copy of `[1 2 3]` to just be `1`, so we use `first`.\n",
"\n",
" scan == [P] [pop []] [[G] dupdip first] [dip swons] genrec\n",
"\n",
" [1 2 3] [G] dupdip first [scan] dip swons\n",
" [1 2 3] G [1 2 3] first [scan] dip swons\n",
" [1 2 3] G 1 [scan] dip swons"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### `G` applies `⨁`\n",
"Now what does `G` have to do? Just apply `⨁` to the first two terms in the list.\n",
"\n",
" [1 2 3] G\n",
" [1 2 3] [⨁] infra\n",
" [1 2 3] [+] infra\n",
" [3 3]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Predicate `P`\n",
"Which tells us that the predicate `[P]` must guard against lists with less that two items in them:\n",
"\n",
" P == size 1 <="
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's see what we've got so far:\n",
"\n",
" scan == [P ] [pop []] [[G] dupdip first] [dip swons] genrec\n",
" scan == [size 1 <=] [pop []] [[[F] infra] dupdip first] [dip swons] genrec"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Handling the Last Term\n",
"This works to a point, but it throws away the last term:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"scrolled": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[1 3]\n"
]
}
],
"source": [
"J('[1 2 3] [size 1 <=] [pop []] [[[+] infra] dupdip first] [dip swons] genrec')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Hmm... Let's take out the `pop` for a sec..."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"scrolled": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[6] [1 3]\n"
]
}
],
"source": [
"J('[1 2 3] [size 1 <=] [[]] [[[+] infra] dupdip first] [dip swons] genrec')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"That leaves the last item in our list, then it puts an empty list on the stack and `swons`'s the new terms onto that. If we leave out that empty list, they will be `swons`'d onto that list that already has the last item."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"scrolled": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[1 3 6]\n"
]
}
],
"source": [
"J('[1 2 3] [size 1 <=] [] [[[+] infra] dupdip first] [dip swons] genrec')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Parameterize `⨁`\n",
"So we have:\n",
"\n",
" [⨁] scan == [size 1 <=] [] [[[⨁] infra] dupdip first] [dip swons] genrec\n",
"\n",
"Trivially:\n",
"\n",
" == [size 1 <=] [] [[[⨁] infra] dupdip first] [dip swons] genrec\n",
" == [[[⨁] infra] dupdip first] [size 1 <=] [] roll< [dip swons] genrec\n",
" == [[⨁] infra] [dupdip first] cons [size 1 <=] [] roll< [dip swons] genrec\n",
" == [⨁] [infra] cons [dupdip first] cons [size 1 <=] [] roll< [dip swons] genrec\n",
"\n",
"And so:\n",
"\n",
" scan == [infra] cons [dupdip first] cons [size 1 <=] [] roll< [dip swons] genrec"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"define('scan [infra] cons [dupdip first] cons [size 1 <=] [] roll< [dip swons] genrec')"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[1 3 6 10]\n"
]
}
],
"source": [
"J('[1 2 3 4] [+] scan')"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[1 2 6 24]\n"
]
}
],
"source": [
"J('[1 2 3 4] [*] scan')"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[1 1 2 2 3 3 4]\n"
]
}
],
"source": [
"J('[1 2 3 4 5 6 7] [neg +] scan')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Problem 2.\n",
"> Define a line to be a sequence of characters not containing the newline character. It is easy to define a function `Unlines` that converts a non-empty sequence of lines into a sequence of characters by inserting newline characters between every two lines.\n",
">\n",
"> Since `Unlines` is injective, the function `Lines`, which converts a sequence of characters into a sequence of lines by splitting on newline characters, can be specified as the inverse of `Unlines`.\n",
">\n",
"> The problem, just as in Problem 1. is to find a definition by reduction of the function `Lines`.\n",
"\n",
"\n",
" Unlines = uncons ['\\n' swap + +] step\n"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"'hello\\nworld'\n"
]
}
],
"source": [
"J('[\"hello\" \"world\"] uncons [\"\\n\" swap + +] step')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Again ignoring the actual task let's just derive `Lines`:\n",
"\n",
" \"abc\\nefg\\nhij\" Lines\n",
" ---------------------------\n",
" [\"abc\" \"efg\" \"hij\"]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Instead of `P == [size 1 <=]` we want `[\"\\n\" in]`, and for the base-case of a string with no newlines in it we want to use `unit`:\n",
"\n",
" Lines == [\"\\n\" in] [unit] [R0] [dip swons] genrec\n",
" Lines == [\"\\n\" in] [unit] [R0 [Lines] dip swons] ifte"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Derive `R0`:\n",
"\n",
" \"a \\n b\" R0 [Lines] dip swons\n",
" \"a \\n b\" split-at-newline swap [Lines] dip swons\n",
" \"a \" \" b\" swap [Lines] dip swons\n",
" \" b\" \"a \" [Lines] dip swons\n",
" \" b\" Lines \"a \" swons\n",
" [\" b\"] \"a \" swons\n",
" [\"a \" \" b\"]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"So:\n",
"\n",
" R0 == split-at-newline swap\n",
"\n",
" Lines == [\"\\n\" in] [unit] [split-at-newline swap] [dip swons] genrec"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Missing the Point?\n",
"This is all good and well, but in the paper many interesting laws and properties are discussed. Am I missing the point?\n",
"\n",
" 0 [a b c d] [F] step == 0 [a b] [F] step 0 [c d] [F] step concat\n",
"\n",
"For associative function `F` and a \"unit\" element for that function, here represented by `0`.\n",
"\n",
"For functions that don't have a \"unit\" we can fake it (the example is given of infinity for the `min(a, b)` function.) We can also use:\n",
"\n",
" safe_step == [size 1 <=] [] [uncons [F] step] ifte\n",
"\n",
"Or:\n",
"\n",
" safe_step == [pop size 1 <=] [pop] [[uncons] dip step] ifte\n",
"\n",
" [a b c] [F] safe_step\n",
" ---------------------------\n",
" a [b c] [F] step\n",
"\n",
"To limit `F` to working on pairs of terms from its domain.\n",
"\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}