Thun/docs/notebooks/fun_with_scan.ipynb

{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "from notebook_preamble import D, DefinitionWrapper, J, V, define"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# On \"Two Exercises Found in a Book on Algorithmics\"\n",
    "\n",
    "Bird & Meertens\n",
    "\n",
    "[PDF paper available here](https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.694.2614)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Define `scan` in terms of a reduction.\n",
    "\n",
    "> Problem I. The reduction operator `/` of APL takes some binary operator `⨁` on its left and a vector `x` of values on its right. The meaning of `⨁/x` for `x = [a b ... z]` is the value `a⨁b⨁...⨁z`.  For this to be well-defined in the absence of brackets, the operation `⨁` has to be associative.  Now there is another operator `\\` of APL called `scan`.  Its effect is closely related to reduction in that we have:\n",
    "\n",
    "    ⨁\\x = [a a⨁b a⨁b⨁c ... a⨁b⨁...⨁z]\n",
    "\n",
    "> The problem is to find some definition of `scan` as a reduction.  In other words, we have to find some function `f` and an operator `⨂` so that\n",
    "\n",
    "    ⨁\\x = f(a)⨂f(b)⨂...⨂f(z)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Designing the Recursive Function\n",
    "Ignoring the exact requirements (finding `f` and `⨂`) can we implement `scan` as a hylomorphism in Joy?\n",
    "\n",
    "Looking at the forms of hylomorphism, `H3` is the one to use:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### `H3`\n",
    "If the combiner and the generator both need to work on the current value then `dup` must be used, and the generator must produce one item instead of two (the b is instead the duplicate of a.)\n",
    "\n",
    "\n",
    "    H3 == [P] [pop c] [[G] dupdip] [dip F] genrec\n",
    "\n",
    "    ... a [G] dupdip [H3] dip F\n",
    "    ... a  G  a      [H3] dip F\n",
    "    ... a′    a      [H3] dip F\n",
    "    ... a′ H3 a               F\n",
    "    ... a′ [G] dupdip [H3] dip F a F\n",
    "    ... a′  G  a′     [H3] dip F a F\n",
    "    ... a″     a′     [H3] dip F a F\n",
    "    ... a″ H3  a′              F a F\n",
    "    ... a″ [G] dupdip [H3] dip F a′ F a F\n",
    "    ... a″  G    a″   [H3] dip F a′ F a F\n",
    "    ... a‴       a″   [H3] dip F a′ F a F\n",
    "    ... a‴ H3    a″            F a′ F a F\n",
    "    ... a‴ pop c a″ F a′ F a F\n",
    "    ...        c a″ F a′ F a F\n",
    "    ...        d      a′ F a F\n",
    "    ...        d′          a F\n",
    "    ...        d″"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Initial Definition\n",
    "We're building a list of values so this is an \"anamorphism\".  (An anamorphism uses `[]` for `c` and `swons` for `F`.)\n",
    "\n",
    "    scan == [P] [pop []] [[G] dupdip]      [dip swons] genrec\n",
    "\n",
    "Convert to `ifte`:\n",
    "\n",
    "    scan == [P] [pop []] [[G] dupdip [scan] dip swons] ifte"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "On the recursive branch `[G] dupdip` doesn't cut it:\n",
    "\n",
    "    [1 2 3] [G] dupdip [scan] dip swons\n",
    "    [1 2 3]  G [1 2 3] [scan] dip swons"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Use `first`\n",
    "At this point, we want the copy of `[1 2 3]` to just be `1`, so we use `first`.\n",
    "\n",
    "    scan == [P] [pop []] [[G] dupdip first] [dip swons] genrec\n",
    "\n",
    "    [1 2 3] [G] dupdip first [scan] dip swons\n",
    "    [1 2 3]  G [1 2 3] first [scan] dip swons\n",
    "    [1 2 3]  G  1            [scan] dip swons"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### `G` applies `⨁`\n",
    "Now what does `G` have to do?  Just apply `⨁` to the first two terms in the list.\n",
    "\n",
    "    [1 2 3] G\n",
    "    [1 2 3] [⨁] infra\n",
    "    [1 2 3] [+] infra\n",
    "    [3 3]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Predicate `P`\n",
    "Which tells us that the predicate `[P]` must guard against lists with less that two items in them:\n",
    "\n",
    "    P == size 1 <="
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's see what we've got so far:\n",
    "\n",
    "    scan == [P        ] [pop []] [[G]         dupdip first] [dip swons] genrec\n",
    "    scan == [size 1 <=] [pop []] [[[F] infra] dupdip first] [dip swons] genrec"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Handling the Last Term\n",
    "This works to a point, but it throws away the last term:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[1 3]\n"
     ]
    }
   ],
   "source": [
    "J('[1 2 3] [size 1 <=] [pop []] [[[+] infra] dupdip first] [dip swons] genrec')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Hmm... Let's take out the `pop` for a sec..."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[6] [1 3]\n"
     ]
    }
   ],
   "source": [
    "J('[1 2 3] [size 1 <=] [[]] [[[+] infra] dupdip first] [dip swons] genrec')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "That leaves the last item in our list, then it puts an empty list on the stack and `swons`'s the new terms onto that.  If we leave out that empty list, they will be `swons`'d onto that list that already has the last item."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[1 3 6]\n"
     ]
    }
   ],
   "source": [
    "J('[1 2 3] [size 1 <=] [] [[[+] infra] dupdip first] [dip swons] genrec')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Parameterize `⨁`\n",
    "So we have:\n",
    "\n",
    "    [⨁] scan == [size 1 <=] [] [[[⨁] infra] dupdip first] [dip swons] genrec\n",
    "\n",
    "Trivially:\n",
    "\n",
    "     == [size 1 <=] [] [[[⨁] infra] dupdip first]                 [dip swons] genrec\n",
    "     == [[[⨁] infra] dupdip first]           [size 1 <=] [] roll< [dip swons] genrec\n",
    "     == [[⨁] infra]      [dupdip first] cons [size 1 <=] [] roll< [dip swons] genrec\n",
    "     == [⨁] [infra] cons [dupdip first] cons [size 1 <=] [] roll< [dip swons] genrec\n",
    "\n",
    "And so:\n",
    "\n",
    "    scan == [infra] cons [dupdip first] cons [size 1 <=] [] roll< [dip swons] genrec"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "define('scan [infra] cons [dupdip first] cons [size 1 <=] [] roll< [dip swons] genrec')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[1 3 6 10]\n"
     ]
    }
   ],
   "source": [
    "J('[1 2 3 4] [+] scan')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[1 2 6 24]\n"
     ]
    }
   ],
   "source": [
    "J('[1 2 3 4] [*] scan')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[1 1 2 2 3 3 4]\n"
     ]
    }
   ],
   "source": [
    "J('[1 2 3 4 5 6 7] [neg +] scan')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Problem 2.\n",
    "> Define a line to be a sequence of characters not containing the newline character.  It is easy to define a function `Unlines` that converts a non-empty sequence of lines into a sequence of characters by inserting newline characters between every two lines.\n",
    ">\n",
    "> Since `Unlines` is injective, the function `Lines`, which converts a sequence of characters into a sequence of lines by splitting on newline characters, can be specified as the inverse of `Unlines`.\n",
    ">\n",
    "> The problem, just as in Problem 1. is to find a definition by reduction of the function `Lines`.\n",
    "\n",
    "\n",
    "    Unlines = uncons ['\\n' swap + +] step\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "'hello\\nworld'\n"
     ]
    }
   ],
   "source": [
    "J('[\"hello\" \"world\"] uncons [\"\\n\" swap + +] step')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Again ignoring the actual task let's just derive `Lines`:\n",
    "\n",
    "       \"abc\\nefg\\nhij\" Lines\n",
    "    ---------------------------\n",
    "        [\"abc\" \"efg\" \"hij\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Instead of `P == [size 1 <=]` we want `[\"\\n\" in]`, and for the base-case of a string with no newlines in it we want to use `unit`:\n",
    "\n",
    "    Lines == [\"\\n\" in] [unit] [R0]       [dip swons] genrec\n",
    "    Lines == [\"\\n\" in] [unit] [R0 [Lines] dip swons] ifte"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Derive `R0`:\n",
    "\n",
    "    \"a \\n b\" R0                    [Lines] dip swons\n",
    "    \"a \\n b\" split-at-newline swap [Lines] dip swons\n",
    "    \"a \" \" b\"                 swap [Lines] dip swons\n",
    "    \" b\" \"a \"                      [Lines] dip swons\n",
    "    \" b\" Lines \"a \" swons\n",
    "    [\" b\"]     \"a \" swons\n",
    "    [\"a \" \" b\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "So:\n",
    "\n",
    "    R0 == split-at-newline swap\n",
    "\n",
    "    Lines == [\"\\n\" in] [unit] [split-at-newline swap] [dip swons] genrec"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Missing the Point?\n",
    "This is all good and well, but in the paper many interesting laws and properties are discussed.  Am I missing the point?\n",
    "\n",
    "    0 [a b c d] [F] step == 0 [a b] [F] step 0 [c d] [F] step concat\n",
    "\n",
    "For associative function `F` and a \"unit\" element for that function, here represented by `0`.\n",
    "\n",
    "For functions that don't have a \"unit\" we can fake it (the example is given of infinity for the `min(a, b)` function.) We can also use:\n",
    "\n",
    "    safe_step == [size 1 <=] [] [uncons [F] step] ifte\n",
    "\n",
    "Or:\n",
    "\n",
    "    safe_step == [pop size 1 <=] [pop] [[uncons] dip step] ifte\n",
    "\n",
    "       [a b c] [F] safe_step\n",
    "    ---------------------------\n",
    "       a [b c] [F] step\n",
    "\n",
    "To limit `F` to working on pairs of terms from its domain.\n",
    "\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}