Thun/docs/fun_with_scan.rst

.. code:: ipython2

    from notebook_preamble import D, DefinitionWrapper, J, V, define

On “Two Exercises Found in a Book on Algorithmics”
==================================================

Bird & Meertens

`PDF paper available
here <https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.694.2614>`__

Define ``scan`` in terms of a reduction.
----------------------------------------

   Problem I. The reduction operator ``/`` of APL takes some binary
   operator ``⨁`` on its left and a vector ``x`` of values on its right.
   The meaning of ``⨁/x`` for ``x = [a b ... z]`` is the value
   ``a⨁b⨁...⨁z``. For this to be well-defined in the absence of
   brackets, the operation ``⨁`` has to be associative. Now there is
   another operator ``\`` of APL called ``scan``. Its effect is closely
   related to reduction in that we have:

::

   ⨁\x = [a a⨁b a⨁b⨁c ... a⨁b⨁...⨁z]

..

   The problem is to find some definition of ``scan`` as a reduction. In
   other words, we have to find some function ``f`` and an operator
   ``⨂`` so that

::

   ⨁\x = f(a)⨂f(b)⨂...⨂f(z)

Designing the Recursive Function
--------------------------------

Ignoring the exact requirements (finding ``f`` and ``⨂``) can we
implement ``scan`` as a hylomorphism in Joy?

Looking at the forms of hylomorphism, ``H3`` is the one to use:

``H3``
~~~~~~

If the combiner and the generator both need to work on the current value
then ``dup`` must be used, and the generator must produce one item
instead of two (the b is instead the duplicate of a.)

::

   H3 == [P] [pop c] [[G] dupdip] [dip F] genrec

   ... a [G] dupdip [H3] dip F
   ... a  G  a      [H3] dip F
   ... a′    a      [H3] dip F
   ... a′ H3 a               F
   ... a′ [G] dupdip [H3] dip F a F
   ... a′  G  a′     [H3] dip F a F
   ... a″     a′     [H3] dip F a F
   ... a″ H3  a′              F a F
   ... a″ [G] dupdip [H3] dip F a′ F a F
   ... a″  G    a″   [H3] dip F a′ F a F
   ... a‴       a″   [H3] dip F a′ F a F
   ... a‴ H3    a″            F a′ F a F
   ... a‴ pop c a″ F a′ F a F
   ...        c a″ F a′ F a F
   ...        d      a′ F a F
   ...        d′          a F
   ...        d″

Initial Definition
~~~~~~~~~~~~~~~~~~

We’re building a list of values so this is an “anamorphism”. (An
anamorphism uses ``[]`` for ``c`` and ``swons`` for ``F``.)

::

   scan == [P] [pop []] [[G] dupdip]      [dip swons] genrec

Convert to ``ifte``:

::

   scan == [P] [pop []] [[G] dupdip [scan] dip swons] ifte

On the recursive branch ``[G] dupdip`` doesn’t cut it:

::

   [1 2 3] [G] dupdip [scan] dip swons
   [1 2 3]  G [1 2 3] [scan] dip swons

Use ``first``
~~~~~~~~~~~~~

At this point, we want the copy of ``[1 2 3]`` to just be ``1``, so we
use ``first``.

::

   scan == [P] [pop []] [[G] dupdip first] [dip swons] genrec

   [1 2 3] [G] dupdip first [scan] dip swons
   [1 2 3]  G [1 2 3] first [scan] dip swons
   [1 2 3]  G  1            [scan] dip swons

``G`` applies ``⨁``
~~~~~~~~~~~~~~~~~~~

Now what does ``G`` have to do? Just apply ``⨁`` to the first two terms
in the list.

::

   [1 2 3] G
   [1 2 3] [⨁] infra
   [1 2 3] [+] infra
   [3 3]

Predicate ``P``
~~~~~~~~~~~~~~~

Which tells us that the predicate ``[P]`` must guard against lists with
less that two items in them:

::

   P == size 1 <=

Let’s see what we’ve got so far:

::

   scan == [P        ] [pop []] [[G]         dupdip first] [dip swons] genrec
   scan == [size 1 <=] [pop []] [[[F] infra] dupdip first] [dip swons] genrec

Handling the Last Term
~~~~~~~~~~~~~~~~~~~~~~

This works to a point, but it throws away the last term:

.. code:: ipython2

    J('[1 2 3] [size 1 <=] [pop []] [[[+] infra] dupdip first] [dip swons] genrec')


.. parsed-literal::

    [1 3]


Hmm… Let’s take out the ``pop`` for a sec…

.. code:: ipython2

    J('[1 2 3] [size 1 <=] [[]] [[[+] infra] dupdip first] [dip swons] genrec')


.. parsed-literal::

    [6] [1 3]


That leaves the last item in our list, then it puts an empty list on the
stack and ``swons``\ ’s the new terms onto that. If we leave out that
empty list, they will be ``swons``\ ’d onto that list that already has
the last item.

.. code:: ipython2

    J('[1 2 3] [size 1 <=] [] [[[+] infra] dupdip first] [dip swons] genrec')


.. parsed-literal::

    [1 3 6]


Parameterize ``⨁``
~~~~~~~~~~~~~~~~~~

So we have:

::

   [⨁] scan == [size 1 <=] [] [[[⨁] infra] dupdip first] [dip swons] genrec

Trivially:

::

    == [size 1 <=] [] [[[⨁] infra] dupdip first]                 [dip swons] genrec
    == [[[⨁] infra] dupdip first]           [size 1 <=] [] roll< [dip swons] genrec
    == [[⨁] infra]      [dupdip first] cons [size 1 <=] [] roll< [dip swons] genrec
    == [⨁] [infra] cons [dupdip first] cons [size 1 <=] [] roll< [dip swons] genrec

And so:

::

   scan == [infra] cons [dupdip first] cons [size 1 <=] [] roll< [dip swons] genrec

.. code:: ipython2

    define('scan == [infra] cons [dupdip first] cons [size 1 <=] [] roll< [dip swons] genrec')

.. code:: ipython2

    J('[1 2 3 4] [+] scan')


.. parsed-literal::

    [1 3 6 10]


.. code:: ipython2

    J('[1 2 3 4] [*] scan')


.. parsed-literal::

    [1 2 6 24]


.. code:: ipython2

    J('[1 2 3 4 5 6 7] [neg +] scan')


.. parsed-literal::

    [1 1 2 2 3 3 4]


Problem 2.
----------

   Define a line to be a sequence of characters not containing the
   newline character. It is easy to define a function ``Unlines`` that
   converts a non-empty sequence of lines into a sequence of characters
   by inserting newline characters between every two lines.

   Since ``Unlines`` is injective, the function ``Lines``, which
   converts a sequence of characters into a sequence of lines by
   splitting on newline characters, can be specified as the inverse of
   ``Unlines``.

   The problem, just as in Problem 1. is to find a definition by
   reduction of the function ``Lines``.

::

   Unlines = uncons ['\n' swap + +] step

.. code:: ipython2

    J('["hello" "world"] uncons ["\n" swap + +] step')


.. parsed-literal::

    'hello\nworld'


Again ignoring the actual task let’s just derive ``Lines``:

::

      "abc\nefg\nhij" Lines
   ---------------------------
       ["abc" "efg" "hij"]

Instead of ``P == [size 1 <=]`` we want ``["\n" in]``, and for the
base-case of a string with no newlines in it we want to use ``unit``:

::

   Lines == ["\n" in] [unit] [R0]       [dip swons] genrec
   Lines == ["\n" in] [unit] [R0 [Lines] dip swons] ifte

Derive ``R0``:

::

   "a \n b" R0                    [Lines] dip swons
   "a \n b" split-at-newline swap [Lines] dip swons
   "a " " b"                 swap [Lines] dip swons
   " b" "a "                      [Lines] dip swons
   " b" Lines "a " swons
   [" b"]     "a " swons
   ["a " " b"]

So:

::

   R0 == split-at-newline swap

   Lines == ["\n" in] [unit] [split-at-newline swap] [dip swons] genrec

Missing the Point?
------------------

This is all good and well, but in the paper many interesting laws and
properties are discussed. Am I missing the point?

::

   0 [a b c d] [F] step == 0 [a b] [F] step 0 [c d] [F] step concat

For associative function ``F`` and a “unit” element for that function,
here represented by ``0``.

For functions that don’t have a “unit” we can fake it (the example is
given of infinity for the ``min(a, b)`` function.) We can also use:

::

   safe_step == [size 1 <=] [] [uncons [F] step] ifte

Or:

::

   safe_step == [pop size 1 <=] [pop] [[uncons] dip step] ifte

      [a b c] [F] safe_step
   ---------------------------
      a [b c] [F] step

To limit ``F`` to working on pairs of terms from its domain.