Hygienic macros
==============================================================

In episode 9_ I noted that Scheme provides three
major macro systems (``syntax-rules``, ``syntax-case`` and
``define-macro``), yet I went on to discuss
my own personal macro system, ``sweet-macros``.
The decision was motivated by various reasons.
First of all, I did not want
to confuse my readers by describing too many macro systems
at the same time. Secondly, I wanted to make macros
easier and more debuggable. Finally, ``sweet-macros`` are
slightly more powerful than the other macro systems, with a better
support for guarded patterns and with a few extensibility features
which I have not shown yet (but I like to keep some trick under my
sleeve ;).

After 19 episodes about macros, I can safely assume that my
readers are not beginners anymore. It is time to have a look at the
larger Scheme world and to compare/contrast ``sweet-macros`` with the
other macro systems.  I do not want to discuss all the macro
systems in existence here, therefore I will skip a few interesting systems
such as syntactic closures and explicit renaming macros. However,
readers interested in alternative macro systems for Scheme should have
a look at this excellent `post by Alex Shinn`_ which summarizes the
current situation better than I could do. Notice that Alex is
strongly biased against ``syntax-case`` and very much in favor of
explicit renaming macros. The two
systems are not incompatible though, and actually Larceny provides a
``syntax-case`` implementation built on top of explicit renaming
macros (see also SRFI-72_).

.. _SRFI-72: http://srfi.schemers.org/srfi-72/srfi-72.html
.. _The Scheme Programming Language: http://www.scheme.com/tspl3/
.. _Syntax-Rules Primer for the Mildly Insane: http://groups.google.com/group/comp.lang.scheme/browse_frm/thread/86c338837de3a020/eb6cc6e11775b619?#eb6cc6e11775b619

``syntax-match`` vs ``syntax-rules``
-----------------------------------------------------------------

``syntax-rules`` can be quite trivially defined in terms of
``syntax-match``::

 (def-syntax (syntax-rules (literal ...) (patt templ) ...)
   #'(syntax-match (literal ...) (sub patt #'templ) ...))

As you see, the main difference between ``syntax-rules`` (apart for a
missing ``sub``) is that ``syntax-rules`` automatically
adds the syntax-quote ``#'`` operator to you templates. That means
that you cannot use quasisyntax tricks and that ``syntax-rules`` is
strictly less powerful than ``syntax-match``. The other difference is
that ``syntax-rules`` macros do not have guarded patterns; the most direct
consequence is that providing good error messages for incorrect syntax
is more difficult. You may learn everything you ever wanted to know
about ``syntax-rules`` in the `Syntax-Rules Primer for the Mildly Insane`_
by Joe Marshall.


``syntax-match`` vs ``syntax-case``
-----------------------------------------------------------------

In principle, ``syntax-case`` could be defined in terms of ``syntax-match`` as
follows::

 (def-syntax syntax-case
   (syntax-match ()
    (sub (syntax-case x (literal ...) (patt guard skel) ...)
     #'(syntax-match x (literal ...) (sub patt skel guard) ...))
    (sub (syntax-case x (literal ...) (patt skel) ...)
     #'(syntax-match x (literal ...) (sub patt skel) ...))
  ))

In reality, ``syntax-case`` is a Scheme primitive and
``syntax-match`` is defined on top of it. So, ``syntax-case`` has
theoretically the same power as ``syntax-match``, but in practice
``syntax-match`` is more convenient to use.

The major syntactic difference
is the position of the guard, which in ``syntax-case`` is positioned *before*
the skeleton, whereas in ``syntax-match`` is positioned *after* the skeleton
I did spent a lot of time thinking about the right position for the guard:
I hate gratuitous breaking, but I convinced myself that the position of the
guard in ``syntax-case`` is really broken, so I had to *fix* the issue.
The problem with ``syntax-case`` is that while the pattern is always
in the first position, you never know what is in the
second position: it could be the guard *or* the template; in order
to distinguish the possibilities you have to check if there is
a third expression in the clause and that is annoying,

In ``syntax-match`` the template
is *always* in the second position; if there is something in the
third position, it is always a guard; moreover, there could be
another expression in the clause (in the fourth position) which is
used as alternative template if the guard is not satisfied.
The advantage of having fixed positions is that it is easier to
write higher order macros expanding to ``syntax-match`` transformers
(``syntax-match`` itself is implemented in terms of ``syntax-case``
where the template are not in fixed position and I had to make the
implementation more complex just to cope with that).

You may learn (nearly) everything there is to know
about ``syntax-case`` in the the book
`The Scheme Programming Language`_ by Kent Dybvig.


``syntax-match`` versus ``define-macro``
---------------------------------------------------------------

Nowadays macros based on ``define-macro`` are much less used than in
past because macro systems based on pattern matching are much more
powerful, easier and safer to use. The R6RS specification made
``syntax-case`` enter in the standard and it is the preferred macro
system for most implementation.

Nowadays, there is a good chance that your Scheme implementation does
not provide ``define-macro`` out of the box, therefore you need to
implement it in term of ``syntax-case`` (or ``syntax-match``). Here is
an example of such an implementation:

.. code-block:: scheme

  (def-syntax define-macro
    (syntax-match ()
       (sub (define-macro (name . params) body1 body2 ...)
         #'(define-macro name (lambda params body1 body2 ...)))
       (sub (define-macro name expander)
         #'(def-syntax (name . args)
            (datum->syntax #'name (apply expander (syntax->datum #'args)))))
       ))

The code should be clear: the arguments of the macro are converted into
a regular list which is then transformed with the expander, and converted
back into a syntax object in the context of the macro.
``define-macro`` macros are based on simple list manipulations and
are very easy to explain and to understand: unfortunately, they
are affected by the hygiene problem.

You can find examples of use
of ``define-macro`` in many references; I learned it from
`Teach Yourself Scheme in Fixnum Days`_ by Dorai Sitaram.

.. _Teach Yourself Scheme in Fixnum Days: http://www.ccs.neu.edu/home/dorai/t-y-scheme/t-y-scheme-Z-H-1.html
.. _9: http://www.artima.com/weblogs/viewpost.jsp?thread=240804
.. _On Lisp: http://www.paulgraham.com/onlisp.html
.. _hygiene in R6RS: http://docs.plt-scheme.org/r6rs-lib-std/r6rs-lib-Z-H-13.html#node_sec_12.1
.. _post by Alex Shinn: http://lists.gnu.org/archive/html/chicken-users/2008-04/msg00013.html
.. _chapter about syntax-case: http://www.cs.indiana.edu/~dyb/pubs/bc-syntax-case.pdf

The hygiene problem
---------------------------------------------------------------------

If you have experience in Common Lisp or other Lisp dialects, you will
have heard about the problem of hygiene in macros, a.k.a. the problem
of *variable capture*.  As Paul Graham puts it,
*errors caused by variable capture are rare, but what they lack
in frequency they make up in viciousness*.
The hygiene problem is the main reason why ``define-macro`` is becoming
less and less used in the Scheme world. PLT Scheme has being
deprecating it for years and nowadays even Chicken Scheme, which
traditionally used ``define-macro``, has removed it from the
core, by using hygienic macros instead: this is the reason why the
current Chicken (Chicken 4) is called "hygienic Chicken".

You can find good discussions of the hygiene problem in Common Lisp in
many places; I am familiar with Paul Graham's book `On Lisp`_ which I
definitively recommend: chapter 9 on variable capture
has influenced this section. Another good
reference is the `chapter about syntax-case`_ - by Kent Dybvig - in
the book `Beautiful Code`_.  Here I will give just a short example
exhibiting the problem, for the sake of the readers unfamiliar with
it.

.. image:: hygienic-paper-small.jpg

Consider this "dirty" definition of the ``for`` loop:

.. code-block:: scheme

   (define-macro (dirty-for i i1 i2 . body)
     `(let ((start ,i1) (stop ,i2))
        (let loop ((,i start))
          (unless (>= ,i stop) ,@body (loop (+ 1 ,i))))))

Superficially ``define-macro`` looks quite similar to ``def-syntax``,
except that in the macro body you need to a add a comma in front of
each macro argument argument. Internally, however, macros based on
``define-macro`` are completely different. In particular, they are not
safe under variable capture and that may cause surprises.
For instance, code such as

.. code-block:: scheme

 > (let ((start 42))
    (dirty-for i 1 3 (display start) (newline)))
 1
 1

prints the number 1 (twice) and not the number 42!

The reason is clear if you expand the macro (notice
the if you implement ``define-macro`` in terms of
``syntax-match`` then ``syntax-expand`` still works):

.. code-block:: scheme

 > (syntax-expand (dirty-for i 1 3 (display start) (newline)))
 (let ((start 1) (stop 3))
   (let loop ((i start))
     (unless (>= i stop) (display start) (newline)
       (loop (+ 1 i)))))

Since the inner variable ``start`` is shadowing the outer variable ``start``,
the number 1 is printed instead of the number 42.
The problem can be solved by introducing unique identifiers in the
macro by means of ``gensym``
(``gensym`` is not in the R6RS standard, but in practice every Scheme
implementation has it; for convenience I have included it in my
``(aps compat)`` compatibility library).

The ``dirty-for`` macro
can be improved to use ``gensym`` for every variable which is internally
defined (and it is not a macro argument):

.. code-block:: scheme

   (define-macro (less-dirty-for i i1 i2 . body)
     (let ((start (gensym)) (stop (gensym)) (loop (gensym)))
      `(let ((,start ,i1) (,stop ,i2))
         (let ,loop ((,i ,start))
           (unless (>= ,i ,stop) ,@body (,loop (+ 1 ,i)))))))

.. code-block:: scheme

 > (let ((start 42))
    (less-dirty-for i 1 3 (display start) (newline)))
 42
 42

``less-dirty-for`` works because all internal variables have now
unique names that cannot collide with existing identifiers by
construction.  You can see the names used internally by invoking
``syntax-expand`` (notice that by construction such names change every
time you expand the macro):

.. code-block:: scheme

   > (syntax-expand (less-dirty-for i 1 3 (display i))); in Ikarus
   (let ((#{g0 |K!ZoUGmIIl%SrMfI|} 1) (#{g1 |qecoGeEAOv0R8$%0|} 3))
     (let #{g2 |kD?xfov61j0$G1=M|} ((i #{g0 |K!ZoUGmIIl%SrMfI|}))
       (unless (>= i #{g1 |qecoGeEAOv0R8$%0|}) (display i)
        (#{g2 |kD?xfov61j0$G1=M|} (+ 1 i)))))

Unfortunately ``less-dirty-for`` is not really clean.
``gensym`` cannot do anything to solve the *free-symbol capture*
problem (I am using Paul Graham's terminology here).

The problem is that identifiers used (but not defined) in the macro
have the scope of expanded code, not the scope of the original macro:
in particular, if the outer scope in expanded code redefines the meaning of an
identifier used internally, the macro will not work as you would expect.
Consider for instance the following expression:

.. code-block:: scheme

 > (let ((unless 'unless))
    (less-dirty-for i 1 3 (display i)))

If you try to run this, your system will go into an infinite loop!

The problem here is that shadowing ``unless`` in the outer scope
affects the inner working of the macro (I leave as an exercise to
understand what exactly is going on).
As you can easily see, this
kind of problem is pretty tricky to debug: in practice, it
means that the macro user is forced to know all the identifiers
that are used internally by the macro.

On the other hand,
if you use hygienic macros, all the subtle
problems I described before simply disappear and
you can write a clean for loop just as it should be written:

.. code-block:: scheme

   (def-syntax (clean-for i i1 i2 body1 body2 ...)
     #'(let ((start i1) (stop i2))
        (let loop ((i start))
          (unless (>= i stop) body1 body2 ... (loop (+ 1 i))))))

Everything works fine with this definition, and the macro looks
even better, with less commas and splices ;-)

That's all for today. The next episode will discuss how to break
hygiene on purpose, don't miss it!

.. _Beautiful Code: http://oreilly.com/catalog/9780596510046/