Breaking hygiene

In the previous episode I said that hygienic macros are good, since they solve the variable capture problem. However, purely hygienic macros introduce a problem of their own, since they make it impossible to introduce variables at all. Consider for instance the following trivial macro:

(def-syntax (define-a x)
  #'(define a x))

(define-a x) apparently expands to (define a x), so you may find the following surprising:

> (define-a 1)
> a
Unhandled exception
 Condition components:
   1. &undefined
   2. &who: eval
   3. &message: "unbound variable"
   4. &irritants: (a)

Why is the variable a not bound to 1? The problem is that hygienic macros never introduce identifiers implicitly. Auxiliary names introduced by a macro are not visible outside it, and the only names that enter the expansion are the ones we pass in. A mechanism to introduce identifiers, i.e. a mechanism to break hygiene, is needed if you want to define binding forms.
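
To see the other side of the coin, here is a minimal sketch of a macro where the name comes from the caller. It assumes a plain R6RS implementation and uses syntax-rules instead of sweet-macros; define-name is a hypothetical macro invented for this illustration.

```scheme
(import (rnrs))

;; The identifier to bind is supplied by the caller, so it carries the
;; caller's lexical context and the binding is visible at the call site.
(define-syntax define-name
  (syntax-rules ()
    ((_ name x) (define name x))))

(define-name a 1)
(display a)   ; prints 1
(newline)
```

Here a is bound because we put it in ourselves; the macro did not have to invent it.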

datum->syntax revisited

Scheme has a built-in mechanism to break hygiene, and we already saw it: it is the datum->syntax utility, which converts literal objects (datums) into syntax objects. I have shown datum->syntax at work in episodes 27 and 28: it was used there to convert lists describing source code into syntax objects. A more typical use case for datum->syntax is to turn symbols into proper identifiers. Such identifiers can then be introduced in macros and made visible to expanded code.

In order to understand the mechanism, you must always remember that identifiers in Scheme - in the technical sense of objects recognized by the identifier? predicate - are not just raw symbols: they are syntax objects with lexical information attached to them. If you want to turn a raw symbol into an identifier you must add the lexical information to it, and datum->syntax does this by copying the lexical information from the context object passed as its first argument.
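
The difference between a raw symbol and an identifier can be checked directly. The following is a small sketch assuming a plain R6RS implementation; the names sym and id are mine and are only used for illustration.

```scheme
(import (rnrs))

(define sym 'a)    ; a raw symbol: plain data
(define id  #'a)   ; an identifier: a syntax object wrapping the symbol

(display (symbol? sym))                 ; prints #t
(newline)
(display (identifier? sym))             ; prints #f
(newline)
(display (identifier? id))              ; prints #t
(newline)
;; stripping the lexical information gives back the raw symbol
(display (eq? (syntax->datum id) sym))  ; prints #t
(newline)
```

syntax->datum goes from the identifier back to the symbol; datum->syntax goes the other way, but needs a context object to say which lexical information to attach.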

For instance, here is how you can “fix” the macro define-a:

(def-syntax (define-a* x)
  #`(define #,(datum->syntax #'define-a* 'a) x))

The symbol 'a here is being promoted to a bona fide identifier, by adding to it the lexical context associated with the macro name. You can check that the identifier a is really introduced as follows:

> (define-a* 1)
> a
1
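
For readers who do not have sweet-macros at hand, the same trick can be sketched with plain R6RS syntax-case. This is my own transcription under that assumption, not the definition used in the library; the pattern variable k stands for the macro keyword as written at the call site and plays the role of the context object.

```scheme
(import (rnrs))

(define-syntax define-a*
  (lambda (stx)
    (syntax-case stx ()
      ((k x)
       ;; build the identifier a with the lexical context of the call site
       (with-syntax ((a (datum->syntax #'k 'a)))
         #'(define a x))))))

(define-a* 1)
(display a)   ; prints 1
(newline)
```

Since a takes its lexical information from the macro call, it is visible exactly where the macro is used.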

A more realistic example is to use datum->syntax to build new identifiers from strings. For that purpose I have added an identifier-append utility to my (aps lang) library, defined as follows:

;; take an identifier and return a new one with an appended suffix
(define (identifier-append id . strings)
  (define id-str (symbol->string (syntax->datum id)))
  (datum->syntax id (string->symbol (apply string-append id-str strings))))

Here is a simple def-book macro using identifier-append:

(def-syntax (def-book name title author)
  (with-syntax (
     (name-title (identifier-append #'name "-title"))
     (name-author (identifier-append #'name "-author")))
     #'(begin
         (define name (vector title author))
         (define name-title (vector-ref name 0))
         (define name-author (vector-ref name 1)))))

def-book here is just an example of how to use identifier-append; it is not a recommended pattern for defining records. There are much better ways to define records in Scheme, as we will see in part VI of these Adventures.

Anyway, def-book works as follows. Given a single identifier name and two values it introduces three identifiers in the current lexical scope: name (bound to a vector containing the two values), name-title (bound to the first value) and name-author (bound to the second value).

> (def-book bible "The Bible" "God")
> bible
#("The Bible" "God")
> bible-title
"The Bible"
> bible-author
"God"
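
In other words, the call above behaves as if the three definitions had been written by hand. The following sketch spells that out in plain R6RS; it is only an illustration of the intended result, not the literal output of the expander.

```scheme
(import (rnrs))

;; hand-written equivalent of (def-book bible "The Bible" "God")
(define bible (vector "The Bible" "God"))
(define bible-title (vector-ref bible 0))
(define bible-author (vector-ref bible 1))

(display bible-title)   ; prints The Bible
(newline)
(display bible-author)  ; prints God
(newline)
```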

Playing with the lexical context

The lexical context is just the set of names which are visible to an object in a given lexical position in the source code. Here is an example of a lexical context which is particularly restricted:

(library (experimental dummy-ctxt)
 (export dummy-ctxt)
 (import (only (rnrs) define syntax))
 (define dummy-ctxt #'here)
)

The identifier #'here only sees the names define, syntax and dummy-ctxt: this is the lexical context of any object in its position in the source code. Had we not restricted the import, the lexical context of #'here would have been the entire rnrs set of identifiers. We can use dummy-ctxt to expand a macro into a minimal context. Here is an example of a trivial macro expanding into such a minimal context:

> (import (experimental dummy-ctxt))
> (def-syntax expand-to-car
   (lambda (x) (datum->syntax dummy-ctxt 'car)))

The macro expand-to-car expands to a syntax object obtained by attaching the lexical context of dummy-ctxt to the symbol 'car. Since the built-in car is not defined in that lexical context, the expansion fails:

> (expand-to-car)
 Unhandled exception
  Condition components:
    1. &undefined
    2. &who: eval
    3. &message: "unbound variable"
    4. &irritants: (car)

A similar macro expand-to-dummy-ctxt instead would succeed since dummy-ctxt is bound in that lexical context:

> (def-syntax expand-to-dummy-ctxt
    (lambda (x) (datum->syntax dummy-ctxt 'dummy-ctxt)))
> (expand-to-dummy-ctxt)
#<syntax here [char 115 of /home/micheles/gcode/scheme/aps/dummy-ctxt.sls]>

In the definition of define-macro I gave in episode 28 I used the name of the defined macro as lexical context. The consequence of this choice is that define-macro style macros are expanded within the lexical context of the code where the macro is invoked. For instance in this example

> (let ((x 42))
   (define-macro (m) 'x) ; (m) should expand to 42
   (let ((x 43))
     (m)))
43 ; surprise!

(m) expands to 43, since in the lexical context where the macro is invoked x is bound to 43. However, this behavior is quite surprising, and most likely not what is wanted. This is actually another example of the free-symbol capture problem. It should be clear that the capture comes from expanding the macro in the macro-call context, not in the macro-definition context.
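
For comparison, here is a sketch of the same experiment with a hygienic macro, assuming a plain R6RS implementation and using syntax-rules instead of define-macro. The variable result is mine and only serves to capture the value.

```scheme
(import (rnrs))

(define result
  (let ((x 42))
    ;; the x in the template refers to the x visible where m is defined
    (define-syntax m
      (syntax-rules ()
        ((_) x)))
    (let ((x 43))
      (m))))

(display result)   ; prints 42
(newline)
```

The free identifier x in the template keeps the lexical context of the macro definition, so the inner binding of x to 43 cannot capture it.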

Hygienic vs non-hygienic macro systems

Understanding non-hygienic macros is important if you intend to work in the larger Lisp world. In the Scheme community everybody thinks that hygiene is an essential feature, and all major Scheme implementations provide hygienic macros; nevertheless, in the rest of the world things are different.

For instance, Common Lisp does not use hygienic macros and it copes with the variable capture problem by using gensym; the free symbol capture problem is not solved, but it is extremely rare, because Common Lisp has multiple namespaces and a package system.

The hygiene problem is more serious in Lisp-1 dialects like the newborn Arc and Clojure. Arc macros behave just like define-macro and are fully unhygienic, whereas Clojure macros are rather special, being nearly hygienic. In particular, Clojure macros are not affected by the free-symbol capture problem:

user=> (defmacro m[x] `(list ~x))
#'user/m
user=> (let [list 1] (m 2))
(2)

The reason is that Clojure is smart enough to recognize the fully qualified list object appearing at macro definition time (clojure.core/list) as a distinct object from the local variable list bound to the number 1. Moreover, the ordinary capture problem can be solved with gensym or with an even cooler feature, automatic gensyms (look at the documentation of the syntax-quote reader macro if you want to know more). Speaking as a non-expert, Clojure macros seem to fare pretty well with respect to the hygiene issue.

It is worth mentioning that if you use a package system (like in Common Lisp) or a namespace system (like in Clojure), in practice variable capture becomes pretty rare. In Scheme instead, which uses a module system, hygiene is essential: if you are writing a module containing macros which can be imported and expanded in an unknown lexical scope, in the absence of hygiene you could introduce name clashes that are impossible to foresee in advance and that could be solved only by the final user, who however will likely be ignorant of how your library works.

This is why in Scheme the macro expansion is not literally inserted in the original code, and a lot of magic takes place to avoid name clashes. In practice, the implementation of Scheme macros takes care of distinguishing the introduced identifiers with some specific mechanism (it could be based on marking the names, or on explicit renaming). As a consequence, the mechanism of macro expansion is less simple to explain: you cannot just cut and paste the result of the expansion in your original code.
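
A classic way to see that the expansion is not a literal cut and paste is the following sketch, which assumes a plain R6RS implementation. The macro swap! and the variables tmp and other are hypothetical names chosen for the illustration; the user variable deliberately has the same name as the temporary introduced by the macro.

```scheme
(import (rnrs))

(define-syntax swap!
  (syntax-rules ()
    ((_ a b)
     ;; tmp is introduced by the macro and is kept distinct by the expander
     (let ((tmp a))
       (set! a b)
       (set! b tmp)))))

(define tmp 1)
(define other 2)
(swap! tmp other)

(display (list tmp other))   ; prints (2 1)
(newline)
```

If the template were pasted literally, the result would be (let ((tmp tmp)) (set! tmp other) (set! other tmp)) and the swap would silently do nothing to the outer variables; the hygienic expander treats the two tmp as different identifiers, so the swap works.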

Personally I have made my mind up and I am in the pro-hygiene camp now. I should admit that for a long time I have been in the opposite camp, preferring the simplicity of define-macro over the complexity of syntax-case. It turns out I was wrong. The major problem of syntax-case is a cosmetic one: it looks very complex and cumbersome to use, but that can be easily solved by providing a nicer API - which I did with sweet-macros. Actually I have been able to use sweet-macros for twenty episodes without explaining the intricacies of the hygienic expansion.

Having attended a talk on the subject at the EuroLisp Symposium, I will mention here that there are ways to implement hygienic macros on top of defmacro in Common Lisp portably. Therefore there is no technical reason why hygienic macros are not widespread in the whole Lisp world; it is just a matter of different opinions on the importance of the problem and of different tradeoffs. I believe that eventually all Lisp dialects will start using hygienic macros, but that could take decades, because of inertia and backward-compatibility concerns.