Comparing identifiers ================================================= This is the last episode of part V of my Adventures. In the latest episodes I have discussed various technicalities of Scheme macros, such as the concepts of syntax object, hygiene and lexical context. There is still an important subject to be discuss in order to become really proficient with Scheme macros: *identifier equality*. Equality of identifiers is one of the trickiest things in Scheme. First of all, identifier equality is a compile-time concept which has nothing to do with the run-time concept of equality between variables. Identifiers are not variables: they are syntax objects with an underlying symbol and an underlying lexical context, which are known statically at compile time. It is possible to know if an identifier is bound or not at compile-time, but the value the identifier will be bound to at run-time is (in general) unknown. Secondly, there is not a single concept of identifier equality, but different definitions are possible. In this episode I will discuss three different predicates to compare identifiers: ``symbol-identifier=?``, ``bound-identifier=?`` and ``free-identifier=?`` (the latter two are part of the R6RS standard). .. image:: identity.png .. _22: http://www.artima.com/weblogs/viewpost.jsp?thread=256848 ``symbol-identifier=?`` ----------------------------------------------- The simplest concept of identifier equality is expressed by the following ``symbol-identifier=?`` comparison function (for convenience, I have added the ``symbol-identifier=?`` precedure to the ``(aps lang)`` library): :: (define (symbol-identifier=? id1 id2) (symbol=? (syntax->datum id1) (syntax->datum id2))) Two identifiers are ``symbol-identifier=?`` if they are equal as symbols, once their lexical information has been stripped out. For instance, ``symbol-identifier=?`` can be used to find duplicated names in macros defining ``name->value`` tables, such as the ``static-map`` macro I discussed in episode 22_. Moreover, ``symbol-identifier=?`` can be used to reject reserved identifiers (you may need such functionality if are building a mini-language on top of Scheme and you want to reject a few identifiers as language keywords), as in the following example: .. code-block:: scheme (def-syntax (check-reserved id) (syntax-violation 'check-reserved "Reserved identifier" #'id) (exists (cut symbol-identifier=? #'id <>) (list #'reserved1 #'reserved2)) 'non-reserved) ``(check-reserved id)`` will raise a syntax-violation if ``id`` is one of the keyword ``reserved1`` or ``reserved2``. ``bound-identifier=?`` ----------------------------------------------- ``symbol-identifier=?`` is simple and easy to understand, but it cannot be used in all situations. Consider for instance the very first macro I wrote, in episode 9_: .. code-block:: scheme (def-syntax (multi-define (name1 name2 ...) (value1 value2 ...)) #'(begin (define name1 value1) (define name2 value2) ...)) It is quite common to write macros defining multiple bindings, such as ``multi-define``. ``multi-define`` as written does not perform any check for duplicated identifiers, so that it relies on the standard behavior of R6RS scheme, raising an error. However, the standard behavior only applies to programs and scripts, whereas the REPL is quite free to behaves differently and indeed it does in most implementations: .. code-block:: scheme > (multi-define (a a) (1 2)); in Ikarus, Ypsilon, ... a 2 (in the REPL latter definitions override previous definitions). If you are unhappy with that, you can introduce a ``bound-identifier=?`` check and raise a custom exception: .. code-block:: scheme (def-syntax (multi-define (name1 name2 ...) (value1 value2 ...)) #'(begin (define name1 value1) (define name2 value2) ...) (distinct? bound-identifier=? #'(name1 name2 ...)) (syntax-violation 'multi-define "Found duplicated identifier in" #'(name1 name2 ...))) Two identifiers are equal according to ``bound-identifier=?`` only if they have the same name *and* the same marks. The name is misleading since the arguments of ``bound-identifier=?`` are not required to be bound identifiers; a better name would be ``strict-identifier=?``. You can ``check`` that ``multi-define`` correctly reject duplicated identifiers: .. code-block:: scheme > (multi-define (a a) (1 2)) Unhandled exception Condition components: 1. &who: multi-define 2. &message: "Found duplicated identifier in" 3. &syntax: form: (a a) subform: #f In this simple example using ``symbol-identifier=?`` would work too. However this is not the geneal case. Consider for instance the following macro expanding to ``multi-define``: .. code-block:: scheme (def-syntax (multi-define2 id value) #'(multi-define (id id2) (value 'dummy))) ``multi-define2`` introduces a dummy identifier ``id2``. Had we defined ``multi-define`` in terms of ``symbol-identifier=?``, calling ``multi-define2`` with argument ``id`` equal to ``id2`` would have generated a spurious name clash. Fortunately, since we defined ``multi-define`` in terms of ``bound-identifier=?``, nothing bad happens: .. code-block:: scheme > (multi-define2 id2 1) id2 1 ``bound-identifier=?`` works in this case because the identifier ``id2`` introduced by the macro has different marks from the identifier ``id2`` coming as an argument. ``bound-identifier=?`` is not good for every circumstance. Consider for instance the following variation of ``multi-define``, featuring a literal keyword ``as``: .. code-block:: scheme ;; not checking for duplicated identifiers here (define-syntax multi-def (syntax-rules (as) ((multi-def (name as value) ...) (begin (define name value) ...)))) This work, but the error messages could stand some improvement. For instance, if an user misspells the infix identifier ``as``, she gets a generic ``"invalid syntax"`` error, whereas we would like to provide a customized error message showing the misspelled literal identifier. Using ``bound-identifier=?`` we could try to solve the problem as follows: .. code-block:: scheme (def-syntax (multi-def-bad (name as_ value) ...) #'(begin (define name value) ...) (for-all (lambda (id) (when (not (bound-identifier=? id #'as)) (syntax-violation 'multi-def-bad "Offending infix syntax (expected `as')" id))) #'(as_ ...))) Unfortunately this solution does not work at all, since it raises an error even when the ``as`` identifiers are spelled correctly: .. code-block:: scheme > (multi-def-bad (x as y) (1 as 2)) Unhandled exception Condition components: 1. &who: multi-def-bad 2. &message: "Offending infix syntax (expected `as')" 3. &syntax: form: as subform: #f 4. &trace: # The reason is that ``as`` is not ``bound-identifier=?`` to ``#'as``. We need a less strict comparison predicate. To this aim the Scheme standard provides another equality procedures for identifiers, ``free-identifier=?``, which however is not quite right. ``free-identifier=?`` ----------------------------------------- ``free-identifier=?`` is the most complicated equality predicate. I find its `description in the R6RS document`_ particularly confusing and the name is misleading since the arguments of ``free-identifier=?`` are not required to be free identifiers. A better name would be ``lax-identifier=?``. Two identifiers are ``free-identifier=?`` if 1. they are both bound to the same binding and they share the same name (or they shared the same name before renaming at import time); 2. they are both unbound and they share the same name. In implementations with full phase separation, the identifiers must also be both bound/unbound in the same phase. In all other cases the two identifiers are not ``free-identifier=?``. Here is an example: .. code-block:: scheme > (import (only (aps list-utils) range)) > (import (rename (aps list-utils) (range r))) > (free-identifier=? #'r #'range) #t Notice that both ``symbol-identifier=?`` and ``bound-identifier=?`` would fail to recognize the identity of ``range`` and ``r`` in this case. It is important to know about ``free-identifier=?`` because in macros with literal identifiers the literal identifiers are compared by using it, internally. That explain a behavior which can be quite surprising. .. _9: http://www.artima.com/weblogs/viewpost.jsp?thread=240804 .. _description in the R6RS document: http://www.r6rs.org/final/html/r6rs-lib/r6rs-lib-Z-H-13.html#node_idx_1142 Literal identifiers and auxiliary syntax ----------------------------------------------------- Consider the macro ``multi-def`` defined in the previous paragraph. This works: .. code-block:: scheme > (let () (multi-def (x as 1) (y as 2)) (list x y)) (1 2) But this does not work: .. code-block:: scheme > (let ((as 2)) (multi-def (x as 1) (y as 2)) (list x y)) Unhandled exception Condition components: 1. &message: "invalid syntax" 2. &syntax: form: (multi-def (x as 1) (y as 2)) subform: #f 3. &trace: # That looks surprising, but it is not once you realize that internally literal identifiers are compared via ``free-identifier=?``. In the second example ``as`` is bound, and therefore it is not ``free-identifier=?`` to the literal identifier ``#'as``, which is unbound. The recommended "solution" is to introduce at top level some dummy definitions for the literal identifiers you are going to use in your macro, and to export them. Following this policy, the R6RS document defines a set of special macros, ``_``, ``...``, ``else`` and ``=>``, which lives in the global namespace and are available to all R6RS programs. Such macros are used as auxiliary syntax in various special forms, like ``cond`` and ``syntax-case``; for this reason they are usually called auxiliary keywords. The existence of such global variables makes it impossible to redefine them at top-level in scripts (but it can be done at the REPL); however they can be redefined locally, thus breaking the macros using the auxiliary syntax: .. code-block:: scheme > (let ((else #f)) (cond (else 'something))) > ; does not return something I think this is fundamentally broken: literal identifiers should be a concept internal to the macro and they should not be exported. The mistake is that the R6RS requires the literal identifiers to be matched via ``free-identifier=?``, whereas they should be matched with ``symbol-identifier=?``. I never understood why the editors decided to use ``free-identifier=?``, perhaps because it makes it possible to rename the identifiers used as literal identifiers, a feature that looks of little utility to me. All in all, I think ``free-identifier=?`` is another dark corner of R6RS Scheme.