Symbols and lists
=====================================================

In this episode I pave the way to the heart of Lisp, i.e. to the
famous *code is data* concept. In order to do that, I will have to
introduce two fundamental data types first: symbols and lists.

Symbols
------------------------------------------------------------------

Scheme and Lisp have a particular data type which is missing in most
languages (with the exception of Ruby): the *symbol*.

.. image:: symbols.jpg
   :height: 250

From the grammar point of view, a symbol is just a quoted identifier,
i.e. a sequence of characters corresponding to a valid identifier
preceded by a quote. For instance, ``'a``, ``'b1`` e ``'c_`` are
symbols.  On the set of symbols there is an equality operator ``eq?``
which is able to determine if two symbols are the same or not::

 > (define sym 'a)
 > (eq? sym 'b)
 #f
 > (eq? sym 'a)
 #t

``#f`` e ``#t`` are the Boolean values False and True respectively, 
as you may have imagined. The equality operator is extremely
efficient on symbols, since the compiler associates to every
symbol an integer number (this operation is called *hashing*) and stores
it in an interal registry
(this operation is called *interning*): when the compiler checks the identity
of two symbols it actually checks the equality of two integer
numbers, which is an extremely fast operation.

You may get
the number associated to a symbol with the function ``symbol-hash``::

 > (symbol-hash sym)
 117416170
 > (symbol-hash 'b)
 134650981
 > (symbol-hash 'a)
 117416170

It is always possible to convert a string into a symbol and viceversa
thanks to the functions ``string->symbol`` and ``symbol->string``,
however conceptually - and also practically - symbols in Scheme
are completely different from strings.

The situation is not really different in Python. It is true that
symbols do not exist as a primitive data type, however strings
corresponding to names of Python objects are actually treated as
symbols. You can infer this from the documentation about the builtin
functions hash_ e intern_, which says: *normally, the names used in
Python programs are automatically interned, and the dictionaries used
to hold module, class or instance attributes have interned keys*.
BTW, if you want to know exactly how string comparison works in Python
I suggest you to look at `this post`_:

.. _hash: http://docs.python.org/lib/built-in-funcs.html#l2h-36
.. _intern: http://docs.python.org/lib/non-essential-built-in-funcs.html#l2h-90

Scheme has much more valid identifiers than Python or C, where
the valid characters are restricted to ``a-zA-Z-0-9_`` (I am ignoring
the possibility of having Unicode characters in identifiers, which is
possible both in R6RS Scheme and Python 3.0). By convention, symbols
ending by ``?`` are associated to boolean values or to boolean-valued
functions, whereas symbols ending by ``!`` are associated to functions
or macros with side effects.

.. _this post: http://groups.google.com/group/comp.lang.python/msg/f9d56d969572f2e1

The function ``eq?``,
is polymorphic and works on any kind of object, but it may surprise you
sometimes::

 > (eq? "pippo" "pippo")
 #f

The reason is that ``eq?`` (corrisponding to ``is`` in Python) checks
if two objects are the same object at the pointer level, but it does
not check the content. Actually, Python works the same. It is only by
accident than ``"pippo" is "pippo"`` returns
True on my machine, since the CPython implementation
manages differently "short" strings from "long" strings::

 >>> "a"*10 is "a"*10 # a short string
 True
 >>> "a"*100 is "a"*100  # a long string
 False

If you want to check if two objects have the same content you should
use the function ``equal?``, corresponding to ``==`` in Python::

 > (equal? "pippo" "pippo")
 #t

It you know the type of the objects you can use more efficient
equality operators; for instance for strings you can use ``string=?``
and for integer numbers ``=``::

 > (string=? "pippo" "pippo")
 #t

 > (= 42 42)
 #t

To be entirely accurate, in addition to ``eq`` and ``equal``, Scheme
also provides a third equality operator ``eqv?``. ``eqv?`` looks to me
like an useless complication of the language, so I will not discuss
it. If you are interested, you can read what the R6RS document `says
about it`_.

.. _says about it: http://www.r6rs.org/final/html/r6rs/r6rs-Z-H-14.html#node_idx_428

Lists
-------------------------------------------------------------------

The original meaning of LISP_ was *List Processing*, since lists were the
fundamental data type of the language. Nowadays Lisp implements
all possible data types, but still lists have a somewhat privileged
position, since lists can describe code.
A Lisp/Scheme list is a recursive data type such that:

1. the list is empty: ``'()``
2. the list is the composition of an element and a list via the
   ``cons`` operation (``cons`` stands for *constructor*): ``(cons elem lst)``.

.. _LISP: http://www.acronymfinder.com/af-query.asp?acronym=LISP

For instance, one-element lists are obtained as composition of an
element with the empty list::

 > (cons 'x1 '()); one-element list
 (x1)

Two-elements lists are obtained by composing an element with a one-element
list::

 > (cons 'x1 (cons 'x2 '())); two-element list
 (x1 x2)

That generalizes to N-element lists::

 > (cons 'x1 (cons 'x2 (cons 'x3 ..... (cons 'xN '()))) ...)
 (x1 x2 x3 ... xN)

For simplicity, the language provides an N-ary list constructor ``list``

::

 > (list x1 x2 x3 ... xN)
 (x1 x2 x3 ... xN)

but the expression ``(list x1 ... xN)`` is nothing else
than syntactic sugar for the fully explicit expression in terms of
constructors.

There are also the so-called *improper lists*, i.e. the ones where the
second argument of the ``cons`` is not a list. In that case the
representation of the list displayed at the REPL contains a dot::

 > (cons 'x1 'x2) ; improper list
 (x1 . x2)

It is important to remember that improper lists *are not lists*,
therefore operations like ``map``, ``filter`` and similar do not work
on them.

As we anticipated in `episode 4`_, the first element of a list (proper
or improper) can be extracted with the function ``car``; the tail of
the list instead can be extracted with the function ``cdr``.
If the list is proper, its ``cdr`` is proper::

 > (cdr (cons 'x1 'x2))
 x2

 > (cdr (cons 'x1 (cons 'x2 '())))
 (x2)

At low level Scheme lists are implemented as *linked list*, i.e. as
couples *(pointer-to-sublist, value)* until you arrive at the null
pointer.

.. image:: linkedlists.jpg

Some example
-----------------------------------------------------

To give an example of how to build Scheme lists, here I show you
how you could define a ``range`` function analogous to Python ``range``.
Here are the specs::

 > (range 5); one-argument syntax
 (0 1 2 3 4)
 > (range 1 5); two-argument syntax
 (1 2 3 4)
 > (range 1 10 2); three-argument syntax
 (1 3 5 7 9)
 > (range 10 0 -2); negative step
 (10 8 6 4 2)
 > (range 5 1); meaningless range
 ()

Here is the implementation:

.. code-block:: scheme

   (define range 
    (case-lambda 
      ((n); one-argument syntax
       (range 0 n 1))
      ((n0 n); two-argument syntax
       (range n0 n 1))
      ((n0 n s); three-argument syntax
       (assert (and (for-all number? (list n0 n s)) (not (zero? s))))
       (let ((cmp (if (positive? s) >= <=)))
         (let loop ((i n0) (acc '()))
           (if (cmp i n) (reverse acc)
               (loop (+ i s) (cons i acc))))))))

Here ``case-lambda`` is a syntactic form that allows to define
functions with different behavior according to the number of arguments.
``for-all`` is an R6RS higher order function: ``(for-all pred
lst)`` applies the predicate ``pred`` to the elements of list ``lst``,
until a false value is found - in that case it returns ``#f`` -
otherwise it returns ``#t``. Here the assertion checks at
runtime that all the passed arguments ``n0``, ``n`` and ``s`` are
numbers, with ``s`` non-zero.

The first ``let`` defines a comparison
function ``cmp`` which is ``>=`` if the step ``s`` is positive, or
``<=`` if the step ``s`` is negative. The reason is clear: if ``s``
is positive at some point the index ``i`` will get greater than
``n``, whereas if ``s`` is negative at some point ``i`` will get
smaller than ``n``.

The trick used in the loop is extremely common: instead of modifying
a pre-existing list, at each iteration a new list is created *ex-novo*
by adding an element ``(cons i acc)``; at the end the accumulator
is reversed ``(reverse acc)``. The same happens for the counter ``i``
which is never modified.

This is an example of *functional* loop;
imperative loops based on mutation are considered bad style in Scheme
and other functional languages. Notice by contrast that in Common Lisp
imperative loops are perfectly acceptable and actually
there is a very advanced LOOP_ macro allowing you to do everything
with imperative loops.

Actually, the ``range`` function defined here is more powerful than
Python's ``range``, since it also works with floating point numbers::

 > (range 1.3 2.5 .25)
   (1.3 1.55 1.8 2.05 2.3)

As always, the only way to really get Scheme lists is to use them.
I suggest you try the following exercises:

1. implement an equivalent of Python enumerate_ for Scheme lists;
2. implement an equivalent of Python zip_ for Scheme lists.

I will show the solutions in the next episode. If you are lazy and you
want an already written list library, I suggest you to give a look at the
SRFI-1_ library, which is very rich and available practically in all
Scheme implementations. Many of the SRFI-1_ features are built-in in
the R6RS standard, but many other are still available only in the SRFI-1_ .

.. _LOOP: http://www.ai.sri.com/pkarp/loop.html
.. _enumerate: http://www.python.org/doc/lib/built-in-funcs.html#l2h-24
.. _zip: http://www.python.org/doc/lib/built-in-funcs.html#l2h-81
.. _SRFI-1: http://srfi.schemers.org/srfi-1/srfi-1.html
.. _episode 4: http://www.artima.com/weblogs/viewpost.jsp?thread=239568