About tail call optimization (and the module system)

In order to speak of Scheme performance one needs to write at least a few benchmarks. This is not completely trivial, for at least a couple of reasons. The first reason is shallow, but it may be baffling for beginners: since there is no for loop in Scheme, how are we supposed to write a benchmark, which is usually based on running many times the same instructions and measuring the total time spent? We will see in a moment that there is an easy solution to this question (use recursion). On the other hand, there is a second more serious question: since there is no fully portable way to write a library in Scheme, how can we write a benchmark library? There is no real answer, so we will restrict ourselves to R6RS-compliant Scheme implementations where there is a standard concept of library.

There are no for loops in Scheme

The for loop is missing in Scheme as a primitive construct since it is useless in a language that guarantees tail call optimization. If you studied the concept of tail call at college you know what I am talking about; on the other hand, if you forgot it, or if you did study Physics like me, it is worth spending a couple of words on the subject. The ones who want to know more, may consult this Wikipedia article.

_images/Ouroboros.png

The important point beyond the tail recursion concept is that it is always possibile to convert a for into a recursive function in tail call form, i.e. a recursive function returning a value or a call to itself. For instance, the Python loop:

# a loop printing 1 2 3
for i in range(1,4):
   print i,

can be converted to a recursive function print_up_to_3:

def print_up_to_3(i):
   if i == 4: return
   print i,
   return print_up_to_3(i+1)

print_up_to_3(1)

Here the last instruction of the function (the tail) is a call to itself, hence the name tail call.

Tail call optimization is guaranteed by the Scheme language. Scheme compilers/interpreters are able to recognize recursive functions in tail call form and to convert them internally in for loops. As a consequence, the programmer has no need to write for loops directly: she can just use recursive function. Our example would look as follows in Scheme:

(define (print-up-to-3 i)
   (unless (= i 4)
     (display i) (display " ")
     (print-up-to-3 (+ i 1))))

(print-up-to-3 1)

This works, but it is not really readable; to improve the situation Scheme provides a little syntactic sugar called named let:

(let loop ((i 1))
  (unless (= i 4)
     (display i) (display " ")
     (loop (+ i 1))))

Traditionally the function in the named let construct is called loop to make clear to the programmer that we are emulating a for loop. In this example loop is exactly equivalent to print-up-to-3.

Let me point out two things before closing this paragraph:

  1. there are other let forms, used to define local variables. The simplest one is let:

    > (let ((a 1) (b 2)) (+ a b)) ; => 3

    The scope of a and b is limited to the current S-expression/form; if a and b are defined outside the let form, internally a and b shadow the outer names.

  2. there is actually a do loop in the language, but it is cumbersome to use and redundant because the named let allows you to perform everything do does. I see it as a useless construct in a language that would like to be minimalist but it is not.

There is no portable module system

As I have anticipated, libraries are the weak point of Scheme. There are few libraries available and it is also difficult to write portable ones. The reason is that historically Scheme never had any standard module system until very recently, with the R6RS document: that means nearly all current implementations provide different and incompatible module systems.

In order to understand the reason for this serious lack, you must understand the philosophy behind Scheme, i.e. the so called MIT approach: things must be done well, or not at all. For thirty years the Scheme community has not been able to converge on a well done single module system. It is only in 2007 that a standard module system has been blessed by the Scheme committee: but even that was done with a lot of opposition and there are implementors who said they will never support R6RS.

As a consequence of history and mentality, if you want to write a library for implementation X, you need to do a lot of boring and uninspiring work to port the library to implementations Y, Z, W, ... (there are dozens of different implementations). Moreover, a few implementations do not have a module system at all, so you may be forced to solve name clashes issue by hand, changing the names of the functions exported by your library, if they shadow names coming from third party libraries (!)

Personally, I picked up Scheme 5 years ago, but never used it because of the lack of a standardized module system. The main reason why I have decided to go back to Scheme and to write this series is the coming of the R6RS document last year. The R5RS standard has lots of defects, but at least now I can write a library and I can have people using different implementations install it and use it (nearly) seemlessly.

Since there is some hope for a large diffusion of R6RS module system in the future, I have decided to use it and to ignore implementations not supporting it. I should notice however that there are solutions to run R6RS modules on top of R5RS implementations, like the psyntax package, but I have not tried it, so I cannot comment on its reliability.

As first example of usage of the R6RS module system, I will define a repeat library exporting a call function which is able to call a procedure n times. Here is the code, that should be self-explanatory:

(library (repeat)
  (export call)
  (import (rnrs))

  (define (call n proc . args)
    (let loop ((i 0))
      (when (< i n) (apply proc args) (loop (+ 1 i))))))

The export declaration corresponds to Python’s __all__: only the names listed in export are exported. In this case we will export only the function (call n proc . args). Notice the dot in the argument list: that means that the functions accept a variable number of arguments, which are collected in the list args. In other words, . args is the moral equivalent of *args in Python, with some difference that we will ignore for the moment. The apply function applies the argument list to the input procedure proc, which is called many times until the index i reaches the value n.

(import (rnrs)) imports all the libraries of the current version of the “Revised Report on Scheme”, i.e. the R6RS report. At the REPL this is automatically done by the system, but for batch scripts it is mandatory (as Pythonistas say explicit is better than implicit). It is also possible to import subsections of the whole library. For instance (import (rnrs base)) imports only the base library of the R6RS, (import (rnrs io)) imports only the I/O libraries, et cetera.

The usage of the libray is trivial: it is enough to put the file repeat.sls somewhere in the Ikarus search path (specified by the environment variable IKARUS_LIBRARY_PATH). Then, you can import the library as follows:

$ rlwrap ikarus
Ikarus Scheme version 0.0.2
Copyright (c) 2006-2007 Abdulaziz Ghuloum
> (import (repeat))
> (call 3 display "hello!\n")
hello!
hello!
hello!

By default (import (repeat)) imports all the names exported by the module repeat, something that a Pythonista would never do (it is equivalent to a import * from repeat); fortunately it is possible to list the names to be imported, or to add a custom prefix:

> (import (only (repeat) call)); import only call from repeat
call
#<procedure call>
> (import (prefix (repeat) repeat:)); import all with prefix repeat:
> repeat:call
#<procedure call>

A simple benchmark

The main advantage of Scheme with respect to Python is the performance. In order to show the differences in performance I will go back to the factorial example of episode 4. I will compare the following Python script:

# fact.py
import sys, timeit

def fact(x):
    if x == 0: return 1
    else: return x * fact(x-1)

if __name__ == '__main__':
    n = int(sys.argv[-1])
    t = timeit.Timer('fact(%d)' % n, 'from fact import fact')
    print t.repeat(1, number=10000000)
    print fact(n)

with the following R6RS-compliant script (written in Ikarus Scheme):

; fact.ss
(import (rnrs) (only (repeat) call) (only (ikarus) time))

(define (fac x)
  (if (= x 0) 1
      (* x (fac (- x 1)))))

(define n
  (string->number (car (reverse (command-line)))))

(time (call 10000000 (lambda () (fac n))))
(display "result:") (display (fac n)) (newline)
_images/clessidra.png

I will notice two things:

  1. Python manages to compute the factorial of 995, but then it faces the stack wall and it raises a RuntimeError: maximum recursion depth exceeded whereas Scheme has no issues whatsoever;
  2. In order to compute the factorial of 995 ten thousands times, Python takes 15.2 seconds, whereas Ikarus takes 7.2 seconds.

Notice that since the factorial of 995 is a large number, the computation time is spent in multiplication of large numbers, which are implemented in C. Python has its own implementation of long integers, whereas Ikarus uses the GNU Multiprecision library (gmp): the times measured here mean that the gmp implementation of long integers is more efficient than the Python one, but they say nothing on the relative performances of the two languages. It is more interesting to see what happens for small numbers. For instance, in order to compute the factorial of 7 for 10 millions of times, Python takes 30.5 seconds, whereas Ikarus takes 3.08 seconds and thus it is nearly ten times faster than Python. This is not surprising at all, since function calls in Python are especially slow whereas they are optimized in Scheme. Moreover Ikarus is a native code compiler.

It means Ikarus’ REPL works by compiling expressions to native code, whereas Python’s REPL compiles to bytecode. The technology is called incremental compilation and it is commonly used in Lisp languages from decades, even if it may look futuristic for C/C++ programmers. The factorial example is not very practical (on purpose), but it is significant, in the sense that it is legitimate to expect good performances from Scheme compilers. The fastest Scheme compiler out there is Stalin, but I would not recommend it to beginners.

The next episodes will be devoted to the dangers of benchmarks, do not miss it!