Thursday, April 24, 2008

Pattern matching and generic functions

I've been working some more on Shlisp. The interpreter is becoming reasonably complete in terms of implementing the core language functionality. It now uses Boehm GC to implement garbage collection, and setjmp, longjmp and stack copying to implement continuations (effectively by adding continuations to C++ itself, using the technique described here).

Now that the interpreter is fairly reliable, I've been writing some actual Shlisp code. This post is about the pattern-matching and generic function features. Shlisp makes use of the same pattern matcher for three purposes: ordinary pattern matching, lambda argument lists and generic function dispatch. Although this means that the pattern-matching syntax isn't perfectly optimized for any one of these domains, I think the reduction in complexity is adequate compensation. The basic pattern matcher does more or less what you would expect. Here's an example of destructuring a sequence:

(match '(1 2 3 4)
(() ())
((x y | zs) zs))

=> (3 4)

The ‘|’ is used to introduce a pattern which must match each element of the rest of a sequence. This pattern can itself be complex, so it is easy to define a function which accepts (say) a single argument followed by a list of pairs:

(defun foo args
(match args
((x | (fst snd)) (some-more-code ...))))

By convention, pattern-matching derivatives of definition macros have a colon appended to their names. For example the previous function can be defined more concisely as follows:

(defun: foo
((x | (fst snd)) (some-more-code ...)))

The ‘match’ macro returns no value if none of the specified patterns match, but an error will be raised if none of the patterns for a ‘defun:’ match. The following syntax is used to match on the type of a value:

(defun: as-integer
(((x : integer) x))
(((x : double) (floor x))))

This syntax extends to the definition of simple methods for generic functions. A generic function is defined like this:

(defgeneric generic-add)

And methods are added like this:

(defmethod generic-add ((x : number) (y : number))
(+ x y))
(defmethod generic-add ((x : string) (y : string))
(concat x y))

The ‘number’ type is a supertype of all numeric types. As befits a generic function system, we can add an additional method for a subtype of the ‘number’ type:

(defmethod generic-add ((x : integer) (y : integer))
(+ 10 x y)) ; Silly (but different) behavior.

In the case of generic functions with overlapping argument types, the dispatch algorithm selects the method with the most specific argument list, going from left to right along the arguments. The ‘defmethod’ form is only able to dispatch on fixed numbers of arguments, but there is also a ‘defmethod:’ form which integrates pattern matching with generic dispatch. This form accepts an arbitrary pattern, but still ensures that methods with overlapping argument types are dispatched appropriately. In the case of ‘generic-add’, we can allow each method to receive any number of arguments:

(defmethod: generic-add ((| (xs : number)))
(apply + xs))

As I mentioned in the previous post, it is possible to add generic dispatch to a builtin, non-generic function. For example, if we wanted to extend ‘+’ so that it could be used to concatenate symbols, the following code would do the trick:

(add-generic +)

(defmethod: + ((| (xs : symbol)))
(apply concat xs))

The generic dispatch mechanism is activated only if the builtin addition operation receives bad arguments, so the performance of ordinary arithmetic operations needn't suffer too much.