Thursday, September 4, 2008

Now it starts getting interesting.

I have a Scheme system that can interpret the MIT-Scheme S-Code, but it doesn't use the underlying MIT-Scheme runtime system. It can read the fasl binaries that MIT Scheme produces, but the object format is whatever the .NET system is providing. How does it perform? Not particularly well.

It's pretty easy to figure out why. Even the simplest piece of code is creating and invoking continuations at an enormous rate. Each continuation is being allocated on the heap. Now it is the case with modern generational garbage collection that heap allocation has almost no cost, but even the very tiny cost of allocation adds up quickly when you do it for nearly every node you traverse in the abstract syntax tree.

But the real problem is this: the CPU has specialized hardware that is specifically designed to efficiently allocate, deallocate and manipulate continuations, and we aren't using it. Again, it is sitting there managing the continuations used by the virtual functions in our interpreter, while we use the main data paths and the memory management software library to mimic it's functionality for the virtual machine. In fact, we go out of our way to get it to discard these useless continuations so we can preserve tail recursion.

Why do we do this? First, the C# continuations don't quite do what we want. C#, like most other languages, unconditionally allocates continuations on procedure entry and deallocates them on exit. Scheme, on the other hand, allocates continuations only if they logically needed to regain control. If a routine has completed its work and is simply going to call another routine to get the answer, it just jumps there and the callee transmits the answer directly back to whatever routine initiated the call chain.

Second, C# continuations are too opaque for our purposes. Scheme's call-with-current-continuation primitive reifies the continuation — it takes the stack and turns it into a first-class object that can be manipulated by the user. C#, like Java, does not permit programs to randomly access the stack and certainly doesn't let programs unload and reload the contents of the stack.

But there are techniques to get around these limitations.

Let me describe how I solve the first problem.

The EvalStep method in my interpreter originally had this signature:
void EvalStep (Interpreter interpreter);
and a node in the abstract systax tree would do something like this:
 sealed class Conditional : SCode, ISystemHunk3
 {
     readonly SCode predicate;
     readonly SCode consequent;
     readonly SCode alternative;

     override void EvalStep (Interpreter interpreter)
     {
         interpreter.Continuation =
             new ConditionalDecide (interpreter.Continuation, 
                                    this,
                                    interpreter.Environment));
         interpreter.Expression = this.predicate;
         return;
     }  }
Note how the new continutation takes as one of its arguments the existing continuation. We put that into the interpreter continuation register, set the interpreter expression register to the predicate of the conditional and then return control to the caller (which is the trampoline).

The point is to evaluate the predicate and pass the answer on to the newly created ConditionalDecide continuation. It has code like this:
override void Invoke (Interpreter interpreter, object value)
{
    interpreter.Continuation = this.parent;
    interpreter.Environment = this.environment;
    interpreter.Expression = (value is bool && (bool) value == false)
                               ? this.expression.Alternative
                               : this.Expression.Consequent;
    return;

These two code fragments are part of a single logical routine that is more easily written like this:
override void EvalStep (Interpreter interpreter)
{
    object value = this.predicate.EvalStep (interpreter);
    if (value is bool && (bool) value == false)
        return this.expression.Altenative.EvalStep (interpreter);
    else
        return this.expression.Consequent.EvalStep (interpreter);
}
This is not only easier to write, it performs much better because we are co-opting the continuations that are used for C# subroutines for our own sub-evaluations. Unfortunately, however, the tail calls to EvalStep are allocating a continuation that destroys tail recursion. We need to transfer control to one of these routines without allocating a new continuation.

Here's the trick: we'll get the caller to do the transfer for us. We'll return to the caller with some indication that we're not really done, but that we intend for someone else to finish our job. Essentially, we are taking some of the code that logically belongs in our routine and placing it in the caller. This isn't possible in general because the caller is unknown and the callees can be arbitrary, but in the context of the interpreter, we have an extraordinary special case: Nearly every call that is necessary to be tail recursive is a call to EvalStep. (There are a couple of primitives that call back into the interpreter that we'll deal with in a bit.)

We'll define a new calling convention that requires the caller to handle the transfer to EvalStep for us. It's a little unorthodox, but no more so than requiring every routine to return to a trampoline. The new signature of EvalStep will be this:
    bool EvalStep (out object answer, ref SCode expression);
The boolean return value indicates whether we've returned control with an answer or whether we've returned with a new expression we wish to tail call. The caller must invoke EvalStep like this:
    SCode expr = <expression to evaluate>;
    object answer;
    while (expr.EvalStep (out answer, ref expr)) { };
    // When the while exits, we have an answer.
And the callee should do one of two things. Either assign an answer and return false, or assign a new expression and return true. The EvalStep handler for the conditional expression above would therefore look like this:
override bool EvalStep (out object answer, ref SCode expression)
{
    SCode  expr = expression.Predicate;
    object predicateValue;
    while (expr.EvalStep (out predicateValue, ref expr)) { };

    if (predicateValue is bool && (bool) predicateValue == false)
        expression = expression.Altenative;
    else
        expression = expression.Consequent;

    return true;
}
This isn't quite the whole story. Evaluation needs an environment and that is missing. The EvalStep routines will need to provide this to the caller as well, so it will handled with a ref argument, too. Here is the code for conditionals:
bool EvalStep (out object answer, ref SCode expression, ref Environment environment)
{
    SCode expr = this.predicate;
    Environment env = environment;
    while (expr.EvalStep (out answer, ref expr, ref env)) { };

    expression = (answer is bool) && (bool) answer == false
        ? this.alternative
        : this.consequent;
    // request tail recursion.
    return true;
}
Here is the code for variables:
override bool EvalStep (out object value, ref SCode expression, ref Environment environment)
{
    if (environment.DeepSearch (out value, this.name))
        throw new UnboundVariableException (this.name);
    // actually returning a value
    return false;
}
If you look at this code you'll notice that what we are doing is severely abusing the language features to change the calling conventions. We allocate some stack for EvalStep in the caller's frame. We then pass control to EvalStep and give it the addresses of the allocated variables. EvalStep eventually returns control to us, but it isn't logically finished. Recall that we've moved some of the code in the callee to the caller. If the return value is true, the while loop performs a method call to EvalStep on the new value of expr. This is the call that the callee would have performed, if he was allowed to clear the stack prior to the call. Since he is not allowed to do this, we just move the code around a little so that it runs after the return.

With the code in this form, the interpreter continuations and the C# continuations are one and the same object, and we can take advantage of the processor's stack hardware for performance. This solves the first problem we have with C# continuations. The second problem is harder to solve.

1 comment:

David Bakin said...

Oh I am now itching to experiment and see if the async/await features in C# (introduced in C#5.0 long after you were working on this interpreter and wrote this post) can be misused to implement tail calls in exactly the fashion you're talking about here.

Internally, you know, it translates your async/await code into a state machine, allocating the fields necessary for the "continuation" in a (hidden) class, and explicitly passing the continuation around ... or at least something close to that ...