Friday, December 11, 2009

Simple Sabotage

A co-worker forwarded to me a copy of the OSS Simple Sabotage Field Manual. I have reproduced the section on organizational sabotage. It is hilarious, but frightening how many companies that I have worked for seem to follow such self-destructive advice on purpose!

Sunday, December 6, 2009

It seems the problem with apropos was a simple missing argument. I haven't hooked up the error handler yet, so this caused some really weird behavior. It's getting time to think about that. The MIT Scheme debugger uses some special operations to parse the runtime stack, so I have to make a reasonably faithful replication of this.

Friday, December 4, 2009

Innumeracy

I happened upon an article discussing glacial retreat and ice melt. At the end of the article there were several ‘glacial facts’. Here are some:
  • Average yearly retreat of the Himalayan glaciers: In 2006, 30 metres;
  • Rate at which Gangotri is melting per year: 28.1 m
  • Gangotri Length: approx 30 km;
  • Year in which Gangotri will disappear: 2050, if glacier melt continues at the same rate. (emphasis mine)
There is an obvious problem here. 30 meters per year times 50 years (1500 meters) is nowhere near the entire length of the glacier (30000 meters). It's a factor of 20 too small. This isn't a rant about climate change, it's a rant that the journalist and the editor didn't notice the problem with the math. I wouldn't expect that the journalist or the editor be wizards at calculus, but this is just a simple estimate. You don't even need an exact answer (I rounded up the melting rate and the time span to make it easy to multiply).
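The estimate is easy to check mechanically. Here is a small C# sketch that uses only the figures quoted in the article; the class and method names are mine and purely illustrative.

```csharp
using System;

// Sanity check of the article's figures. Names are illustrative only.
public static class GlacierEstimate
{
    // Meters lost over `years` at a constant retreat rate.
    public static double RetreatOver (double years, double retreatMetersPerYear)
    {
        return years * retreatMetersPerYear;
    }

    // Years needed to lose `lengthMeters` at a constant retreat rate.
    public static double YearsToDisappear (double lengthMeters, double retreatMetersPerYear)
    {
        return lengthMeters / retreatMetersPerYear;
    }
}
```

GlacierEstimate.RetreatOver (50, 30) gives 1500 meters, and GlacierEstimate.YearsToDisappear (30000, 30) gives 1000 years, so at the quoted rate the glacier would last about a millennium rather than a few decades.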

Thursday, December 3, 2009

More bugs

The bug with pretty printing turned out to be trivial. In a primitive where I was taking the CAR of an object, I returned the original object rather than the CAR. Stupid.

The current bug is more interesting. I got apropos working, but the next day Taylor changed how symbols are interned. apropos worked for one day only.

The bug is weird, though. When I call apropos, I get this:
1 ]=> (apropos "mic")

#[package 1 (user)]
#[package 2 ()]
;Cold load finished
;Package: (user)

2 ]=>
There are two odd things here. First is the repeat of the message that the cold load finished, second is the fact that the prompt is now at level 2.

I can't for the life of me imagine what might cause this. I'm guessing that I really screwed up the definition of a primitive and it is causing a recursive evaluation of something. I'm grasping at straws, though. In any case, this will be fun to debug because it involves weak pointers. The CLR has weak pointers, but now I have to really make them mimic the weak pointers that MIT Scheme expects.

Thursday, November 26, 2009

Hunting bugs

My version of MIT Scheme appears to be able to self-host. I can bootstrap it from the original Scheme, load the syntaxer, re-syntax everything and then boot from the new files.

I did find a bug in how internal definitions are handled when pretty printing, so I'm on the hunt now.

Tuesday, November 24, 2009

Dry spell

I've been in a bit of a dry spell as far as blogging is going. I think I'll have some more news soon, though.

The comment spammers are getting more clever. I found one today that almost seemed relevant (except for the link to cheap drugs). I hope I don't have to turn on comment moderation.

Wednesday, October 28, 2009

update

I've tweaked and optimized things in my interpreter so that the median time for sboyer.scm is now at 2.1 seconds (down from the baseline of 6.596 seconds). It wouldn't be too hard to push it below 2 seconds with some more specialization of the conditionals, but that's not where I want to go.

Wednesday, October 21, 2009

Now that flat environments are working fairly well, the bottleneck has shifted. The top three items on the ‘top of stack’ histogram are the primitive procedures CAR, PAIR?, and NULL?. Let's look at the code for PrimitiveCombination1:
 
public override bool EvalStep (out object answer, ref Control expression, ref Environment environment)
{
    // Evaluate the argument.
    Control unev0 = this.arg0;
    Environment env = environment;
    object ev0;
    while (unev0.EvalStep (out ev0, ref unev0, ref env)) { };
    if (ev0 == Interpreter.UnwindStack) {
        ((UnwinderState) env).AddFrame (new PrimitiveCombination1Frame0 (this, environment));
        answer = Interpreter.UnwindStack;
        environment = env;
        return false;
    }

    // Call the primitive.
    if (this.method (out answer, ev0)) {
        TailCallInterpreter tci = answer as TailCallInterpreter;
        if (tci != null) {
            answer = null; // dispose of the evidence
            // set up the interpreter for a tail call
            expression = tci.Expression;
            environment = tci.Environment;
            return true;
        }
        else
            throw new NotImplementedException ();
    }
    else return false;
}
The method for CAR is this:
public static bool PrimitiveCar (out object answer, object arg0)
{
    answer = ((Cons) arg0).Car;
    return false;
}
There's quite a bit of noise in that code, so here is the main code path:
bool EvalStep (out object answer, ref Control expression, ref Environment environment)
{
    // Evaluate the argument.
    Control unev0 = this.arg0;
    Environment env = environment;
    object ev0;
    while (unev0.EvalStep (out ev0, ref unev0, ref env)) { };
    if (ev0 == Interpreter.UnwindStack) { ... }

    // Call the primitive.
    if (this.method (out answer, ev0)) { ... }
    else return false;
}
The while statement is the tail-recursion trampoline. The immediately following conditional is there to support first-class continuations. The method is expected to stuff its result in answer and return false, unless it needs to make a tail-recursive call, in which case it returns true.
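To see the trampoline in isolation, here is a stripped-down sketch of the EvalStep protocol. It is not the interpreter's real code: the environment parameter is omitted, and the Constant and CountDown node types are hypothetical, invented only to have something that tail-calls.

```csharp
using System;

// A stripped-down model of the EvalStep protocol. The node types
// (Constant, CountDown) are hypothetical; the environment is omitted.
public abstract class Control
{
    // Returns false when `answer` holds the final value, true when the
    // caller should continue by evaluating the new `expression`.
    public abstract bool EvalStep (out object answer, ref Control expression);
}

public sealed class Constant : Control
{
    readonly object value;
    public Constant (object value) { this.value = value; }

    public override bool EvalStep (out object answer, ref Control expression)
    {
        answer = this.value;
        return false;
    }
}

// Tail-calls n times before yielding a constant.
public sealed class CountDown : Control
{
    readonly int n;
    public CountDown (int n) { this.n = n; }

    public override bool EvalStep (out object answer, ref Control expression)
    {
        answer = null;
        expression = (this.n == 0) ? (Control) new Constant ("done") : new CountDown (this.n - 1);
        return true; // ask the trampoline to bounce
    }
}

public static class Trampoline
{
    public static object Run (Control expression)
    {
        object answer;
        // Same shape as the while loop in PrimitiveCombination1.
        while (expression.EvalStep (out answer, ref expression)) { }
        return answer;
    }
}
```

Trampoline.Run (new CountDown (100000)) returns "done" without growing the CLR stack, because each tail call returns to the loop instead of calling deeper.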

This is pretty general, and it has to be if we are going to support primitives like call-with-current-continuation, but the number one primitive procedure is CAR, and we can handle that one a bit more efficiently. The first thing we need to do is inline the call to CAR:
bool EvalStep (out object answer, ref Control expression, ref Environment environment)
{
    // Evaluate the argument.
    Control unev0 = this.arg0;
    Environment env = environment;
    object ev0;
    while (unev0.EvalStep (out ev0, ref unev0, ref env)) { };
    if (ev0 == Interpreter.UnwindStack) { ... }

    // Attempt to cast.
    Cons theCell = ev0 as Cons;
    if (theCell == null) {
        ... enter error handler ...
    }
    else {
        answer = theCell.Car;
        return false;
    }
}
This avoids several operations. We no longer push ev0 as an argument just to pop it off in the primitive, and we no longer return a flag for a conditional branch. This is about half of the work.

The other half is in evaluating the argument to CAR. The debug version shows that the argument to CAR is usually bound in an argument position in the enclosing lambda. If that is the case, then there is no need for the tail-recursion trampoline or the continuation handling code. We can just fetch the argument.
bool EvalStep (out object answer, ref Control expression, ref Environment environment)
{
    // Attempt to cast.
    Cons theCell = environment.ArgumentValue(this.argumentOffset) as Cons;
    if (theCell == null) {
        ... enter error handler ...
    }
    else {
        answer = theCell.Car;
        return false;
    }
}
If the primitive cannot throw an error (for example, PAIR?), it is even simpler:
public override bool EvalStep (out object answer, ref Control expression, ref Environment environment)
{
    answer = environment.ArgumentValue (this.offset) is Cons;
    return false;
}
This code takes an almost negligible amount of time.

Unfortunately, specializing on the primitive procedure and the argument type like this requires a lot of code. Each one-argument primitive can be specialized in at least six different ways, and each way is its own separate class. C# does not have macros and templates don't quite work for this sort of thing. The alternative is some code-generation mechanism, but I've been too lazy to automate that (I've been expanding these things by hand). On the other hand, the unspecialized mechanism is not unreasonable if it isn't called too often, so by specializing only a handful of the top primitives we get a lot of performance improvement for very little work.

MIT-Scheme has reflective operations for manipulating its own SCode. In order to maintain compatibility with these, the specialized primitives inherit from PrimitiveCombination1. The code for EvalStep is overridden, but we retain the rest of the class. This allows the Scheme level to reflect on this code as if it were unoptimized code. A good example of this is the pretty printer. When it encounters an optimized PrimitiveCarA (primitive CAR of an argument), it treats it just like a PrimitiveCombination1 with an operator of CAR and an argument of some Variable.
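The inheritance trick can be sketched in a few lines. The shapes below are heavily simplified assumptions: the operator and operand are plain strings, EvalStep takes no parameters and just reports which path ran, and PrettyPrinter is a stand-in for the Scheme-level reflection.

```csharp
using System;

// Hypothetical, simplified shapes: the real classes carry more state,
// and EvalStep here omits the answer, expression and environment parameters.
public class PrimitiveCombination1
{
    public string Operator { get; }
    public string Operand { get; }

    public PrimitiveCombination1 (string op, string operand)
    {
        this.Operator = op;
        this.Operand = operand;
    }

    // General path (stubbed): evaluate the operand, then call the primitive.
    public virtual string EvalStep () { return "general"; }
}

// CAR of an argument: only the evaluation strategy changes.
public sealed class PrimitiveCarA : PrimitiveCombination1
{
    public PrimitiveCarA (string operand) : base ("car", operand) { }

    public override string EvalStep () { return "specialized"; }
}

public static class PrettyPrinter
{
    // Knows nothing about PrimitiveCarA; it reflects through the base class.
    public static string Print (PrimitiveCombination1 node)
    {
        return "(" + node.Operator + " " + node.Operand + ")";
    }
}
```

PrettyPrinter.Print (new PrimitiveCarA ("x")) yields "(car x)", exactly what it would produce for the unoptimized node, while EvalStep dispatches to the specialized override.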

Just by optimizing CAR, CDR, PAIR?, and NULL?, the median time for sboyer drops to 2.62 seconds.

Tuesday, October 20, 2009

Allocating ValueCell objects at every function application takes a bit of time. It isn't always necessary, either. The point of creating a value cell is so that side-effects on variables have the appropriate sharing semantics. Most variables are not side effected.

It is easy to find the side effected variables by tree-walking the body of a lambda expression. If none of the lambda-bound variables are assigned to, then we can create more efficient environment structures at apply time. StaticEnvironments have this structure:
class StaticEnvironment : LexicalEnvironment
{
    readonly ValueCell [] bindings;

    internal StaticEnvironment (Closure closure, object [] initialValues)
        : base (closure)
    {
        object [] formals = closure.Lambda.Formals;
        this.bindings = new ValueCell [initialValues.Length];
        for (int i = 0; i < initialValues.Length; i++)
            this.bindings [i] = new ValueCell (formals [i], initialValues [i]);
    }

    ...
}
We define SimpleEnvironments like this:
class SimpleEnvironment : LexicalEnvironment
{
    readonly object [] bindings;

    internal SimpleEnvironment (Closure closure, object [] initialValues)
        : base (closure)
    {
        this.bindings = initialValues;
    }
    ...
}
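The tree walk that decides between the two environment layouts might look roughly like the sketch below. The node classes are hypothetical stand-ins for the interpreter's SCode types, and only the cases needed for the illustration are handled.

```csharp
using System;
using System.Collections.Generic;

// A toy SCode tree; these node classes are hypothetical stand-ins.
public abstract class Node { }

public sealed class Variable : Node
{
    public readonly string Name;
    public Variable (string name) { Name = name; }
}

public sealed class Assignment : Node
{
    public readonly string Target;
    public readonly Node Value;
    public Assignment (string target, Node value) { Target = target; Value = value; }
}

public sealed class Combination : Node
{
    public readonly Node [] Parts;
    public Combination (params Node [] parts) { Parts = parts; }
}

public sealed class Lambda : Node
{
    public readonly string [] Formals;
    public readonly Node Body;
    public Lambda (string [] formals, Node body) { Formals = formals; Body = body; }
}

public static class AssignmentAnalysis
{
    // True if any of the lambda's own formals is the target of a set!
    // anywhere in its body, including inside nested lambdas.
    public static bool HasAssignedFormal (Lambda lambda)
    {
        return Walk (lambda.Body, new HashSet<string> (lambda.Formals));
    }

    static bool Walk (Node node, HashSet<string> candidates)
    {
        switch (node) {
            case Assignment a:
                return candidates.Contains (a.Target) || Walk (a.Value, candidates);
            case Combination c:
                foreach (Node part in c.Parts)
                    if (Walk (part, candidates)) return true;
                return false;
            case Lambda inner: {
                // A nested lambda that rebinds a name shadows it.
                HashSet<string> visible = new HashSet<string> (candidates);
                visible.ExceptWith (inner.Formals);
                return Walk (inner.Body, visible);
            }
            default:
                return false;
        }
    }
}
```

A lambda for which HasAssignedFormal returns false is a candidate for the SimpleEnvironment layout; any other lambda keeps its ValueCells.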
Earlier, I posted a table of frame sizes after a long run:
[0]      99656436
[1]     817178031
[2]     219585322
[3]      45556970       
[4]       6140170       
[5]       2857104       
[6]        702372       
[7]        448574       
[8]          3080       
[9]             1       
[10]          568       
[11]          156       
[12]            3       
[13]            2       
[14]          177       
[15]            6 
More than 99% of the environment frames have three or fewer variables. Instead of holding the variable values in a vector, it is worthwhile to simply enumerate them as fields in the environment object itself. Here is SmallEnvironment1:
class SmallEnvironment1 : LexicalEnvironment
{
    readonly object binding0;

    internal SmallEnvironment1 (Closure closure, object binding0Value)
        : base (closure)
    {
        this.binding0 = binding0Value;
    }
    ...
}
There are similar classes for SmallEnvironment0, SmallEnvironment2, and SmallEnvironment3.

Although these environments are quite specialized, they account for the vast majority of environments that are dynamically created. This leads to a good performance increase. sboyer now takes a median time of 3.504 seconds. It now turns out that variable lookup is not the dominating factor in the performance. I'll discuss the next problem in the next post.

Monday, October 19, 2009

Oh yeah, those flat environments

I did finally get flat environments working the way I want, and then refactored the code to be simpler and clearer. The basic idea is that rather than chasing down environment frames looking for a binding, we keep the lexical variables in a vector. A closure now looks like this:
    class Closure
    {
        protected readonly Lambda closureLambda;
        protected readonly Environment closureEnvironment;
        protected readonly ValueCell [] staticBindings;
        ...
    }
When we need the value of a lexical variable, we find it at a precomputed offset in the staticBindings. (They are ‘static’ because the location of the binding cell doesn't move.) When we create a closure, we need to copy some of the static bindings from the parent environment. For this we need a StaticMapping.
class StaticMapping
{
    int [] offsets;
    ....
}
The StaticMapping is stored in the StaticLambda from which we construct the StaticClosure. We copy the bindings when we construct the StaticClosure.
bool EvalStep (out object answer, ref Control expression, ref Environment environment)
{
    answer = new StaticClosure (this, environment.BaseEnvironment, environment.GetValueCells (this.staticMapping));
    return false;
}
And the code for GetValueCells is this:
internal override ValueCell [] GetValueCells (StaticMapping mapping)
{
    int count = mapping.Size;
    ValueCell [] cells = new ValueCell [count];
    for (int index = 0; index < count; index++) {
        int o = mapping.GetOffset(index);
        if (o < 0)
            cells [index] = this.bindings [(-o) - 1];
        else
            cells [index] = this.Closure.StaticCell (o);
    }
    return cells;
}
The StaticMapping encodes argument bindings as negative numbers and static bindings as non-negative numbers. The appropriate cells are copied from the parent environment.
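One encoding that is consistent with the `(-o) - 1` arithmetic in GetValueCells is sketched below. The helper names are hypothetical; the original post only shows the decoding side.

```csharp
using System;

// One possible encoding consistent with GetValueCells above; the helper
// names are hypothetical.
public static class OffsetEncoding
{
    // Argument slot i is stored as -(i + 1), so slot 0 becomes -1.
    public static int EncodeArgument (int argumentIndex) { return -(argumentIndex + 1); }

    // Static slot i is stored unchanged.
    public static int EncodeStatic (int staticIndex) { return staticIndex; }

    public static bool IsArgument (int encoded) { return encoded < 0; }

    // Inverse of both encodings; mirrors `(-o) - 1` in GetValueCells.
    public static int Decode (int encoded)
    {
        return encoded < 0 ? (-encoded) - 1 : encoded;
    }
}
```

Shifting the argument indices by one is what lets both kinds of slot have an index 0 without colliding.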

Going to flat environments makes a substantial improvement. Our baseline median time for the sboyer benchmark was 6.596 seconds. With flat environments, the median time is now 4.346 seconds.

Variable lookup is no longer the bottleneck in the interpreter. Procedure application is. It was worth the tradeoff, but let's see what procedure application involves:
bool Apply (out object answer, ref Control expression, ref Environment environment, object [] args)
{
    if (args.Length != this.arity)
        throw new NotImplementedException ();
    expression = this.closureLambda.Body;
    environment = new StaticEnvironment (this, args);
    answer = null; // keep the compiler happy
    return true;
}

internal StaticEnvironment (Closure closure, object [] initialValues)
    : base (closure)
{
    object [] formals = closure.Lambda.Formals;
    this.bindings = new ValueCell [initialValues.Length];
    for (int i = 0; i < initialValues.Length; i++)
        this.bindings [i] = new ValueCell (formals [i], initialValues [i]);
}
The big problem is that we are allocating ValueCells for the argument bindings. We'll deal with this next.

Thursday, October 15, 2009

Short solution

No takers? Oh well. Here's the solution for yesterday's short exercise.

The amount of remote data is small. We only have 10K records and only add a handful a day. This will all fit in memory just fine.

Now imagine that the cache is a bucket with a small hole in it. Over time, as the cache entries become stale, the cache slowly empties. We can calculate a long-term rate at which entries expire. (This isn't actually what happens, though. The entries expire en masse, but let's pretend.) If we continue to fill the bucket at the same rate as the bucket empties, it will always be full. Any slower and the bucket will empty. Any faster and it will overflow.

The remote database can deliver one entry in 150ms, but we don't want to saturate that connection (there are other clients and we presumably want to perform work other than cache refresh). So let's dedicate 2% of the client bandwidth to the cache. If we fetch no more than one entry every 50 * 150ms = 7.5 seconds, we'll stay at or under 2%. Of course this means that we cannot let the records expire at a rate faster than this. If our cache has 10K records and they expire at a rate of one record every 7.5 seconds, the cache will be empty in 75K seconds, or about 20.8 hours. We set the expiration time on an entry at a tad more than that and we're all set. If 20.8 hours is unacceptably stale, we can shorten it by reserving more bandwidth for the cache. There is a limit, though. With a handful of clients each consuming 2%, there would be a small constant load on the server. If we increased each client to consume 10-12%, the server would be spending most of its time servicing client caches.
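The arithmetic is simple enough to put in a helper, which makes it easy to try other bandwidth budgets. This is only a sketch of the calculation above; the class and parameter names are my own.

```csharp
using System;

// Arithmetic from the paragraph above; names are illustrative.
public static class RefreshBudget
{
    // Seconds between fetches so that fetching occupies at most
    // `percentOfTime` percent of the time, given the cost of one fetch.
    public static double SecondsBetweenFetches (double fetchMilliseconds, double percentOfTime)
    {
        return fetchMilliseconds * (100.0 / percentOfTime) / 1000.0;
    }

    // Hours needed to cycle through every record once at that pace,
    // which bounds how stale any single entry can get.
    public static double HoursPerCycle (int records, double fetchMilliseconds, double percentOfTime)
    {
        return records * SecondsBetweenFetches (fetchMilliseconds, percentOfTime) / 3600.0;
    }
}
```

With 10000 records, 150 ms per fetch and a 2% budget, HoursPerCycle gives about 20.8 hours; raising the budget to 10% brings it down to roughly 4.2 hours.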

Wednesday, October 14, 2009

Short exercise

I seem to have solved the bug in my flat environment code. I'll post on that in a day or two.

An interesting problem came up at work a couple of days ago, and it makes for a good engineering problem. Let me see if I can describe this in an interesting way.

A database application generates charts that describe certain things about the data. For instance, you might generate a chart of a moving average of field X for all entries where the username is ‘jmarshall’. This is pretty standard stuff for a simple database. There's a problem, however. For various reasons, some of the information we like to plot is stored elsewhere and we have to fetch it. This is a heavyweight request that takes 150ms. We have on the order of 10K records, and we typically examine several hundred to create a chart. If we need the remote data, we're looking at several hundred of these heavyweight requests. A big chart can take two and a half minutes to draw and the user gets very impatient.

A cache was added in order to make things remotely bearable. The database changes relatively slowly (a handful of records a day), and the charts are not designed for pinpoint accuracy, so there is no requirement that the cache be completely fresh or for it to provide a transaction consistent view of the data. A very simple cache that remembers the data for a few hours is acceptable. This alleviated a lot of the pain because now users could generate different variations of their charts and compare them in real time.

But there is still the problem with the ‘startup transient’. If no one has recently generated a chart, that first one you make will take forever as the cache is loaded. So the problem is getting rid of the startup transient.

I'll give the following hints:
  • Assume memory is not an issue.
  • Assume multiple writers to the database and the remote data. You will not get ‘update’ notifications.
  • Several instances of the application may be running at once, but they cannot communicate with each other. (So an application must not saturate the database with requests.)
  • Fetching part of a record or the additional info is no cheaper than fetching the whole thing, so a timestamp or checksum comparison will not be faster.
  • The data does not have to be transactionally consistent.
  • The data can be somewhat stale, but there should be a limit.
The problem is to come up with a cache refresh policy that avoids the startup transient and estimate how stale the data could become under that policy.

Monday, October 12, 2009

grr... bugs

So as I'm trying to write out the details of how the flat environments work, I had an idea on how to optimize top-level references. While I was working on that, I discovered problems in the mapping tables that are carried around in the partial environments. Now I'm debugging those.

Kbob asks:
In your profiling, have you measured the distribution of frame sizes?
Indeed I have. Here is a histogram of frame sizes after a 10 billion evaluation run:
Frame      
Size       Count

[0]      99656436
[1]     817178031
[2]     219585322
[3]      45556970       
[4]       6140170       
[5]       2857104       
[6]        702372       
[7]        448574       
[8]          3080       
[9]             1       
[10]          568       
[11]          156       
[12]            3       
[13]            2       
[14]          177       
[15]            6       
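A quick way to summarize this histogram is the cumulative share of frames at or below a given size. The sketch below just copies the counts from the table; the class name is made up.

```csharp
using System;

public static class FrameHistogram
{
    // Counts copied from the table above, indexed by frame size.
    public static readonly long [] Counts = {
        99656436, 817178031, 219585322, 45556970, 6140170, 2857104,
        702372, 448574, 3080, 1, 568, 156, 3, 2, 177, 6
    };

    // Fraction of frames whose size is at most `maxSize`.
    public static double FractionAtMost (int maxSize)
    {
        long total = 0, small = 0;
        for (int size = 0; size < Counts.Length; size++) {
            total += Counts [size];
            if (size <= maxSize) small += Counts [size];
        }
        return (double) small / total;
    }
}
```

FrameHistogram.FractionAtMost (3) comes out a little above 0.99, so frames with three or fewer variables account for more than 99% of the roughly 1.19 billion frames created.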

Sunday, October 11, 2009

To make a short story a bit longer

Optimizing argument lookup helps quite a bit, but the non-argument variables take a lot of time. We waste time by walking the environment chain and scanning the lambda arguments of each frame. There are a couple of ways of speeding this up. The first thing is to notice that if there are no incrementals in any of the frames in the lookup path, then the location of a lexical variable is fixed relative to the current environment. You can use a lexical address of the form n-frames-back by offset in frame. This gives you a lot of bang for the buck, and is what the MIT-Scheme interpreter used to do. (Now that they have a compiler, they just punt and deep search each time.) jrmscheme did the same thing, but despite the optimizations, it simply takes time to walk up the environment chain. Not only is time a problem, the storage requirements are an issue. There are bindings that are no longer in use in the deeper frames, and these are being retained along with the bindings of interest. I decided to change to a ‘flat’ environment implementation.

A flat environment does not have a chain of frames. It has a single frame with two components: the binding vector that is common to all environment structures, and a vector of pointers to the lexically visible binding cells. The lexically visible cells are indirect because the bindings may be shared among several closures. For example, look at this code:
(let ((counter 0) 
      (increment 1))
  ...
  (lambda () (set! counter (+ counter increment)))
  (lambda () counter)
  ...)
We have two lambda expressions that must be closed over the same variable counter. In a chained environment model, invoking either closure would extend the common shared environment where the counter and increment variables were bound. In a flat environment model, invoking the second closure would lead to an environment structure like this:
bindings: #()  ;; empty argument bindings
lexical: #(<pointer to counter>)
while invoking the first closure would lead to this:
bindings: #() ;; empty argument bindings
lexical: #(<pointer to counter> <pointer to increment>)
This is fairly straightforward in principle, but the devil is in the details. It took me a fair amount of time to fiddle around with the code and come up with something that worked.
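The sharing is the important part, so here is a miniature model of it in C#. Everything in it is a simplification made up for the illustration: the ValueCell here is a bare box, FlatClosure takes its body as a delegate, and the closures have no arguments.

```csharp
using System;

// Hypothetical miniature of the flat model: each closure carries its own
// vector of pointers to shared binding cells.
public sealed class ValueCell
{
    public object Value;
    public ValueCell (object value) { Value = value; }
}

public sealed class FlatClosure
{
    readonly ValueCell [] lexical;
    readonly Func<ValueCell [], object> body;

    public FlatClosure (ValueCell [] lexical, Func<ValueCell [], object> body)
    {
        this.lexical = lexical;
        this.body = body;
    }

    public object Invoke () { return this.body (this.lexical); }
}

public static class CounterExample
{
    // Builds the two closures from the let expression above.
    public static FlatClosure [] Make ()
    {
        ValueCell counter = new ValueCell (0);
        ValueCell increment = new ValueCell (1);

        // (lambda () (set! counter (+ counter increment)))
        FlatClosure bump = new FlatClosure (
            new [] { counter, increment },
            cells => cells [0].Value = (int) cells [0].Value + (int) cells [1].Value);

        // (lambda () counter)
        FlatClosure read = new FlatClosure (
            new [] { counter },
            cells => cells [0].Value);

        return new [] { bump, read };
    }
}
```

Invoking the first closure twice and then the second returns 2: the two lexical vectors are different arrays, but both point at the same counter cell.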

The first step was to add a new phase to the interpreter. A partial evaluation walk is done on the code before calling the actual eval. In the partial evaluation phase, the code tree is rebuilt and certain nodes in the tree are replaced. A partial environment is kept as the code is walked. When a variable is encountered, it is looked up in the partial environment. If it is found in the immediately enclosing frame, it is turned into an Argument variable with the appropriate offset. If it is not found at all, it is turned into a FreeVariable. If it is found in an intermediate frame, things get tricky. We have to consider the lambda expressions that are associated with the intervening frames and decide how the variable will be accessed.
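The three-way classification can be sketched as a lookup over the formals of the enclosing frames. This is an assumption-laden simplification: the partial environment is modeled as a plain list of formal-name arrays, and the name Lexical for the intermediate-frame case is mine.

```csharp
using System;
using System.Collections.Generic;

// `Lexical` is an illustrative name for "found in an intermediate frame".
public enum VariableKind { Argument, Lexical, Free }

public static class VariableClassifier
{
    // frames [0] holds the formals of the immediately enclosing lambda,
    // frames [1] those of the next one out, and so on.
    public static VariableKind Classify (string name, IList<string []> frames)
    {
        for (int depth = 0; depth < frames.Count; depth++) {
            if (Array.IndexOf (frames [depth], name) >= 0)
                return depth == 0 ? VariableKind.Argument : VariableKind.Lexical;
        }
        return VariableKind.Free;
    }
}
```

Only the Lexical case needs the extra analysis of the intervening lambda expressions described above; the other two can be rewritten on the spot.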

When we partially evaluate a lambda expression, we first have to check whether we'll need a first-class environment for it. Fortunately, this is an easy test: we just have to walk the body of the expression and look for a call to the the-environment special form. If we need a first-class environment, then we simply punt on any optimization other than Argument variables because we'll just do a deep search every time we access a lexical variable. On the other hand, if we don't need a first-class environment, we construct a PartialStaticClosure for the lambda expression. When we partially evaluate the PartialStaticClosure, we construct a PartialStaticEnvironment that we use for partially evaluating the body of the lambda expression.

The entire process of partial evaluation is very similar to that of ‘normal’ evaluation, but there are a couple of differences. Partial evaluation produces a PartialResult that contains the rebuilt code tree. Conditionals are partially evaluated along both branches and the PartialResults are combined into a new conditional node. Partial closures are partially evaluated right away (instead of being returned) as if they had been applied to arguments, but of course the resulting PartialEnvironment doesn't contain runtime values, it only contains binding information.

A PartialClosure leads to the construction of a PartialEnvironment. Variable lookup in a PartialEnvironment either returns the argument offset, or an indication that a deep search is necessary. The rebuilt lambda expression becomes a StandardLambda; when it is actually evaluated, a StandardClosure is built. When the StandardClosure is applied to arguments, the bindings are placed in a StandardEnvironment. Variable lookup in a StandardEnvironment uses deep search, and in this way we retain the old chained environment model for first-class environments when necessary.

But if we don't need a first-class environment, we do things a bit differently. The lambda expression becomes a StaticLambda (the word static in this case meaning that the location of the bindings never changes). When a StaticLambda is partially evaluated, a PartialStaticClosure is created. This leads to the construction of a PartialStaticEnvironment where variable lookup happens differently. At regular eval time, a StaticLambda creates a StaticClosure, and when a StaticClosure is applied, it creates a StaticEnvironment.

Alas, I've arrived at work. More details later...

Friday, October 9, 2009

To make a long story short

I was mulling over a bunch of ideas about benchmarking, experimentation, and the philosophy of science. You'll be happy to hear that I'll spare you the details. The end result is this: The sboyer benchmark when given an argument of 1 (591777 rewrites) runs with a median time of 6.596 seconds on jrmscheme. No optimizations are turned on, but the code has been run through SF with (declare (usual-integrations)). This is pretty lackluster, but there is no place to go but up!

So let's start with the optimizations. I have a lot of code that instruments the interpreter when I run in debug mode. According to the instrumentation, from the initial REPL prompt, through loading and running the benchmark, printing the results and halting at the next prompt, there are 102,320,483 evaluations. The breakdown is this:
Variable                39028733
PrimitiveCombination1   20796045
Conditional             13248184
PrimitiveCombination2    6294095
Quotation                6148442
Combination2             4155507
Combination1             3576004
Sequence2                2572345   
StandardLambda           2313932
Assignment               1666782    
Combination              1420518   
Disjunction              1087864   
PrimitiveCombination3       6873
Sequence3                   4451
Access                       388    
PrimitiveCombination0        295
Definition                    22
Comment                        2    
StandardExtendedLambda         1
Variable lookup is the number one item, and optimizing it will help tremendously.

If we keep track of the lexical depth of the variables (that is, how many environment frames we had to search), we find this distribution:
[Chart of lexical depth]
The vast majority of the variables are found in the topmost frame.

Recall the code for variable lookup:
object DeepSearch (Symbol name)
{
    int offset = this.envClosure.FormalOffset (name);
    if (offset == -1) {
       if (this.incrementalDefinitions.ContainsKey(name))
           return this.incrementalDefinitions[name];
       else
           return this.envClosure.Environment.DeepSearch (name);
    }
    return this.bindings [offset];
}
Most of the time, the offset is a non-negative index into the binding array. But the offset is determined by the position of the name in the lambda expression that binds the variable, and that can be determined before we evaluate the code. So we'll introduce a new SCode type Argument that is the same as Variable except that it also contains the offset. When we construct a lambda expression, we'll walk the scode body and replace references to the lambda parameters with Argument structures. We only replace the references if they occur at the same binding level. If there is an intermediate lambda expression, we cannot replace the references. We can then add an ArgumentValue method to our environment objects:
object ArgumentValue (int offset)
{
    return this.bindings[offset];
}
The EvalStep of a Variable looks like this:
public override bool EvalStep (out object answer, ref Control expression, ref Environment environment)
{
    answer = environment.DeepSearch (this.varname);
    return false;
}
The EvalStep for an Argument will be like this:
public override bool EvalStep (out object answer, ref Control expression, ref Environment environment)
{
    answer = environment.ArgumentValue(this.offset);
    return false;
}
There is one more change. If the variable is not an Argument then we know one thing for sure — we won't find it in the bindings of the topmost environment. It may be added later as an incremental, but it certainly won't be in the formal parameters and it is pointless to search there. We'll save time by avoiding that search:
object NonArgumentValue (Symbol name)
{
       // Skip the formals of this frame.
       if (this.incrementalDefinitions.ContainsKey(name))
           return this.incrementalDefinitions[name];
       else
           return this.envClosure.Environment.DeepSearch (name);
}

object DeepSearch (Symbol name)
{
    int offset = this.envClosure.FormalOffset (name);
    if (offset == -1) {
       if (this.incrementalDefinitions.ContainsKey(name))
           return this.incrementalDefinitions[name];
       else
           return this.envClosure.Environment.DeepSearch (name);
    }
    return this.bindings [offset];
}
Basically we unrolled the loop by one and tossed out the dead code.

These two tiny changes have a big effect on the running time. Our median time is now 5.488 seconds, or 0.832 times the original time.

We're by no means done with this sort of optimization, but I'll get to that soon...

Establishing a baseline

It turned out to be harder to establish a baseline than I expected. I decided to use the sboyer benchmark because it is a ‘classic’ and it does a bunch of lispy stuff. If I were writing a paper or something I'd use a whole bunch of benchmarks, but I'm just doing this casually.

One problem is to find a benchmark that runs long enough to be interesting, but not so long as to be tedious. When I'm on the shuttle bus running my laptop in power-saver mode with jrmscheme running under the debugger, sboyer can take almost twelve minutes to complete. But when I'm at home running the same laptop in ‘performance’ mode with an optimized compile of jrmscheme outside of the debugger, it takes about six seconds. Benchmarking under the debugger is pointless, of course, but when I'm running in debug mode I have a lot of extra code instrumenting the interpreter, and this is useful to see exactly which optimizations are doing what at the low level.

I know that computers these days exhibit variability in benchmark timings, but the amount of variation was higher than I hoped for. In two back-to-back runs I got timings of 18.81 seconds in the first and 14.03 in the second (no, it wasn't a ‘warm cache’; the third run was 17.1 seconds). This means that I have to perform a fair number of runs to characterize the performance.
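One common way to summarize noisy timings is to repeat the run and take the median, which is the statistic the later posts report. A minimal harness might look like this; the class and method names are illustrative and this is not the code actually used for the measurements.

```csharp
using System;
using System.Diagnostics;

// A small timing harness; names are illustrative.
public static class Bench
{
    // Median of the samples; averages the two middle values when the
    // count is even. Does not modify the caller's array.
    public static double Median (double [] samples)
    {
        if (samples.Length == 0)
            throw new ArgumentException ("need at least one sample");
        double [] sorted = (double []) samples.Clone ();
        Array.Sort (sorted);
        int mid = sorted.Length / 2;
        return (sorted.Length % 2 == 1)
            ? sorted [mid]
            : (sorted [mid - 1] + sorted [mid]) / 2.0;
    }

    // Runs `work` `runs` times and returns the median wall-clock seconds.
    public static double MedianSeconds (Action work, int runs)
    {
        double [] seconds = new double [runs];
        for (int i = 0; i < runs; i++) {
            Stopwatch watch = Stopwatch.StartNew ();
            work ();
            watch.Stop ();
            seconds [i] = watch.Elapsed.TotalSeconds;
        }
        return Median (seconds);
    }
}
```

For the three runs quoted above, Bench.Median picks the 17.1 second run, which is less sensitive to a single unusually fast or slow run than the mean would be.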

This got me thinking, so I'm going to cut this post short. I'll have more to say later...

Thursday, October 8, 2009

Obvious idea

This is obvious, but interesting nonetheless.

The semantics of a programming language are a mathematical model of the language. The semantics relate expressions in the programming language to mathematical objects that (presumably) are well-understood. Denotational semantics attempt to find a mapping between programs and mathematical objects (the recursive partial functions). Operational semantics attempt to describe mathematically each step of the computation as it evolves.

But we can look at this another way: the denotational semantics translate the program to a mathematical structure, the operational semantics mimic the program via mathematical structures. In other words, we can consider mathematics to be a target language and say this:
  • Denotational semantics ‘compiles’ a program into math.
  • Operational semantics ‘interprets’ a program in math.
What do I mean by math? Vaguely, I mean a powerful enough formal system. Set theory is a good choice. There are others. In fact, any other universal language should be formalizable mathematically and can therefore serve as a powerful enough formal system itself.

This means we can turn the above statements around:
  • A compiler is a denotational semantics from the source language to the target language.
  • An interpreter is an operational semantics from the interpreted language to the implementation language.
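As a rough illustration of the two views, here is a sketch for a toy arithmetic language. Everything in it is hypothetical (the expression syntax and the names `interpret`, `compile-expr`, and `lookup` are made up for this example): `interpret` walks the expression every time it is evaluated, while `compile-expr` translates the expression once into a function from environments to values.

```scheme
;; Expressions: a number, a symbol, or (+ e1 e2) / (* e1 e2).
;; An environment is an association list of (name . value) pairs.
(define (lookup name env)
  (let ((entry (assq name env)))
    (if entry
        (cdr entry)
        (error "Unbound variable:" name))))

;; 'Operational' view: step through the expression as it is evaluated.
(define (interpret expr env)
  (cond ((number? expr) expr)
        ((symbol? expr) (lookup expr env))
        ((eq? (car expr) '+)
         (+ (interpret (cadr expr) env) (interpret (caddr expr) env)))
        ((eq? (car expr) '*)
         (* (interpret (cadr expr) env) (interpret (caddr expr) env)))
        (else (error "Unknown expression:" expr))))

;; 'Denotational' view: map the expression once to a function
;; from environments to values.
(define (compile-expr expr)
  (cond ((number? expr) (lambda (env) expr))
        ((symbol? expr) (lambda (env) (lookup expr env)))
        ((memq (car expr) '(+ *))
         (let ((op (if (eq? (car expr) '+) + *))
               (left (compile-expr (cadr expr)))
               (right (compile-expr (caddr expr))))
           (lambda (env) (op (left env) (right env)))))
        (else (error "Unknown expression:" expr))))

(define example '(+ x (* 2 y)))
(define example-env '((x . 1) (y . 3)))

(interpret example example-env)        ; 7
((compile-expr example) example-env)   ; 7
```

The compiled form never looks at the source expression again, which is exactly what breaks if someone is allowed to change the meaning of that source after the fact.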


Some proposed additions to Scheme or Lisp (fexprs or ubiquitous first-class environments for example) pretty much make it impossible to compile code. This is because the user can come along after the fact and change the meaning of the code in a way that the compiler could not possibly foresee (by, for example, a fexpr rewriting the source code or someone injecting a new binding in an environment). If you cannot compile the code, you cannot develop a non-trivial denotational semantics for it, either. (The semantics would trivially be the limit as n goes to infinity of S(n) where S(n) is n steps of the operational semantics, provided the limit exists. In other words, the only way to determine the meaning of the program is to run it and see what happens.)

Incremental definitions

Kbob asked me: Incremental definitions. Just to be clear, you mean symbols defined inside a lambda?

(lambda (a b)
  (define c ...)
...)
c is an incremental definition?


That's a really good question. In Scheme, internal definitions are transformed into a letrec form, so the example above would be turned into:
(lambda (a b)
  (letrec ((c <compute value for c>))
    <body>))

=>

(lambda (a b)
  (let ((c <unbound>))
    (let ((temp <compute value for c>))
      (set! c temp))
    <body>))

=>

(lambda (a b)
  ((lambda (c)
     ((lambda (temp) (set! c temp)) <compute value for c>)
     <body>)
    <unbound>))
So internal defines will end up as lambda variables.

We only create environments when we evaluate lambda expressions, and all the necessary variables should be in the parameter list. The only way to add a binding to an environment that wasn't in the parameter list when the environment was created is to evaluate code that wasn't there when the environment was created. There is really only one way to do this, and that is to use eval. Although eval itself is not commonly used, the read-eval-print loop and load are. Both of these need to use eval (or an equivalent).

There are five different strategies for dealing with incrementals:
  1. Disallow eval - Use a “closed-world” model in which code cannot be evaluated and loaded at runtime. There can be no incrementals in this model. A REPL would have to be implemented as a meta-circular evaluator.
  2. Restrict eval - Do not permit define expressions to be evaluated. A REPL and load would be a problem, but this could be an option in a limited debugger.
  3. Restrict access to environments - There are certain distinguished standard environments that can be used with eval. These can be specially constructed to support incremental definitions. If there is no mechanism for gaining access to the environments created by applying a closure, then ‘normal’ environments would not need incrementals.
  4. Support the-environment - Early versions of Scheme had the special form the-environment that would return the current environment to the user as a first-class object. The returned environment (and all the intermediate environments up to the global environment) would have to support incremental definitions, but otherwise they would not be necessary. Fortunately, it is simple to examine the code at eval time to see if there is a call to the-environment within it. If there is not, then there is no need for incrementals.
  5. Go wild - Have a primitive procedure that can extract the environment from an arbitrary closure object and allow this environment to be passed to eval. All environments must support incremental definitions because there is no way to predict if they would be necessary.
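The check described in option 4 can be sketched as a simple walk over the code. This is only an illustration under some assumptions: it works on raw S-expressions rather than SCode, the name `mentions-the-environment?` is made up, and it is deliberately conservative (it also answers yes for a quoted or shadowed occurrence of the symbol, which merely costs a missed optimization).

```scheme
;; Returns #t if FORM, an S-expression, mentions the symbol
;; the-environment anywhere within it; #f otherwise.
(define (mentions-the-environment? form)
  (cond ((eq? form 'the-environment) #t)
        ((pair? form)
         (or (mentions-the-environment? (car form))
             (mentions-the-environment? (cdr form))))
        (else #f)))

(mentions-the-environment? '(lambda (x) (+ x 1)))
;; #f, so no incremental definitions can reach this code

(mentions-the-environment? '(lambda (x) (eval 'y (the-environment))))
;; #t, so these frames must support incrementals
```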

Back in the day, MIT Scheme chose option 5. The primitive procedure closure-environment would extract the environment object from a closure, and you could call eval with that object. The special form the-environment was also supported. Unfortunately, this means that all environments must be constructed in such a way that they can be manipulated by the interpreter. Furthermore, it means that all variable lookup must be done by deep searching the environment chain.

By the time the MIT Scheme compiler was written, however, it was realized that arbitrary evaluation in any environment had more disadvantages than advantages, so the MIT Scheme compiler uses option 4. If you write code that uses the-environment, the compiler will invoke the interpreter on that code. (This is so the compiler doesn't have to know anything about the interpreter implementation except the entry point. You don't want to have to maintain two separate compatible copies of the environment code.) If you don't use the-environment, the compiler is free to do what it wants. The closures created by the compiler cannot be destructured with closure-environment, but the compiler does emit debugging information to allow you to inspect what is left of the environment (if anything) once the compiler has optimized the code. The MIT-Scheme debugger uses option 2 to somewhat simulate the effect of evaluating code within the debugger.

One of the fun things about Lisp and Scheme is exploring the basement. MIT-Scheme has ‘subprimitives’ that directly manipulate the underlying memory. If you don't know what you're doing, you can easily corrupt memory and crash the system, but the system uses these to bootstrap itself. In the cold load sequence for MIT Scheme there is this interesting function:
(define (*make-environment parent names . values)
  ((ucode-primitive system-list-to-vector)
   (ucode-type environment)
   (cons ((ucode-primitive system-pair-cons)
          (ucode-type procedure)
          ((ucode-primitive system-pair-cons) (ucode-type lambda)
                                              unspecific
                                              names)
          parent)
         values)))
This creates a first-class environment structure with names and values by constructing tagged pointers to raw data. It is constructed to appear as if it were created by invoking a lambda expression with an unspecific body. This is used to construct the initial top-level environments for the REPL. In packag.scm, you'll find this:
(define null-environment
  ((ucode-primitive object-set-type)
   ((ucode-primitive object-type) #f)
   (fix:xor ((ucode-primitive object-datum) #F) 1)))
This creates a magic object that is recognized as the root of an environment chain.

My version of MIT-Scheme (call it jrm-scheme) interprets the MIT-Scheme SCode without modification, so it boots and runs with the code above. By default, I have to build environment structure that is compatible with the MIT-Scheme interpreter because the Scheme code sometimes examines the structure reflectively. But the point wasn't to make a slavish re-implementation, but to explore the implementation possibilities under real-world constraints. So the next few posts are going to discuss how I implemented environments.

Wednesday, October 7, 2009

Details

Here's how I'm handling environments and variables in my version of MIT-Scheme. You can skip this post if you don't care about implementation details.

If you remember your beginning Scheme course, you know that an environment consists of a series of ‘frames’ where each frame contains a table of variable bindings and a pointer to the parent frame. When you look up a variable, you search for the binding starting at the nearest frame and work your way up to the root frame (the global environment, or some top-level environment). At least that's the model. Reality is a bit more complicated.
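For reference, the textbook model can be written down in a few lines. This is a sketch of the model only, written in Scheme for brevity, and the representation (a frame as a pair of an association list and a parent) and the procedure names are assumptions made for the example.

```scheme
;; A frame is (bindings . parent): bindings is an association list of
;; (name . value) pairs, parent is another frame or #f at the root.
(define (make-frame names values parent)
  (cons (map cons names values) parent))

(define (frame-bindings frame) (car frame))
(define (frame-parent frame) (cdr frame))

;; Search the nearest frame first, then work up toward the root.
(define (lookup-variable name frame)
  (cond ((not frame) (error "Unbound variable:" name))
        ((assq name (frame-bindings frame)) => cdr)
        (else (lookup-variable name (frame-parent frame)))))

(define global-frame (make-frame '(x y) '(1 2) #f))
(define inner-frame (make-frame '(x) '(10) global-frame))

(lookup-variable 'x inner-frame)  ; 10, found in the nearest frame
(lookup-variable 'y inner-frame)  ; 2, found in the parent frame
```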

When you apply a closure to some arguments, you create a new environment frame that contains the bindings of the lambda-expression associated with the closure. The frame for a standard environment has a pointer to the closure that was invoked and a vector of value cells for the arguments.
class StandardEnvironmentFrame
{
    StandardClosure closure;
    ValueCell [] bindings;
}

class StandardClosure
{
    StandardLambda lambdaExpression;
    StandardEnvironmentFrame environment;
}

class StandardLambda
{
    Symbol [] parameterList;
    SCode body;
}
Notice the StandardEnvironmentFrame is missing both the table of bindings and the pointer to the parent frame. You can get the parent frame by dereferencing the closure and dereferencing its frame. We're only storing the binding cells in the environment, but you can figure out what names are associated with them by looking at the parameter list of the lambda expression (again, via the closure). Why the roundabout way? Two reasons. It is parsimonious in storage, and there is a legacy reason. Suppose you are evaluating a function call expression. If you are using a stack machine, the natural strategy is to evaluate each subexpression in turn and push the value on the stack. If you do this in the right order, you'll have the closure and the argument values in a contiguous chunk of stack. This is (almost) exactly the layout of the environment frame.

Variable lookup is simple, but tedious. First we scan the parameter list of the lambda. If the variable name is found, the offset in the lambda list will match the offset in the bindings vector, so we just go fetch it. Otherwise, we search the next frame. This is almost the correct code:
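object DeepSearch (Symbol name)
{
    // Position of name in the lambda's parameter list, or -1 if absent.
    int offset = this.closure.FormalOffset (name);
    if (offset == -1) {
        // Not bound in this frame: continue in the parent frame.
        return this.closure.environment.DeepSearch (name);
    }
    return this.bindings [offset];
}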
It's not quite correct because we didn't take incremental definitions into account. The environment of the closure could be a first-class environment. If this is the case, the user could evaluate a new definition in the environment. The StandardEnvironmentFrame needs a place to store these extra definitions.
class StandardEnvironmentFrame
{
    StandardClosure closure;
    ValueCell [] bindings;
    Dictionary<Symbol, ValueCell> incrementalDefinitions;
}
And we have to modify DeepSearch to check the incrementals.
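object DeepSearch (Symbol name)
{
    // Position of name in the lambda's parameter list, or -1 if absent.
    int offset = this.closure.FormalOffset (name);
    if (offset == -1) {
        // Check definitions added after the frame was created
        // before moving on to the parent frame.
        if (this.incrementalDefinitions.ContainsKey (name))
            return this.incrementalDefinitions [name];
        else
            return this.closure.environment.DeepSearch (name);
    }
    return this.bindings [offset];
}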
As I've mentioned before, the speed of variable lookup is one of the most important factors that determines interpreter performance. The performance of the naive code above is pretty bad. In debugging mode, I sample the top of stack every few microseconds and record what the interpreter is doing. The number one item on the top of the stack is variable lookup.

At this point it is worth running a couple of benchmarks to establish baseline performance.

Saturday, September 26, 2009

If you haven't heard

There seem to be more readers of my posts than there used to be, so I wanted to mention a project I've been slowly working on.

I've been porting MIT Scheme to the .NET CLR. MIT Scheme has two major components: the ‘microcode’ which provides the memory model and the primitives, and the ‘runtime’ which is the Scheme code and libraries that come with the system. Once upon a time, the ‘microcode’ really was microcode. The Scheme-81 Chip (sorry, I can't find the text online) was a microcoded VLSI processor that interpreted the SCode representation of Scheme. Sometime around 1982, the microcode was translated to 68000 assembly code and used to run Scheme on the HP 9836 Chipmunk. Around 1983, Jim Miller wrote an SCode interpreter in C that replaced the microcode layer. My version is a new SCode interpreter in C#. The source code is available at http://code.google.com/p/jrm-code-project/.

I'm doing this for fun and to try out different ideas I've had about interpreter implementation. The interpreter basically works, but it is missing a lot of primitive procedures and I haven't hooked up the error handler yet. It is, of course, a lot slower than the standard version of MIT Scheme (which is mostly compiled), but it is in the ballpark of MIT Scheme on interpreted code.

So why am I bringing this up? Lately I've been posting about first-class environments. MIT Scheme has first-class environments, and it uses them to provide different ‘packages’ for the Scheme code (a package is a separate namespace in which you can run a REPL). MIT Scheme also has a compiler that can produce very efficient code in the right circumstances. There is a caveat, though. If you make use of first-class environments, the compiler is forced to create environment structures that are compatible with the interpreter's representation. This is because there may be a need to call in to the interpreter to eval an arbitrary expression in such an environment. In these circumstances, the compiler can perform only minor optimizations that do very little to speed up the code. MIT-Scheme only uses first-class environments (actually, I'm going to call them ‘interpreter compatible environments’ because that's more accurate) to construct the runtime packages. There are on the order of 50 of them. The rest of the code doesn't use them at all.

As I have mentioned in earlier posts, the two things that an interpreter does most are variable lookup and continuation management. These things are critical to interpreter performance. I managed to get two academic papers published on how to get the underlying system to manage continuations efficiently, yet still be able to reify them to support first-class continuations. Right now I'm working on improving variable lookup.

When an interpreter compatible environment is used, there is little choice but to represent the environment as a linked list of frames. Each frame can use a vector representation for the variables that are bound by lambda application, but each frame also needs the ability to incrementally add bindings should the user evaluate a ‘define’ expression or invoke load from one of these environments. When evaluating a variable, the interpreter walks the environment chain and scans each frame until it finds the closest binding with the correct name (this is called a ‘deep search’). Needless to say this can cause serious performance degradation.

A number of ways have been devised to avoid this problem. One that is of interest is the idea of computing a fixed lexical address for a variable. In most circumstances, the environment chain always has the same shape and the variable always appears at a fixed location. It is faster to dereference the chain until the correct frame is found and then pull the value of the variable from the vector. Unfortunately, it can be incorrect to do so. If an intervening frame is interpreter compatible, the user might load a file or define a variable that shadows the lexical binding. If this happens, the shadowing variable should be used rather than the former one at the fixed location.

In Free variables and first-class environments, Miller (see above) and Rozas describe how MIT Scheme deals with shadowed variables. In essence, when you introduce a shadowing binding, you walk the rest of the environment chain and mark the shadowed variables. The interpreter uses the fast lookup method, but checks for the mark. If the mark is absent, the correct variable was found. If the mark is present, the interpreter does a deep search to find the variable. It sounds a lot simpler than it is. Because the lexical environment structure forms a tree, it is possible to have variables that are shadowed along one branch, but unshadowed on the other. It is possible to have variables that are shadowed at different depths on different paths. These are not often encountered (in the mid-1980s, long after they thought they had covered every possible case of shadowing, Rozas discovered a new way to fool the variable lookup mechanism into fetching the wrong variable). If it did happen, the error message “Broken Compiled Variable -- get a wizard” would appear and Scheme would halt.
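The fixed lexical address idea, without any of the shadowing machinery, can be sketched like this. It is a simplification under stated assumptions: the environment is a list of vectors, the compile-time scope is a parallel list of name lists, and the names `lexical-address` and `fetch` are invented for the example. It makes no attempt to detect an incremental definition in an intervening frame, which is precisely the case that goes wrong.

```scheme
;; Position of NAME in the list NAMES, or #f if it is not there.
(define (index-of name names)
  (let loop ((rest names) (i 0))
    (cond ((null? rest) #f)
          ((eq? (car rest) name) i)
          (else (loop (cdr rest) (+ i 1))))))

;; SCOPE is a list of parameter lists, innermost first.
;; Returns (depth . offset), or #f if NAME is not lexically bound.
(define (lexical-address name scope)
  (let loop ((frames scope) (depth 0))
    (if (null? frames)
        #f
        (let ((offset (index-of name (car frames))))
          (if offset
              (cons depth offset)
              (loop (cdr frames) (+ depth 1)))))))

;; ENV is a list of vectors with the same shape as SCOPE.
;; Follow the chain back DEPTH frames, then index; no searching by name.
(define (fetch address env)
  (vector-ref (list-ref env (car address)) (cdr address)))

(define sample-scope '((a b) (x y z)))
(define sample-env (list (vector 1 2) (vector 10 20 30)))

(lexical-address 'z sample-scope)                     ; (1 . 2)
(fetch (lexical-address 'z sample-scope) sample-env)  ; 30
```

The address is computed once, ahead of time, and every later reference pays only for the chain walk and one vector index.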

In my attempted port of Object Lisp, I would get this all the time. The idea was to inject the object bindings into the lexical environment when invoking an object method. This, of course, caused shadowing. But upon leaving the object method, I would remove the shadowing bindings. Upon invoking a method on a different object, a different set of bindings would be injected. The variable lookup mechanism would get confused because it would find the shadowing mark and try to locate the actual binding, but the next time it tried it would find a shadowing binding, but for a different variable. At this point I realized what a mess this was and decided that my approach wasn't going to work.

When the Scheme compiler became stable enough that the majority of code was expected to be run in compiled mode, it was considered too much of a burden to try to maintain the shadowed variable tracking mechanism and it was removed. MIT Scheme now has a slower, but far simpler interpreter. If you want fast, use the compiler.

Ok, back to my version of MIT-Scheme. For interpreter compatible environments, there is little choice but to deep search on every variable reference. But MIT Scheme only has about 50 or so of these and the vast majority of the code does not need to implement environments in this way. If we assume that incremental definition can only occur in the special interpreter compatible frames, then we can change the environment representation to make variable lookup faster. One popular mechanism is the ‘flat environment’ representation. In this representation a closure does not contain a pointer to a parent environment, but rather it contains a flat vector of pointers to the appropriate value cells for the lexical variables it uses. This makes a tradeoff. When creating a lexical closure, we have to copy a potentially large set of pointers to the lexical bindings, but when we look up a lexical variable, we only need to index into the lexical vector and dereference the pointer we find. Empirically, it seems that the tradeoff is largely worth it.
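Here is a minimal sketch of the flat representation, assuming value cells are one-element vectors and a closure is simply a pair of code and a vector of cells. None of these names come from jrm-scheme; they are made up to show the tradeoff between copying pointers at closure creation and indexing at lookup.

```scheme
;; A value cell is a one-element vector, so every closure that
;; captures a variable shares the same mutable location.
(define (make-cell value) (vector value))
(define (cell-ref cell) (vector-ref cell 0))
(define (cell-set! cell value) (vector-set! cell 0 value))

;; A flat closure pairs the code with a vector of just the cells it uses.
;; CODE is a procedure that takes that vector followed by the arguments.
(define (make-flat-closure code cells)
  (cons code cells))

(define (apply-flat-closure closure . args)
  (apply (car closure) (cdr closure) args))

;; Stand-in for (lambda (n) (+ n x y)), where x and y are free variables.
(define x-cell (make-cell 1))
(define y-cell (make-cell 2))
(define add-xy
  (make-flat-closure
   (lambda (cells n)
     ;; Lookup is one index plus one dereference; no chain walk.
     (+ n
        (cell-ref (vector-ref cells 0))
        (cell-ref (vector-ref cells 1))))
   ;; The cost is paid here: the cell pointers are copied at creation.
   (vector x-cell y-cell)))

(apply-flat-closure add-xy 10)   ; 13
(cell-set! x-cell 100)
(apply-flat-closure add-xy 10)   ; 112
```

Because the closure holds pointers to the cells rather than copies of the values, an assignment to x after the closure was created is still seen by it.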

So I've changed my version of MIT Scheme to use flat environments rather than linked frames for the case where first-class environments are not used. It has been rather tricky, though, and I have a lot of scaffolding in place to double check everything. I'm now carefully removing the scaffolding and cleaning up the code to see if I have a net improvement in performance.

Friday, September 25, 2009

Needed for a debugger?

Pascal Costanza writes:
I always find the argument that some language construct is supposedly “dangerous” a bit weird. It's too fuzzy to make such a statement, in my humble opinion. What's more important, I think, is this: Do you want to be able to implement portable runtime debuggers or not? If you want this, you need first-class environments.

Argh! I did use the word “dangerous” and I was determined not to. You are correct that it is too vague a term. What I really mean in this case is this: if the language specification requires that all variable bindings be exposable and mutable to arbitrary code (that is one possible definition of ‘first-class environment’), then ‘lexical scoping’ can no longer be guaranteed by the implementation, and this drastically changes the character of the language in a particularly undesirable way, to wit, all reasoning about the code must take into account the entire body of code, not simply the lexically enclosing contexts. (Wow, that was a long sentence.)

I disagree that you need first-class environments in order to write a portable runtime debugger. That's a complex assertion, so I'll need to make a few assumptions about what you mean (and please correct me if I'm wrong).

First, it seems to me that a debugger is not usually a portable program. It will have to depend upon how the implementation works. A Scheme->C compiler would need to have some facility to debug the code generated by the C compiler. A Scheme implementation hand written in assembly code would need special assembly routines to decode the binary layout of data structures. A Scheme implementation in C# would naturally have debugging information stored as .NET metadata in the image. Then there are implementation details. Some Scheme systems alpha-rename the variables. Others discard the variable names and use De Bruijn indexes for variables. Some Scheme systems perform CPS conversion, others perform A-normal conversion. Some Scheme systems implement the interpreter as a finite state machine with a push-down stack, others use a ‘threaded’ interpreter, still others use a ‘lambda combinatorical’ approach. It seems to me that each variation will need its own means of debugging that cannot be easily shared with other variations.

Second, it seems to me that the information the debugger presents to the user is also implementation dependent. Suppose an implementation had an extension that allowed you to declare runtime constants, and that this implementation would inline the constants where they were used. Would you still expect ‘bindings’ for these constants to be visible on the stack?

If the debugger were completely, transparently portable, and it were used on the same piece of code in two different implementations, I assume that it would present exactly the same information in exactly the same way (by my definition of ‘complete and transparent’). It seems to me that this would impose a particular evaluation model on the code. I can see two ways to achieve this:
  1. Impose a canonical evaluation model that all implementations must follow. This would specify a particular environment model that must be used and maintained by the implementation.
  2. Write a portable meta-circular evaluator that is designed to work with the portable debugger.
Option 2 solves the problem without the need to standardize on first-class environments.


Let me take a moment to describe what MIT Scheme does.

In MIT Scheme, you can run code in one of three modes. The first mode is plain interpreted code. The code is input as text then lightly processed to expand the macros and build an abstract syntax tree which is then walked by the interpreter.

The second mode is ‘syntaxed’. A program called ‘SF’ is run on the text. An abstract syntax tree is built, but then it is more heavily processed. In addition to expanding the macros, SF will ‘inline’ a number of primitive procedures and runtime constants and will simplify the code. The resulting AST is dumped to a file in binary mode. This processed code loads much more quickly and runs a fair amount faster than the original code.

The final mode is compiled code. The compiler uses SF as a preprocessor. The compiler can either produce C code that is then fed into a C compiler, or it can produce native x86 code. This code can be made to run very quickly.

Each of these modes has potentially different semantics. Before you panic too much at that, let me note that this is completely under the control of the user. The default action is to preserve the semantics of the interpreted code in all cases. When you choose the default action (or rather, fail to choose a non-default), SF will only expand the macros in the code. No inlining of any kind is performed. The compiler only open-codes those procedures inlined by SF, so no compiler optimization will be performed either. Therefore, simply running SF and compiling will not change the semantics of the code.

Simply running SF and compiling will not change the performance of the code, either. Given that the compiler can infer nothing about the code, all it can do is ‘inline’ the interpreter. All function calls are out-of-line and must go through an interpreter-compatible stack frame. (The compiler will transparently cache the values of some variables, but there is little performance to be gained through this.)

These semantics are very flexible. If you change the definition of CAR and CDR, the compiled code will use the new definition. However, in “real-life” circumstances, you never need this sort of flexibility. It is there, and it is the default, but for nearly any production purpose you'd like to tell the compiler that the standard definitions of the standard primitives can be assumed to be unchanging. To get a lot of bang for your buck, you simply add a declaration to the source code: (declare (usual-integrations))

When usual-integrations are declared, some fifty or so primitive procedures will be inlined by SF. These include things like CAR, CDR, VECTOR-REF, VECTOR-SET!, +, etc. In addition, users can declare certain procedures of their own to be inlined. These inlining operations can have a dramatic impact on the performance of the code, but they come at a price. Because these are now inlined, redefinition of these functions will have no effect on this code. Furthermore, the inlining is visible if you pretty-print the code. Inlining can introduce new lexical variables and eliminate the need for others. The lexical environment as defined in the source code may not reflect the actual set of bindings.

Most users find this a reasonably low price to pay. Few want to redefine the standard primitives (and you can specify exceptions if you really do want to redefine one or two particular ones), and while the environment may change, it isn't so different that it is unrecognizable. (There is a caveat here. We assume that the debugger will reflect the environment, but that user code is not reaching in to that reflected environment in order to perform some function necessary for the program. That is, the code isn't sneaking around using the debugger to do an ‘end-run’ around the scoping rules.)

With certain primitives inlined by SF, the compiler can now do a much better job of generating code. Instead of calling out to the procedure VECTOR-REF every time it needs to look at a vector, it can use an indexed offset load instruction. This can easily be hundreds of times faster.

Using the compiler also comes at a price. The compiler is free to put variables in machine registers, duplicate variables (provided the duplication has no semantic effect), or eliminate variables that are not necessary. The compiled code cannot be interrupted at arbitrary points. Between these points, the compiler may generate code that does not preserve the consistency of the Scheme memory model. Of course it must restore consistency before checking for interrupts. The end result is that the compiled code may be so different from the original code that it is unrecognizable. It will, however, compute the same value.

Needless to say, the debugger would have a hard time with this. There is no environment data structure. The contents of the registers are known only to the compiler. Variable names have long since gone away, and wouldn't matter because some variables are aliased and others don't even exist. So what is a user to do?

The compiler does something rather sophisticated. It keeps track of which points in the source code correspond to possible interrupt checks. Since it knows that a debugger can only gain control at an interrupt, it can annotate each interrupt poll with meta-information that tells the debugger what part of the source code is running at that point. It is an approximation, but a fairly good and useful one. In addition to this, the compiler emits a variable map that gives the debugger some information about where variables might be located. This allows the debugger to present a fairly good picture of where in the source code you are, and what the variables are bound to, if they exist at all. The debugger includes a simple meta-circular evaluator that can evaluate forms as if they were executed in the context of the debugged stack frame. Not every form can be evaluated, and assignments don't work, but enough code works to present a good illusion.


So what is the point of describing this? I think this is a nice engineering compromise between performance and debugging. However, it requires some assumptions. First, it is the case that you cannot get at the environment of an arbitrary compiled procedure. It may or may not exist, and you need the debugging meta-information to parse it, and it may have all, some, none, or extra bindings. It cannot be used programmatically, it can only be presented to the user as an aid to debugging. Second, MIT Scheme allows for first-class environments. When you call (the-environment) you will get an interpreter compatible environment with all that entails. This disables all optimizations done by SF or the compiler because any optimizations could change the expected contents. On the other hand, if you don't use the full environment, you can simply close over the variables that you do use, and the compiler will be sure to preserve the semantics of your closure while it optimizes the code. Third, it assumes that any introspection can be subordinated by the user. That is, I can write code and instruct the compiler to assume that introspection is unnecessary. With this assumption, the compiler can generate much better code. However, that also means that if you want introspection into my compiled code, you are likely to be out of luck.

In conclusion, I have these objections to your assertion that you need a standardized first-class environment API to write a portable debugger:
  1. If you write a portable meta-circular evaluator to go with your debugger, you can define your own environment structures independent of the underlying implementation.
  2. A standardized first-class environment API would unduly restrict implementations to following a particular evaluation model.
  3. A first-class environment model that allowed the underlying system to add or remove bindings as it deemed necessary is by definition not portable because no binding or lack thereof could be assumed.
  4. Debugging is outside the milieu of the language specification. Each implementation can supply its own debugger or many debuggers or none at all.
  5. Not specifying a first-class environment model does not mean that a sophisticated debugger cannot be built.


I personally like the ability to use introspection when I am developing code. But I also like the ability to ‘seal’ the code against introspection in order to make the known working code run fast and to guarantee stability of that code.

Thursday, September 24, 2009

First-class environments

In the previous posts I discussed the advantages of procedural abstraction, showed various ways to poke holes in the abstraction, and discussed the consequences of doing so. In this post, I'm going to talk about first-class environments.

I really liked first-class environments when I was first exposed to them. It was cool to be able to reflect the underlying interpreter data structures to user space. At one point I attempted to port Gary Drescher's Object Lisp to Scheme. In order to handle the dynamic object scope, I used first-class environments and injected the object bindings into the lexical environment when the object was in use. (There are a lot of reasons why this doesn't work.) Over time, however, I came to realise that there was more difficulty and less power with first-class environments than I had originally thought. At this point I believe that first-class environments are useless at best, and dangerous at worst.

Before I get too far, I need to precisely describe what a first-class environment is. In Scheme, all variables are associated with a unique binding. The variable is either ‘free’, in which case it is a reference to a ‘top-level’ or ‘global’ binding, or it is ‘lexical’, in which case there is an enclosing lambda expression that is lexically superior to the reference and names it as an argument. A lexical variable becomes ‘bound’ when the (closure containing the) lambda expression is applied. The binding exists as long as any code within the body of the lambda expression could refer to the variable. The ‘environment’ is the collection of bindings necessary to evaluate a given piece of code. (You should all be familiar with this.)

No doubt you are familiar with the ‘chained environment model’. In this model, a closure is made over the current environment every time a lambda expression is evaluated. When the closure is applied, the environment at the time of closing is used as a base, and a new ‘frame’ is created with the new bindings. When a variable is to be evaluated, the frames are searched from most-recent to least-recent to discover the binding.

We're already in a little bit of trouble. While this model accurately describes how to determine the correct binding of a variable, it is just a model. The implementation is allowed to do whatever it wants provided that it always returns the same answer as the model would.

There are several different implementations of this model. The usual implementation in a toy interpreter is to represent the environment as a simple association list. Bindings are pushed on to the list at application time and the list is searched on each reference. A more sophisticated interpreter may keep a linked list of vector-like ‘frame’ objects. It is simple to keep track of the ‘lexical address’ of a variable. This consists of the count of the number of frames back and the position in the frame where the binding is stored. When the variable is evaluated, no search is necessary. The environment chain is followed until the correct frame is found, and then the binding is at a known offset. Some implementations take advantage of the fact that most environments can be temporarily allocated on the stack in a contiguous region. These implementations can simply compute a static index back from the current stack pointer in a number of cases. Some implementations use a ‘flat’ environment. A flat environment is a vector of the addresses of the bindings needed by the lambda body. When the lambda is closed over, the bindings are copied. Finally, some implementations carefully analyze the body of the lambda expression and decide among one of several environment representations that might work best for the particular body.
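To make two of these representations concrete, here is a minimal sketch of the association-list environment and of the frame chain with lexical addresses. The helper names (alist-extend, alist-lookup, frame-extend, frame-lookup) are made up for illustration and do not come from any particular implementation.

```scheme
;; Representation 1: association list, as in a toy interpreter.
;; Bindings are pushed on the front; lookup searches the whole list.
(define (alist-extend names values env)
  (append (map cons names values) env))

(define (alist-lookup name env)
  (let ((entry (assq name env)))
    (if entry
        (cdr entry)
        (error "Unbound variable:" name))))

;; Representation 2: a list of vector frames, most recent frame first.
;; A variable is found by its lexical address (frames back, slot in frame).
(define (frame-extend values env)
  (cons (list->vector values) env))

(define (frame-lookup depth offset env)
  (vector-ref (list-ref env depth) offset))

;; Both model ((lambda (a b) (lambda (c) ...)) 1 2) with the inner
;; lambda then applied to 3.
(define alist-env
  (alist-extend '(c) '(3) (alist-extend '(a b) '(1 2) '())))

(define frame-env
  (frame-extend '(3) (frame-extend '(1 2) '())))

(alist-lookup 'a alist-env)     ; => 1, found by searching past c
(frame-lookup 1 0 frame-env)    ; => 1, one frame back, slot 0
```

Both lookups give the answer the model requires. The second never needs the variable's name at runtime, which is exactly why such an implementation has nothing to hand back if you later ask for the environment by name.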

The model does not specify what happens to those bindings that are not used within the body of the lambda. For example, in this code:
(let ((greeting "Hello!"))
  (display greeting)
  (for-each (lambda (e) (display e) (newline)) (list "How" "are" "you?")))
The binding for greeting could be used by the lambda expression passed to for-each, but it isn't. The model tells us that display and newline refer to the top-level definitions, and that e is immediately bound by the lambda, but it does not tell us what happens to the binding of greeting after the greeting is displayed. Some implementations retain the binding, others drop it, still others do one or the other at different times.

Returning to the question of what a ‘first-class environment’ is, there is the question of whether you should be able to extract one from an arbitrary closure. There are three potential answers:
  1. Yes, you should always be allowed to extract the environment from any closure.
  2. No, you must indicate beforehand which environments are to be first-class. (The the-environment form in early Scheme implementations and in MIT Scheme is an example.)
  3. Maybe, it depends on the implementation.
The second question is whether the returned environment contains all lexically visible bindings, or whether it contains only those bindings that are used at the point of capture, or whether you enumerate exactly which ones you want captured.
  1. All bindings, in use or not, are captured.
  2. Only the bindings in use are captured, the others may not be.
  3. Only the bindings explicitly listed by the user are captured.
  4. All, some, or none are captured depending on the implementation.
The third question is whether the returned environment contains a snapshot of the bindings (a static copy of the values at the time of capture), a live copy of the bindings (a copy that changes as the values change), a mutable live copy (modifications to the copy affect the running of the code that refers to the bindings), or a user-specified list of the above.
  1. The actual bindings (mutable and live) are returned.
  2. A read-only reference to the bindings is returned. Values may be seen to change over time, but they cannot be modified via this interface.
  3. A snapshot of the bindings is returned. Changes to the actual bindings are not seen.
  4. The user specifies which variables are live, mutable, or snapshot.
  5. Implementation dependent.
Finally, there is a question of what happens if we evaluate a define expression in the returned environment.
  1. Evaluating a define establishes a new, shadowing binding if a previous binding did not exist. It acts like an assignment if a previous binding did exist.
  2. define is not to be used. It is an error to try it. (Optionally an error is signalled, etc.)
  3. Implementation dependent.
I'm pretty sure these options cover the design space. The current state of affairs is that all options are implementation dependent. Any standardization of first-class environments will have to change at least one of these options away from implementation dependent. So let me now discuss the problems.

When someone suggests ‘first-class environments’, I assume they want options 1, 1, 1, and 1, that is, they can grab any environment at any time, all lexical bindings are present, used or not, the bindings are live and mutable, and you can insert new, shadowing bindings. Many people have told me not to make that assumption, so I'll talk about the other variations as well. In this variation, though, the user simply cannot reason about his code. There are no abstraction barriers because any piece of code can, at any time, crack open a closure and change the meaning of any variable whatsoever. Something as simple as (lambda (x) (+ x 1)) cannot be assumed to do addition if someone injects a shadowing binding for +. Obviously you cannot compile this to an add instruction if you don't assume it will still be addition at runtime.
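Portable Scheme has no way to crack open a real closure, so the following is only a model of the problem. It assumes a hypothetical environment object (a vector holding an alist) and a hand-written stand-in for (lambda (x) (+ x 1)) that looks up + in that environment on every call. All of the names here are invented for the sketch.

```scheme
;; A mutable environment: a one-slot vector holding an alist of bindings.
(define (make-environment bindings) (vector bindings))

(define (environment-lookup env name)
  (cdr (assq name (vector-ref env 0))))

;; Push a new binding in front of, and therefore shadowing, any older one.
(define (environment-define! env name value)
  (vector-set! env 0 (cons (cons name value) (vector-ref env 0))))

;; Stand-in for (lambda (x) (+ x 1)) closed over ENV.
;; Because ENV can change, + must be looked up at every call.
(define (make-add1 env)
  (lambda (x) ((environment-lookup env '+) x 1)))

(define env (make-environment (list (cons '+ +))))
(define add1 (make-add1 env))

(add1 41)                          ; => 42
(environment-define! env '+ -)     ; someone injects a shadowing binding
(add1 41)                          ; => 40
```

The text of add1 never changed, yet its meaning did. The lookup on every call is also the cost the compiler would have to pay instead of emitting an add instruction.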

Thomas Lord suggested “When you write your code, avoid capturing environments and then you are all set.”

He is apparently suggesting option 2 for the first question: explicit marking of environments you wish to capture. This is a considerably weaker proposal because it allows the user to statically analyze any code that doesn't use a first-class environment, and it allows the implementation freedom in choosing environment representations in any code that doesn't use first-class environments. I have few objections to that, but let's examine question 2 under this proposal.

The second question is whether all bindings are visible, or only those bindings that the user explicitly specifies. The latter would take a form something like this: (the-environment foo bar <more variables here> ...). I have a small objection to this. It is poor practice to expose the internal names of your variables (see previous posts). I don't think it useful for Scheme standardization because it is trivially implemented as a macro. (The third option of some bindings being available, some not, is not worth considering. It would be impossible to write portable code that used them because there are no guarantees they exist.)
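As a rough sketch of how little machinery the enumerated form needs, here is one way it might be written with syntax-rules. I have called it capture-bindings rather than the-environment to avoid colliding with the MIT Scheme special form; that name, the (name reader writer) entry layout, and the two accessors are all assumptions made for this example.

```scheme
;; (capture-bindings foo bar) => a list of (name reader writer) entries,
;; where the reader and writer close over the live bindings.
(define-syntax capture-bindings
  (syntax-rules ()
    ((_ var ...)
     (list (list 'var
                 (lambda () var)
                 (lambda (new-value) (set! var new-value)))
           ...))))

(define (captured-ref env name)
  ((cadr (assq name env))))

(define (captured-set! env name value)
  ((caddr (assq name env)) value))

(define captured
  (let ((foo 1) (bar 2))
    (capture-bindings foo bar)))

(captured-ref captured 'foo)        ; => 1
(captured-set! captured 'bar 20)
(captured-ref captured 'bar)        ; => 20
```

This version hands out live, mutable bindings. Dropping the writer lambda gives the read-only variant, and replacing the reader with the value itself gives a snapshot.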

So allow me to summarize my objection to first-class environments:
  • If first-class environments can be arbitrarily extracted from any closure, you can no longer depend on lexical scoping. You throw the baby out in favor of the bathwater.
  • If first-class environments can only be obtained through use of an explicit special form, and you explicitly enumerate the variables captured, you don't need a change to the standard, you need a SRFI with a macro.
  • If first-class environments can only be obtained through use of an explicit special form, but all visible variables are captured, you still don't need a change to the standard, you need a SRFI with a more complicated macro.


It isn't clear to me how hygienic macros would work with first-class environments. Recall that if a hygienic macro introduces bindings, the appropriate renaming is performed during transcription to avoid accidental capture. But if we wish to capture one of these bindings, we'll need to be able to refer to it in some way. Code outside the macro would be unhygienic if it could refer to the variable, so that's a problem. Code inside the macro would work fine (it would be renamed appropriately to keep a hold of the reference), but then you don't need a change in the standard to put code inside your macros.

Monday, September 21, 2009

Closing in

In the previous posts I showed some problems that occur when you poke holes in the abstraction you get from procedures. By default, lexical scoping is opaque and you have to write extra code to poke through the holes. But is there any advantage to procedural abstraction and lexical scoping beyond the fact that it is a reasonable default?

The big advantage is that procedural abstraction allows you to separate use from implementation. We don't need to know how a procedure accomplishes what it does, we just need to know what the result should be. On the other side of the barrier, we don't need to know where the arguments came from, or what the result is used for, we just need to compute it. Presumably, the more efficiently the better. Now let's return to Louis Reasoner. He's just written a sorting routine:
(define (lib-sort list <)
  (cond ((pair? list)
         (let ((first (car list)))
           (do ((before '() (cons (car after) before))
                (after (lib-sort (cdr list) <) (cdr after)))
               ((or (null? after) (< first (car after)))
                (append (reverse before)
                        (cons first after))))))
        ((null? list) '())
        (else (error "foo"))))
It occurs to him that maybe that call to reverse could be a bottleneck, so he instruments it with the code from the last post:
(define lib-sort
  (let ((reverse-counter 0))
    (register-counter! reverse-counter)
    (lambda (list <)
      (cond ((pair? list)
             (let ((first (car list)))
               (do ((before '() (cons (car after) before))
                    (after (lib-sort (cdr list) <) (cdr after)))
                   ((or (null? after) (< first (car after)))
                    (set! reverse-counter (+ reverse-counter 1))
                    (append (reverse before)
                            (cons first after))))))
            ((null? list) '())
            (else (error "foo"))))))
;Value: lib-sort

(lib-sort '(3 1 4 1 5 9 2 6 5 3 5) <)
;Value 20: (1 1 2 3 3 4 5 5 5 6 9)

((cadr (assq 'reverse-counter *counters*)))
;Value: 11
But Louis is called away before he can go much further down this path. He gives the rest of his tasks to his intern.

The intern has to write a program that, given a list, sorts it and returns a pair where the car is the sorted list, and the cdr is the length of the sorted list. That's trivial:
(define (sort-and-length list <)
  (let ((s (lib-sort list <)))
    (cons s (length s))))
But it occurs to him that this is less efficient than it could be. The call to length has to traverse the entire list, and presumably the call to lib-sort must as well. In order to cut down on the number of list traversals, the intern takes a look at the code for lib-sort. It is rather baffling to him (he is an intern), but he figures out that since reverse is called on every recursive call, the number of calls to reverse has to equal the length of the list. So he codes up this monstrosity:
(define (sort-and-length list <)
  (let* ((c (assq 'reverse-counter *counters*))
         (start ((cadr c)))
         (s (lib-sort list <)))
      (cons s (- ((cadr c)) start))))
Time passes...

It turns out a customer is complaining that the code is too slow. A quick test shows that he is trying to sort a list of ten thousand elements and it is spending all its time in lib-sort.

“What idiot wrote this?!” asks Cy D. Fect. “There is an FFI to qsort.” Cy replaces the sort routine:
(define (lib-sort list predicate)
  (vector->list (ffi-qsort (list->vector list) predicate)))
Of course he removed the code that tracks calls to reverse because qsort doesn't use it. When he checks in the code, lib-sort is much, much faster, but for some reason all the menus in the GUI now only contain a single entry. Cy calls in Ben Bitdiddle for help. Ben notices that the GUI calls sort-and-length for each menu, and sort-and-length is reporting that each menu has zero entries. He fixes sort-and-length to do the obvious thing and everything is off and running again. Ben shakes his head and sighs.

One of the most important points of procedural abstraction is that it allows you to change the implementation of the procedure at will without having to analyze the entire code base. We saw before that if we allow the internal variable names to escape (by using them as keys), then we can no longer change the names. In this case, we're going further. We want to eliminate the name altogether because we're changing the entire algorithm. The variable `reverse-counter' won't even exist anymore in this code. By exposing it in this way, we made it possible for an unplanned dependency to be added.

In this example, the unplanned dependency was rather idiotic. That's not the point. I have run into this sort of bug many, many times where an abstraction is not fully documented (or not well documented) and a programmer misunderstands the API and uses some internal function for the wrong purpose. Things work fine until the implementation changes, then very weird unrelated things start to break. Sometimes the code is so tangled that you have to emulate the effect of the old implementation just to keep from having to rewrite huge swaths of the unrelated code.

Saturday, September 19, 2009

Move, Down the road I go

Louis Reasoner doesn't care for all the work he had to do just to make it possible to read and clear the counter. He decides to write a macro to help. It's pretty simple:
(define-syntax register-counter!
  (syntax-rules ()
    ((register-counter! variable)
     (add-counter! (quote variable)
                   (lambda () variable) ;; reader function
                   (lambda () (set! variable 0)) ;; clear function
                   ;; maybe more functions here some day
                   ))))

(define lib-mapcar 
  (let ((calls-to-f 0))
    (register-counter! calls-to-f)
    (lambda (f list)
      (cond ((pair? list)
             (let ((h (car list)))
               (cons (begin
                      (set! calls-to-f (+ calls-to-f 1))
                      (f h))
                     (lib-mapcar f (cdr list)))))
            ((null? list) '())
            (else (error "improper list"))))))
This works nicely.

Management is insisting that the Lisp system support ‘constant ref arguments’. They are a bit unclear as to what the advantages are, but they say that they are absolutely necessary in order to assure investors that modern software techniques are supported. In disgust, Alyssa P. Hacker writes these macros and then takes a vacation:
(define-syntax ref
  (syntax-rules ()
    ((ref var) (lambda () var))))

(define-syntax deref
  (syntax-rules ()
    ((deref var) (var))))
Louis Reasoner takes a look at the code, thinks for a long while, then comes up with a test case:
(define (fib x)
  (if (< (deref x) 2)
      (deref x)
      (let ((x1 (- (deref x) 1))
            (x2 (- (deref x) 2)))
        (+ (fib (ref x1)) (fib (ref x2))))))

(let ((arg 7)) (fib (ref arg)))
=> 13
But Louis sees a problem: suppose we have a function that takes several ref arguments. It's such a pain to write something like this:
(foo (ref x) (ref y) (ref z) (ref a) b c)
So he writes his own macro:
(define-syntax lref
  (syntax-rules ()
    ((lref var ...)
     (list 
       (cons (quote var) (lambda () var)) ...))))
For those unfamiliar with the ... notation, the idea is that it causes pattern repetition. lref will be an n-ary macro. Each var that is passed in will be turned into an entry in an alist.
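As a small illustration of that repetition, here is the macro again with one use and, in a comment, what that use amounts to after expansion. The variable refs exists only for this example.

```scheme
(define-syntax lref
  (syntax-rules ()
    ((lref var ...)
     (list
       (cons (quote var) (lambda () var)) ...))))

;; (lref x y) amounts to:
;;   (list (cons 'x (lambda () x))
;;         (cons 'y (lambda () y)))
(define refs
  (let ((x 3) (y 2))
    (lref x y)))

(map car refs)             ; => (x y)
((cdr (assq 'y refs)))     ; => 2
```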

Now it is much easier to write code like this:
  (foo (lref x y z a) b c) 
With all the references passed in as an alist. Foo will have to be tweaked:
(define (foo ref-args b c)
  (let-syntax ((deref (syntax-rules () 
                        ((deref var) ((cdr (assq 'var ref-args)))))))
    ;; example body for foo
    (display (+ (deref x) (deref y) c))))

(let ((x 3)
      (y 2)
      (z 1))
  (foo (lref x y z) 2 5))
=> 10
The boilerplate syntax at the beginning of foo is a pain, but Louis is sure that some kind of macro can take care of that as well.

There is a looming problem here, which I hope is fairly obvious given the last couple of posts. Spoiler below....

I want to be clear on something, though. In any sort of example like this, where there is a problem that is not immediately obvious, you have to write a fair amount of code to get to the heart of the issue. You have the option of using real-world code, but there is often so much going on in the code it is hard to see the specifics. You have the alternative of writing a toy example (like this one), but then you get the objection “no one would ever write such bad code”, so your example, while potentially a problem, is so convoluted that it would never occur in practice.

So here's the big looming problem: the callee needs to know the names of the arguments that the caller assigned. Stupid macros, passing an alist, and so-called constant references are just there for a good story. The problem is that if we wish to change the name of something in foo, we have to locate every possible caller and change the name there as well. The register-counter! macro has the same problem: if you change the name of the counter variable, you have to find any code that examines the counters and be sure it isn't looking for the old name.

Just use grep? What if they wrote some code over at Larry Liverless Labs that uses the old name? A big advantage of lexical scoping is that interior names are local. You can change them, add them, or remove them without examining the entire code base. The macros that Louis wrote fail because they expose the internal names to code outside the function.

Tomorrow, the advantages of hiding the code...

Friday, September 18, 2009

And nows the time, the time is now

In my previous post I showed one way of breaking procedural abstraction. Dynamic binding allows the details of the implementation to be visible to unrelated code, and this causes unpredictable consequences (if it is used pervasively).

Let's give Louis Reasoner another tricky task. He's going to port the code to Scheme and he is to remove the logging from lib-mapcar and instead instrument the code with a counter that will be used by the statistics package. Recall that the code currently looks like this:
(defun lib-mapcar (f list)
  (cond ((consp list)
         (let ((h (car list)))
           (log "Calling " f " on " h)
           (cons (funcall f h) 
                 (lib-mapcar f (cdr list)))))
        ((null list) '())
        (t (error "improper list"))))
The port to Scheme is easy:
(define lib-mapcar 
  (lambda (f list)
    (cond ((pair? list)
           (let ((h (car list)))
             (cons (f h) 
                   (lib-mapcar f (cdr list)))))
          ((null? list) '())
          (else (error "improper list")))))
And Louis adds a counter:
(define lib-mapcar 
  (let ((calls-to-f 0))
    (lambda (f list)
      (cond ((pair? list)
             (let ((h (car list)))
               (cons (begin
                       (set! calls-to-f (+ calls-to-f 1))
                       (f h))
                     (lib-mapcar f (cdr list)))))
            ((null? list) '())
            (else (error "improper list"))))))
But at this point he is stuck. He wants to be able to get at the value of the counter in order to read the count, but he can't because of the lexical scoping. There is an easy trick. We use another closure that closes over the same variable:
(define *counters* '())

(define (add-counter! name reader)
  (set! *counters* (cons (cons name reader) *counters*)))

(define lib-mapcar 
  (let ((calls-to-f 0))
    (add-counter! 'calls-to-f (lambda () calls-to-f))
    (lambda (f list)
      (cond ((pair? list)
             (let ((h (car list)))
               (cons (begin
                      (set! calls-to-f (+ calls-to-f 1))
                      (f h))
                     (lib-mapcar f (cdr list)))))
            ((null? list) '())
            (else (error "improper list"))))))
And here it is in action:
(lib-mapcar (lambda (x) (* x 2)) '(3 1 4 1 5 9 2 6 5 3 5))
=> (6 2 8 2 10 18 4 12 10 6 10)

((cdr (assq 'calls-to-f *counters*)))
=> 11
Using an alist to hold the counters and just invoking the reader procedure is a little crude (we could make some nice abstractions here), but that isn't the point. The point is that by closing over calls-to-f and exporting that closure we have poked a very tiny hole in our abstraction barrier. The hole is just big enough that some external code that is not under our control can read the value of our counter, but that is it. There is no way for the external code to modify the value. But there is one other thing we hid. The name of the variable that holds the counter is also hidden from the external code. If we want, we can change the code like this:
(define lib-mapcar 
  (let ((the-counter 0))
    (add-counter! 'calls-to-f (lambda () the-counter))
    (lambda (f list)
      (cond ((pair? list)
             (let ((h (car list)))
               (cons (begin
                       (set! the-counter (+ the-counter 1))
                       (f h))
                     (lib-mapcar f (cdr list)))))
            ((null? list) '())
            (else (error "improper list"))))))
I have renamed the variable and all the places that the variable is used. This makes no difference to any other code. And because the scope is lexical, I know that all the code that could possibly care about the variable name is right there. I don't need to sift through the entire rest of the code base or obtain a list of variables from my customers. Nor do I have to tell them I changed the name. Now this is pretty cool.

You should find it easy to imagine how we could allow the external code to reset the counter to zero in addition to reading it, but not allow it to set the counter to an arbitrary value.
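One possible shape for that, as a sketch: export a second closure that can only set the counter to zero. The three-argument add-counter!, the (name reader clear) entry layout, and the helpers counter-value and counter-clear! are assumptions made for this example.

```scheme
(define *counters* '())

;; Each entry is (name reader clear).
(define (add-counter! name reader clear)
  (set! *counters* (cons (list name reader clear) *counters*)))

(define lib-mapcar
  (let ((calls-to-f 0))
    (add-counter! 'calls-to-f
                  (lambda () calls-to-f)           ; reader
                  (lambda () (set! calls-to-f 0))) ; clear, and nothing more
    (lambda (f list)
      (cond ((pair? list)
             (let ((h (car list)))
               (set! calls-to-f (+ calls-to-f 1))
               (cons (f h) (lib-mapcar f (cdr list)))))
            ((null? list) '())
            (else (error "improper list"))))))

(define (counter-value name) ((cadr (assq name *counters*))))
(define (counter-clear! name) ((caddr (assq name *counters*))))

(lib-mapcar (lambda (x) (* x 2)) '(3 1 4))   ; => (6 2 8)
(counter-value 'calls-to-f)                  ; => 3
(counter-clear! 'calls-to-f)
(counter-value 'calls-to-f)                  ; => 0
```

External code can read the counter and zero it, but there is still no path by which it can store an arbitrary value, because no closure that does so was ever exported.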

Thursday, September 17, 2009

Yet more rambling

I hope I'm preaching to the choir when I say that procedural abstraction is the greatest thing since sliced bread. If you're still a skeptic, let me point you at these: Now of course there are a huge number of other worthy abstractions, but since procedural abstraction is universal, you can model them with procedures. (Yes, there are other universal models you could use. No, I'm not suggesting we discard the other abstractions and implement them with procedures.) I'm simply pointing out that procedures are a very nice, powerful, and universal abstraction.

Let's break them.

Why would I want to break abstraction? Isn't that a bad thing? Well, yes, it is. I think it is a complete disaster, and I hope you do, too. But I've noticed a few people that seem to think that maybe a little bending — maybe a tiny bit of tweaking — might be ok if you were able to get something for it at the end of the day. They are wrong.

One reason that procedures make for good abstraction barriers is that they are opaque. As a caller, you cannot see how the procedure is written or how it performs its work. You get to supply the arguments, it gets to return an answer and that's it. The barrier works the other way, too. The procedure you call cannot get its hands on the guts of the caller, either. This wasn't always the case. Some early Lisp dialects were dynamically scoped (but Lisp 1.5 had static binding!). The bindings of the caller were visible in the callee. There were several people that pointed out that this was not a good thing. Suppose you have this code:
;; In a library somewhere:

(defun lib-mapcar (f list)
  (if (null list)
      '()
      (cons (funcall f (car list))
            (lib-mapcar f (cdr list)))))
But later we get a bug report:
(lib-mapcar #'(lambda (x) (+ x 1)) (cons 1 2))
;; This crashes!
Obviously you shouldn't call lib-mapcar on an improper list, but crashing is less desirable than an informative error message, so Louis Reasoner is tasked with fixing lib-mapcar. It takes him a while, but he comes up with this:
(defun lib-mapcar (f list)
  (cond ((consp list)
         (cons (funcall f (car list)) 
               (lib-mapcar f (cdr list))))
        ((null list) '())
        (t (error "improper list"))))
In the code review, Alyssa P. Hacker suggests logging the calls to f, so Louis does this:
(defun lib-mapcar (f list)
  (cond ((consp list)
         (let ((h (car list)))
           (log "Calling " f " on " h)
           (cons (funcall f h) 
                 (lib-mapcar f (cdr list)))))
        ((null list) '())
        (t (error "improper list"))))
The code passes the regression tests and they ship it.

Two days later, a frantic call comes in from Larry Liverless Labs. Their physics package has mysteriously stopped working. They describe the bug:
(let ((h 6)  ;; approx. Planck's constant
      (c 3)) ;; approx. C
  (lib-mapcar
     #'(lambda (lam) (/ (* h c) lam))
    '(3 1 4 1 6)))
;; Expected answer:  (6 18 4 18 3)
=> (3 3 3 3 3)
The problem, obviously, is that the variable h that Louis introduced is shadowing the variable h in the physics package.

“I'll just change it, then. Can I get a list of all the variables used by all our customers so I know which ones to avoid? Oh, and we should document all the variables we declare so that future customers won't accidentally use one.”

The solution is to use lexical scoping because it hides the details of the implementation.
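For comparison, here is a sketch of the same call in a lexically scoped language, using Scheme and leaving out the logging call. The name result is only there for the example.

```scheme
(define (lib-mapcar f list)
  (cond ((pair? list)
         (let ((h (car list)))   ; this h is invisible to f
           (cons (f h)
                 (lib-mapcar f (cdr list)))))
        ((null? list) '())
        (else (error "improper list"))))

(define result
  (let ((h 6)   ; approx. Planck's constant
        (c 3))  ; approx. c
    (lib-mapcar (lambda (lam) (/ (* h c) lam))
                '(3 1 4 1 6))))

;; result => (6 18 9/2 18 3) in an implementation with exact rationals
```

The h inside the lambda refers to the binding established by the physics code, whatever lib-mapcar chooses to call its own locals.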

Now I'm not saying that dynamic scoping is bad or wrong, I'm saying that it is the wrong default. Suppose there is a dynamic variable that controls the logging level:
         ...
         (let ((h (car list)))
           (log logging-level "Calling " f " on " h)
         ...
If there are only a handful of these, we could document them all. Or we could invent a naming convention that makes it obvious that we expect it to be dynamic (add asterisks around it, for example), or we could add special forms for establishing and referring to dynamic variables (like fluid-let or parameter-value). But we really want it to be the case that when we send a junior programmer like Louis Reasoner in to look at the code, he can assume that local changes have local effect and that as long as he computes the correct value, he'll be ok.
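Here is a sketch of the explicit-special-form approach, assuming an implementation that provides make-parameter and parameterize (R7RS, or SRFI 39). The logging-level parameter, the *log* list, and log-call are hypothetical stand-ins for a real logging facility.

```scheme
;; The one variable that is meant to be dynamic is declared as such.
(define logging-level (make-parameter 0))

(define *log* '())

;; Record ITEM only if the current dynamic logging level is high enough.
(define (log-call level item)
  (if (<= level (logging-level))
      (set! *log* (cons item *log*))))

(define (lib-mapcar f list)
  (cond ((pair? list)
         (let ((h (car list)))   ; lexical, so it cannot shadow a caller's h
           (log-call 1 h)
           (cons (f h) (lib-mapcar f (cdr list)))))
        ((null? list) '())
        (else (error "improper list"))))

(define quiet
  (let ((h 6) (c 3))
    (lib-mapcar (lambda (lam) (* h c lam)) '(1 2))))

(define logged
  (let ((h 6) (c 3))
    (parameterize ((logging-level 1))
      (lib-mapcar (lambda (lam) (* h c lam)) '(1 2)))))

quiet     ; => (18 36)
logged    ; => (18 36)
*log*     ; => (2 1), recorded only during the parameterize
```

The caller's h is untouched in both calls. The only thing that crosses the procedure boundary dynamically is the one parameter that was declared for that purpose.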

to be continued...

Wednesday, September 16, 2009

More rambling

I'm going somewhere with this, I swear.

Primitive ontological abstraction gives us some objects we can play with, but it would be nice to be able to talk about them. Nominal abstraction allows us to assign a name to an object and to use the name to refer to the object. It's such a natural thing to do that you hardly notice it being done, but it is a tremendous source of power as well as confusion. I'm not going to explore this further right now.

So we have objects and names, and at this point we can construct a primitive programming language about on the level of assembly code. It isn't much, but it's better than toggle switches and hex codes. We need a few more abstractions. We'll need relational abstractions of some sort. I don't mean something as formal and heavyweight as a database relation, I mean some means of talking about two objects and how they relate to each other. An example would be a number and an array, and I might want to `put the number in the array'. And we'll want a way to refer to a quality that might be shared among several objects.

Now there are several ways we could go from here, but I was doing a bunch of thinking about procedural abstraction. (I was trying to figure out what sort of abstractions you needed in order to even make sense of procedural abstraction and I came up with the ones I just mentioned.) Procedural abstraction in some sense gives you the verbs you need in order to accomplish something with the objects. You don't strictly need them — very early computers didn't support subroutines, and there are a handful of computer languages that don't have the notion of a procedure. But the large majority of computer languages have a way of creating procedures (or methods, subroutines, functions, whatever you want to name them). Why is this?

One reason is that a function is a very well developed mathematical construct. Hundreds of years of thought have gone into formalizing what a function is. Another reason is that every theory of computability has ended up with modeling programs as the set of partial recursive functions. The final reason is that functions and procedures make extremely good abstraction barriers. A mathematical function is a perfect ‘black box’. It is characterized purely by the pairing up of what goes in and what comes out. Two mathematical functions are different if and only if there is an input-output pair for one that doesn't exist on the other. To a large extent, two computer programs are different if and only if there exists an input that produces different outcomes depending on the program. (There are caveats. Bubble sort and merge sort produce identical results, but whether we considered them the ‘same’ would depend on whether performance and resource use were taken into account.)