Monday, July 29, 2013

A bit harder

The last few exercises have been easy.  These will be intermediate.

So far, we've been using the integer serial numbers to refer to audit records when we don't have the record itself.  The record can be read and written in a transaction, but outside the transaction we need a non-persistent object to act as a name.  The first problem with integers is that they aren't typed: whatever the number 42 means in one context is unlikely to be relevant in another.  The second problem is that we are deriving the integers from an underlying implementation artifact.  The number 42 just identifies an object to most of the code, but it derives from an index into a vector.  If we change how the number is generated, then any numbers we already know about would have to change as well.

We need an external naming scheme so we can refer to the audit records in a repository.

Exercise 5:
  1. Have a required "name" argument when creating a repository.  The name should be immutable and you should be able to access the name without a transaction.
  2. Define an identifier object.
    (defstruct (distributed-identifier
                (:conc-name did/)
                (:constructor %make-distributed-identifier (domain repository class numeric-id))
                (:copier nil)
                (:predicate distributed-identifier?))
      "DIDs are interned, so you can use EQ to test for equality.  However, you should
       never modify the slots in a DID."
    
      (domain ""     :read-only t :type string)
      (repository "" :read-only t :type string)
      (class nil     :read-only t :type (or null keyword))
      (numeric-id 0  :read-only t :type non-negative-integer))
    
    You can skip the domain element for now, but the other fields are needed.
  3. Define a printer for a distributed-identifier so they print like this: [test-repository.AUDIT-RECORD.22].  The name, class, and numeric id are printed with periods between them.  (One way to approach items 3 through 6 is sketched after this exercise.)
  4. Define a parser that can read the fields out of a string.
  5. Hook up the parser to the reader so that programmers can use the square bracket notation as a literal in a program.
  6. Intern these objects in a weak hash table.  We want to be able to use EQ to test these. So typing (eq [test-repository.AUDIT-RECORD.22] [test-repository.AUDIT-RECORD.22]) will return true. If two distributed-identifiers print the same way, then they refer to the same object and they should be EQ.
  7. Add a mapping table to the repository for the audit records.  You want it so that when an audit record is newly created it will have a new identifier. You'll want to decouple the assignment of the audit-record serial number from the assignment of the numeric ID in the distributed-identifier.  This is so we can import audit-records from other repositories.


    Consider [test-repository.AUDIT-RECORD.22]. It is the twenty-second audit record created by the test repository, but it is not necessarily at the twenty-second offset in the repository's table of audit-records. It could be, for example, at location 165. The mapping table gives the location. If a new audit-record is created, say at location 166, a new entry in the mapping table will map [test-repository.AUDIT-RECORD.23] to location 166.

    The mapping table will be persistent, and thus require a transaction to read.
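
Here's a rough sketch of items 3 through 6.  The helper names (*did-table*, did-intern, parse-did) are made up, not the ChangeSafe ones, and weak hash tables are implementation-specific (the SBCL spelling is shown):
(defvar *did-table*
  ;; Weak on value so unreferenced DIDs can be garbage collected.
  ;; :WEAKNESS is an SBCL extension; other Lisps spell this differently.
  (make-hash-table :test #'equal #+sbcl :weakness #+sbcl :value))

(defun did-intern (domain repository class numeric-id)
  ;; Everything goes through this function, so two DIDs that print the
  ;; same way are the same object.
  (let ((key (list domain repository class numeric-id)))
    (or (gethash key *did-table*)
        (setf (gethash key *did-table*)
              (%make-distributed-identifier domain repository class numeric-id)))))

(defmethod print-object ((did distributed-identifier) stream)
  ;; Item 3: prints as [test-repository.AUDIT-RECORD.22]
  (format stream "[~a.~a.~d]"
          (did/repository did) (did/class did) (did/numeric-id did)))

(defun parse-did (string)
  ;; Item 4: the inverse of the printer (the domain is skipped for now).
  (let* ((inner (string-trim "[]" string))
         (dot1  (position #\. inner))
         (dot2  (position #\. inner :from-end t)))
    (did-intern ""
                (subseq inner 0 dot1)
                (intern (string-upcase (subseq inner (1+ dot1) dot2)) :keyword)
                (parse-integer inner :start (1+ dot2)))))

;; Item 5: make open bracket a macro character so DIDs are literals.
(set-macro-character
 #\[ (lambda (stream char)
       (declare (ignore char))
       (parse-did (coerce (loop for c = (read-char stream)
                                until (char= c #\])
                                collect c)
                          'string))))
Since parse-did funnels through did-intern, reading the same bracket notation twice yields one interned object, which is exactly what the EQ test in item 6 requires.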

Sunday, July 28, 2013

Fairly easy

The prior post had some easy exercises.  This is easy, too.

Every audit record in a repository has a unique serial number (relative to the repository).  We want a way to represent sets of serial numbers.

Exercise 4:  Implement these:

Constructors:
  • bitmap->serial-number-set repository vector-1b
  • range->serial-number-set repository start end
    Start is inclusive, end is exclusive.
  • list->serial-number-set repository list
Selectors and predicates:
  • serial-number-set? object
  • serial-number-set/repository serial-number-set
  • serial-number-set/empty? serial-number-set
  • serial-number-set/equal? left-serial-number-set right-serial-number-set
  • serial-number-set/member? serial-number-set serial-number
  • serial-number-set->bitmap serial-number-set

These return new sets rather than mutate the old ones:
  • serial-number-set/adjoin serial-number-set serial-number
  • serial-number-set/remove serial-number-set serial-number
  • serial-number-set/union left-serial-number-set right-serial-number-set
  • serial-number-set/exclusive-or left-serial-number-set right-serial-number-set
  • serial-number-set/intersection left-serial-number-set right-serial-number-set
Two more:
  • serial-number-set/intersection? left-serial-number-set right-serial-number-set
    Returns true if the intersection is not empty.  Ought to be faster than (serial-number-set/empty? (serial-number-set/intersection left right)).
  • serial-number-set/last-serial-number serial-number-set
    Returns the largest serial number in the set.
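
Here's a rough sketch of one possible representation, backed by a plain bit vector indexed by serial number.  The internal names are made up, and a production version would also check that both arguments to the binary operations belong to the same repository:
(defstruct (serial-number-set
            (:conc-name serial-number-set/)
            (:constructor %make-serial-number-set (repository bitmap))
            (:copier nil)
            (:predicate serial-number-set?))
  (repository nil :read-only t)
  (bitmap #*   :read-only t :type simple-bit-vector))

(defun range->serial-number-set (repository start end)
  ;; START is inclusive, END is exclusive.
  (let ((bitmap (make-array end :element-type 'bit :initial-element 0)))
    (fill bitmap 1 :start start)
    (%make-serial-number-set repository bitmap)))

(defun serial-number-set/member? (set serial-number)
  (let ((bitmap (serial-number-set/bitmap set)))
    (and (< serial-number (length bitmap))
         (= 1 (bit bitmap serial-number)))))

(defun serial-number-set/adjoin (set serial-number)
  ;; Returns a new set; the argument set is untouched.
  (let* ((old (serial-number-set/bitmap set))
         (new (make-array (max (length old) (1+ serial-number))
                          :element-type 'bit :initial-element 0)))
    (replace new old)
    (setf (bit new serial-number) 1)
    (%make-serial-number-set (serial-number-set/repository set) new)))

(defun serial-number-set/last-serial-number (set)
  (position 1 (serial-number-set/bitmap set) :from-end t))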

We'll use serial numbers and serial-number-sets so we can refer to audit-records and sets of audit-records without needing the records themselves.  We can only get to the records within a transaction, but we can refer to them by serial number outside of a transaction.

Friday, July 26, 2013

Fun with Audit Trails!

Gotta change direction here.  It's getting boring.  Let's implement an audit trail!  How hard could it be? We'll make an audit trail out of audit records.  Pick your favorite language and try this.

Exercise 1:  Implement an audit record.  Make sure it has these features:
  • a timestamp of when the record was created
  • a user-id to tell us who created the record
  • a reason why in the form of human readable text
  • immutable, including transitively reachable subobjects
  • Malformed records cannot be created.  Automatically set the timestamp so it cannot be forged.
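
For concreteness, here's one way exercise 1 might look in Common Lisp.  All the names are made up, not the ChangeSafe ones:
(defstruct (audit-record
            (:conc-name audit-record/)
            (:constructor %make-audit-record (timestamp user-id reason))
            (:copier nil)
            (:predicate audit-record?))
  (timestamp 0  :read-only t :type integer)
  (user-id   "" :read-only t :type string)
  (reason    "" :read-only t :type string))

(defun create-audit-record (user-id reason)
  ;; The only public constructor.  It supplies the timestamp itself, so
  ;; callers cannot forge one, and it validates the other fields so a
  ;; malformed record cannot be created.  The strings are copied so the
  ;; caller cannot mutate a subobject it retained.
  (check-type user-id string)
  (check-type reason string)
  (%make-audit-record (get-universal-time)
                      (copy-seq user-id)
                      (copy-seq reason)))
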
Now we have to keep them somewhere safe, and get them back on occasion.

Exercise 2a: Make a repository with some sort of stable storage for a backing store, like a file.

Exercise 2b:  When creating an audit record, log it to stable storage. Make the repository be a required argument for creation of an audit record.  Log the record in the repository, but don't store info about the repository itself in the record.  The record doesn't have to know where it lives.

Exercise 2c:  Have a way to load all the audit records back in from stable storage.  Intern the audit records so re-loading is idempotent and eq-ness is preserved.

Exercise 3:  Implement random access to a repository's audit log (indexed by integer)
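
A file-backed sketch of exercises 2 and 3, layered on the record sketch above.  The one-s-expression-per-record format and all of the names here are placeholders:
(defstruct (repository (:conc-name repository/))
  (pathname nil :read-only t)
  (records (make-array 0 :adjustable t :fill-pointer t)))

(defun audit-record/create (repository user-id reason)
  ;; Exercise 2b: the repository is a required argument and the record
  ;; is logged to stable storage, but the record itself never learns
  ;; where it lives.
  (let ((record (create-audit-record user-id reason)))
    (with-open-file (stream (repository/pathname repository)
                            :direction :output
                            :if-exists :append
                            :if-does-not-exist :create)
      (with-standard-io-syntax
        (print (list (audit-record/timestamp record)
                     (audit-record/user-id record)
                     (audit-record/reason record))
               stream)))
    (vector-push-extend record (repository/records repository))
    record))

(defun repository/load-records (repository)
  ;; Exercise 2c: reload from stable storage.  Loading is skipped when
  ;; the records are already in memory, so re-loading is idempotent and
  ;; EQ-ness is preserved.
  (when (zerop (fill-pointer (repository/records repository)))
    (with-open-file (stream (repository/pathname repository)
                            :if-does-not-exist nil)
      (when stream
        (loop for form = (read stream nil nil)
              while form
              do (vector-push-extend
                  (%make-audit-record (first form) (second form) (third form))
                  (repository/records repository))))))
  repository)

(defun repository/find-record (repository index)
  ;; Exercise 3: random access, indexed by integer.
  (aref (repository/records repository) index))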

These are all very easy.  Don't worry, I'll make it harder.

Monday, July 22, 2013

Persistent vectors

Simulating mutation by creating a new object is reasonable for CLOS objects, but it is a terrible idea for persistent vectors. The large amount of copying would be bad enough, but algorithms that rely on growable vectors (by calling vector-push) will change from linear to quadratic in space. That's a disaster.

We simulate mutation by replacing an entry in the object-map. This forces the granularity of mutation to be the same as the granularity of the object-map. In other words, we cannot update just a single item in a vector because there is no object-map entry referring to just that item.

So why not fix that? The obvious solution is to allocate a contiguous set of object-ids. Easy enough, but if we want to "grow" the vector we'll have a problem. The vector storage itself can easily be moved, but the range of object-ids that map to the vector cannot easily be extended.

My solution was to change how object-ids and the object-map work. The object-map maps integers to persistent objects. Conceptually, the object-map itself is a vector. If we add an orthogonal index to object-map it becomes a 2-dimensional array. Now we can place vectors in the map and assign them a single primary index and map the secondary indices on to the vector elements. Vectors can grow (or shrink) without changing their primary index.
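
A toy sketch of the idea, with an EQUAL hash table standing in for the (of course persistent) object-map; the names are mine:
(defvar *object-map* (make-hash-table :test #'equal))

(defun object-map-ref (primary &optional (secondary 0))
  (gethash (cons primary secondary) *object-map*))

(defun (setf object-map-ref) (value primary &optional (secondary 0))
  (setf (gethash (cons primary secondary) *object-map*) value))

;; A persistent vector owns one primary index.  Element I lives at
;; secondary index I, so updating one element touches exactly one
;; object-map entry, and growing the vector just claims fresh secondary
;; indices.  The primary index never changes.
(defun pvector-ref (primary index)
  (object-map-ref primary index))

(defun (setf pvector-ref) (value primary index)
  (setf (object-map-ref primary index) value))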

Now that we have persistent vectors that we can mutate, we can create mutable persistent cons cells (just a 2-element vector) and mutable persistent hash tables.

Sunday, July 21, 2013

Faking mutability

Dan Lentz said
I do think the term immutability is being thrown around a bit here somewhat in denial of the reality that what we are in the very process of doing is mutating the instance. Is immutability even the goal? I mean, yes the wbtree is persistent in the sense that older versions of an object are never overwritten, but that doesn't preclude us from representing changes to an object in subsequent updates to the tree and expressing its value at a given point in time as MVCC. Any given version of the object is immutable, but the object itself can model mutability without detracting from that.

Exactly what I was going to write about next!

When we create a persistent object, we assign it a unique object id. We cannot change the object, but we can change the mapping from ids to objects. The procedure remake-instance does this. remake-instance takes a persistent object and some initargs and creates a brand new object, but it updates the object-map with the old object id. We simulate slot mutation by creating a new object that differs only in that slot value and giving it the old object id.
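
In terms of the toy object-map sketched above, remake-instance might look something like this; persistent-object/id is a hypothetical accessor, and a real version would also have to merge the old object's slot values with the new initargs:
(defun remake-instance (old-instance &rest initargs)
  ;; Make a brand new object, then repoint the old object id at it.
  ;; The old object itself is never modified.
  (let ((id (persistent-object/id old-instance)))
    (setf (object-map-ref id)
          (apply #'make-instance (class-of old-instance) initargs))))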

This simplifies transaction handling quite a bit. If a transaction aborts we want to restore the world to the state it was in when we started. Since we didn't actually change the original object, all we need to do is go back to using the old object map.

Saturday, July 20, 2013

You could use a monad.


Scenario: A piece of data is determined in the first function `f1', but is only processed in a sub-sub-sub-… function `fx'.

One way is to pass `the-data' as an argument from `f1' through `f2' all the way down to `fx':
(define (f1 the-data …)
  …
  (f2 the-data …)
  …)

(define (f2 the-data …)
  …
  (f3 the-data …)
  …)

…

(define (fx the-data …)
  … the-data …)
But in the above way, the bodies of `f2', `f3', `f4', and so on don't use `the-data'; it is only passed along to the next function. And I still have to add the argument `the-data' to each of them.

Are there other ways to solve this problem?
You could use a monad.

The basic gist of it is that instead of manipulating the-data, we manipulate functions. So the inner functions f2, f3, ... will change from this:
(define (f2 the-data arg arg …)
      …
      (f3 the-data x y …))
to this:
(define (f2 arg arg …)
  (lambda (the-data)
      …
      ((f3 x y …) the-data)))
We just move the-data to the end of the argument list and curry the functions. That makes things more complicated at first, but the inner functions that don't actually use the-data can be eta-reduced. (lambda (x) (f x)) => f

So f2 eta-reduces to:
(define (f2 arg arg …)
      …
      (f3 x y …))
and all mentions of the-data disappear in the inner functions (pretty slick!). The innermost fx can't be reduced this way, of course, and the callers of f1 have to change to pass the initial value of the-data.

I just did this with an ad-hoc code transformation. A "monad" formalizes this. (and I skipped over a lot of detail.)

Matthias Felleisen elaborated further:
Here is what Joe is saying, with fake macros; the first two parts are real:
#lang racket

;; -----------------------------------------------------------------------------
;; an example using lexical scope

(define (f1-lexical-scope x)
  (define the-data (sin x))
  (define (f2 y)
    `(f2 ,(f3 y)))
  (define (f3 z)
    `(f3 ,(f4 z)))
  (define (f4 w)
    `(f4 ,the-data ,w))
  (f2 10))

(f1-lexical-scope pi)

;; -----------------------------------------------------------------------------
;; the same example with the 'monad' spelled out

(define (f1-monad x)
  (define the-data (sin x)) ;;
  ((f2 10) the-data))

(define ((f2 y) the-data)
  `(f2 ,((f3 y) the-data)))

(define ((f3 z) the-data)
  `(f3 ,((f4 z) the-data)))

(define ((f4 w) the-data)
  `(f4 ,the-data ,w))

(f1-monad pi)

;; -----------------------------------------------------------------------------
;; a sketch of how syntax would hide the monad where needed

;; the following macros are fake, because I don't have time to write them out:
;; see the HtDP language macros for #%app, which register functions too

;; defines the-data and initializes it
(define-syntax-rule (create-store x) (define the-data x))
;; registers f as a store-passer
(define-syntax-rule (define-store-passer (f x) e) (define ((f x) the-data) e))
;; this supplements #%app so that when a registered store-passer f is applied,
;; it picks up the-data in a curried application; other functions work normally
(define-syntax-rule (apply-store-passer f x ...) (old-apply (f x ...) the-data))
;; pick up the-data from secret stash
(define-syntax-rule (access-store) 42)

;; if you had these macros, the above would read like this:

(define (f1-monad.v2 x)
  (create-store (sin x)) ;;
  (f2 10))

(define-store-passer (f2.v2 y)
  `(f2 ,(f3 y)))

(define-store-passer (f3.v2 z)
  `(f3 ,(f4 z)))

(define (f4.v2 w)
  `(f4 ,(access-store) ,w))

(f1-monad.v2 pi)

Thursday, July 18, 2013

What about...

John Cowan said:
The arguments passed to `make-instance` are the sum total of the information needed to create the instance in its initial state.

Since you require that instances be transitively immutable, the initial state is the only state.

If you have those arguments squirreled away, then you can create another instance in the exact same initial state.
That's the basic argument. It's demonstrably false.
(setq foo (make-instance 'test-class))

(test-class/name foo)
ZIPPY
So the game begins. I point out a trivial technicality, you adjust for it.
The arguments passed to `make-instance` combined with the appropriate
defaults from the initargs are the sum total of the information needed
to create the instance in its initial state.

Well, that's wrong, too. As Dan Lentz pointed out, "What about any stateful behavior encoded into the objects class?"

I'm pretty sure you ultimately win. But by the time I run out of objections your argument is going to be a patchwork quilt of exceptions, special cases, "implementation artifacts", "things that might be technically wrong, but can never make a real difference", etc. etc.

This doesn't inspire much confidence in our proof.

Mr. Lentz also asked:
Couldn't it just have trapped the out of date schema version in shared-initialize and dispatched it off to update-instance-for-redefined-class?

I don't know. When we bring an object from the store, it isn't really an instance yet.

Persisting CLOS objects

In a previous post, I described how a programmer would save simple primitive objects to the persistent store. How does a programmer save a CLOS object? Very simply. Here is a class definition:
(defclass test-class ()
  ((name :initarg :name
         :initform 'zippy
         :reader test-class/name)))
Here is the persistent version:
(defclass test-class ()
  ((name :initarg :name
         :initform 'zippy
         :reader test-class/name))
  (:metaclass persistent-standard-class)
  (:schema-version 0))
And here is how you save a persistent instance to the store:
(make-instance 'test-class)
I'll do another one:
(make-instance 'test-class :name 'griffy)
Not too shabby.

The point is that we can abstract away an awful lot of the persistence layer. This is really important because the versioning layer is at least as complex. Wrapping your mind around multiple versioned instances takes practice. It's a good thing that we don't have to worry about the persistence layer at the same time.

But I said that I'd describe how it works. I have several attempts at description sitting here on my computer; they are hard to read, hard to understand, and they make it seem as if the scheme simply wouldn't work correctly. I've tried to argue logically that it does work, and certainly the fact that the code was working is empirical evidence, but I'm still trying to find a clear description so that it simply makes sense that it ought to work. So rather than describe why it ought to work, let me describe what happens beneath the covers.

The code in pstore/pclass.lsp has the implementation. The CLOS meta-object protocol allows you to customize the behavior of the object system by adding your own methods to the internal CLOS implementation. To create a CLOS object, you call make-instance. Magic happens, but part of that magic involves initializing the slots of the newly created object. At this point during the object instantiation magic, CLOS calls the generic function shared-initialize. shared-initialize is responsible for assigning values to the slots of an object, and it gets called on the uninitialized object, the set of slot names to fill, and an argument list. The argument list is normally the same argument list given to make-instance. The default behavior of shared-initialize is to match up the keyword-specified initargs with the appropriate slots and stuff the values in. But we'll modify that.
(defmethod clos:shared-initialize ((instance persistent-standard-object) slot-names
                                   &rest initargs
                                   &key persistent-store node-id node-index
                                   &allow-other-keys)

  (if (eq instance *restoring-instance*)
      (call-next-method)
      ;; If we are being called from elsewhere,
      ;; we have to wrap the initargs and initforms
      ;; in persistent-objects and create an initializer
      ;; for this object.
      (let* ((class (class-of instance))

             (init-plist (compute-persistent-slot-initargs class
                                                           (or persistent-store  *default-persistent-store*)
                                                           initargs))

             (node-id (persistent-object/save
                       (make-initializer class
                                         (class-schema-version class)
                                         init-plist)
                       (or persistent-store  *default-persistent-store*)
                       node-id)))

        (apply #'call-next-method instance slot-names (nconc init-plist initargs))

        (setf (persistent-standard-object/node-id instance) node-id)
        (setf (persistent-standard-object/node-index instance) node-index)
        (setf (object-map-info/%cached-value
               (persistent-object/find-object-map-info
                (or persistent-store  *default-persistent-store*)  node-id))
              instance)

        instance)))
First, we check if the instance we are initializing is being restored from the persistent store. When we first open a persistent store and re-instantiate the objects, we do not want the act of re-instantiation to cause the objects to be re-persisted. So in that case we simply invoke call-next-method and let the default actions take place.

But if we are creating a new object, we want it to persist. The call to persistent-object/save does the trick, but notice that we don't pass in the instance. We call make-initializer on the argument list and we save that instead.

An initializer is a simple structure that holds the class, a "schema-version", and the argument list:
(defstruct (initializer
            (:conc-name initializer/)
            (:constructor make-initializer (class schema-version init-plist))
            (:copier nil)
            (:predicate initializer?))
  (class        nil :read-only t   :type persistent-standard-class)
  (schema-version 0 :read-only t   :type non-negative-fixnum)
  (init-plist   '() :read-only t   :type list))
and persistent-object/save serializes it like this:
(:method ((object initializer) stream symbol-table)
    (write-byte serialization-code/initializer stream)
    (write-fixnum (symbol-table/intern-symbol symbol-table (class-name (initializer/class object))) stream)
    (write-fixnum (initializer/schema-version object) stream)
    (write-fixnum (length (initializer/init-plist object)) stream)
    (iterate (((key value) (scan-plist (initializer/init-plist object))))
      (write-fixnum (symbol-table/intern-symbol symbol-table key) stream)
      (serialize value stream symbol-table)))
(I'm skipping over an important detail, but I'll get to it...)

Something unusual is going on here. The persistent object itself is not placed in the store. The argument list passed to make-instance is stored instead. Because the persistent object is immutable, all the information needed to reconstruct the object is present in the initargs, so we don't need the resulting object.

Why would we do this? The object itself has structure. Instantiating the object imposes this structure on the values stored within. The structure of the objects in the store is collectively known as the schema. Persistent stores are intended to hold objects for a long time. We expect the code that manipulates the objects to change over time, and it is likely that we will want to change the object representation on occasion. When we change the object representation, we need to consider the legacy objects that were constructed under the old representation. This is called schema evolution and it is one of the most painful tasks in maintaining an object-oriented database. At its worst, the persistent schema is so different from the code schema that you have only one way to handle the schema change: dump the entire database into a neutral format (like a file full of strings!), create a new, empty database, and read it all back in. My experience with other object-oriented databases is that the worst case is the common case.

If we store only the information needed to reconstruct the object, we no longer need to worry about the object layout. This finesses the problem of schema evolution.

But there is a :schema-version specified in the class definition, and that is most definitely stored. There are two kinds of information in the initargs: the values themselves are obvious, but the interpretation of the values is not. An example should illustrate this.

Suppose we start out a project where we are going to save named objects in the store. At some point in the code we invoke (make-instance 'foo :name "Joe") and so there is an initializer in the store something like [foo (:name "Joe")].

Now suppose that we extend our application. We are going to store family names as well. So we start storing initializers with more data: [foo (:name "John" :family "Smith")]. What do we do about the legacy [foo (:name "Joe")]? Let us suppose we decide that we'll just default the missing last name to "Unknown". Everything is cool. Old and new objects live together.

But now we want to extend our application to handle people like Cher and Madonna. We want it to be the case that we can deliberately omit the family name for some people. The initializers will look like [foo (:name "Cher")]. But now we have an ambiguity. We don't know if the family name is omitted on purpose, or whether the object was stored before the family name became important. Do we default the last name to "Unknown" or not?

The :schema-version argument in the class definition is used to disambiguate these cases. When the objects are recovered from the store, the constructor can use this value to decide how to interpret the remainder of the initargs.

Admittedly, this is a bit klunky. But it doesn't complicate things too much. Programmers will have to do two things when changing a persistent class definition: bump the :schema-version, and decide how to reconstruct objects that were stored under the legacy expectations. (Actually, you can punt on these if you can prove that no ambiguous cases will arise.)

Now about that important detail. The initializers we store aren't exactly what we said. Instead, when the persistent class is defined, a set of "hidden slots" is created in parallel with the declared slots. The initargs of the hidden slots are not persistent objects, but the persistent object ids of the initargs. We don't store [foo (:name "Joe")], we store [foo (:persistent-initarg-for-name 33)] where 33 is the persistent object id of the persistent string "Joe". I could write a few pages explaining why, but it would be deadly boring. I'm sure you can imagine uses for an extra hidden level of indirection (think multi-version concurrency).  (By the way, notice that the call to (apply #'call-next-method ...) uses nconc to paste the hidden arguments onto the front of the argument list, as I mentioned in the previous post.)

Does it work? Mostly. If you look at the code in conman/workspace.lsp you'll find a class with a schema-version of 1 and this method:
(defmethod pstore::restore-instance ((class (eql (find-class 'workspace))) (schema (eql 0)) 
                                     persistent-store node-id node-index init-plist)
  (debug-message 2 "Upgrading schema for workspace.")
  ;; This needs work.  The zeros are the OID of NIL.
  (pstore::restore-instance class 1 persistent-store node-id node-index
                    (list* :added-master-csets 0
                           :removed-master-csets 0
                           :transitional-added-master-csets 0
                           :transitional-removed-master-csets 0
                           init-plist)))
I added four slots to workspace objects. When restoring a workspace from the store, if it was a workspace created before these slots existed, this method overrides the usual restore method. It simply adds the new slots to the front of the init-plist before proceeding with the normal restore-instance. (The use of the number 0 instead of NIL is an implementation defect that I'm too lazy to fix at the moment.)

The problem in explaining this? I don't know an easy proof that storing initializers rather than objects is sufficient in all cases. It's not obvious that this even helps with schema evolution, and it took me a while before I was persuaded that there aren't lurking edge cases. In personal discussions, it takes a while to persuade people that this is in fact a solution to a problem. I'd love to hear a better argument.

Wednesday, July 17, 2013

alist or plist?

Another digression. We'll get to those persistent clos objects shortly, I promise.

In Lisp it is common to create little tables that map keys to values. You'd use a hash-table for any "serious" work, but sometimes you just want to store a couple of items for a moment or two and you don't want to bother with all the baggage that comes along with declaring, creating, and using hash tables.

There are two ways of doing this, both of which date back to the very earliest lisp implementations. An alist, or association list, is simply a list of associations. Each association has a key and a value. A plist, or property list, on the other hand, is a list of alternating keys and values.

Here is an alist:
((:name . "Joe")
 (:favorite-color . "Blue"))
Here is an equivalent plist:
(:name "Joe" :favorite-color "Blue")

There really isn't much of a difference between the two. Obviously you cannot just substitute one for the other because you need to use different accessors, but otherwise there seems little reason to prefer one to the other.

Alists have the nice property that they are implemented as a list of homogeneous elements. Each element in an alist is an "association". Lisp doesn't care about that, but it seems tidier.

Plists, on the other hand, are implemented with a heterogeneous list. The keys occupy the even elements, the values the odd. Again, Lisp doesn't care. But if you are misaligned in a plist, you'll end up thinking the keys are values and the values are keys, and you'll pair each value with the key of the subsequent entry.

Teachers love to make problem sets out of these things.

Alists are "deeper" because there is substructure to the entries. Plists are "flat" because the single list backbone is doing double duty linking keys to values and linking the key-value pairs together. Every key in a plist must have an associated value, but you can put a "key only" entry into an alist.

But there is one very special thing about a plist: it is isomorphic to a keyword argument list. This means you can apply a procedure to a plist and get the keyword argument mechanism to unpack it and bind the named parameters to the values. Watch:
(defun foo (&key name favorite-color)
  (format t "~%~a's favorite color is ~a" name favorite-color))

(let ((my-plist '(:name "Joe" :favorite-color "Blue")))
  (apply #'foo my-plist))
The order of the "entries" doesn't matter. This works, too:
(let ((another-plist '(:favorite-color "Blue" :name "Joe")))
  (apply #'foo another-plist))

Missing a table entry? Use argument defaulting:
(defun foo (&key (name "Unknown") favorite-color)
  (format t "~%~a's favorite color is ~a" name favorite-color))

(let ((another-plist '(:favorite-color "Blue")))
  (apply #'foo another-plist))
With judicious use of &rest, &key, and &allow-other-keys, you can pick out certain named entries, ignore the remaining ones, and pass the entire plist on to another procedure:
(defun baz (&rest args &key name &allow-other-keys)
  (print name)
  (apply #'foo args))
And because keyword handling works from left to right, with the earlier (leftmost) arguments taking precedence, you can override selected entries by simply shadowing them:
(defun quux (&rest args &key name &allow-other-keys)
  (when (equalp name "Joe")
    (apply #'foo `(:favorite-color "Black" ,@args))))
Just some stupid plist tricks? No. When we couple this with generic procedures, method combination, and quasiquote, we suddenly have a very powerful way to modify behaviors through delegation. We make considerable use of this in extending CLOS to handle persistent and versioned objects.
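
Here's a hedged sketch of the delegation pattern; the generic function and keywords are invented for illustration:
(defgeneric render (object &key &allow-other-keys))

(defmethod render ((object t) &key (color "black") label)
  (format t "~&~a in ~a" (or label object) color))

(defmethod render :around ((object symbol) &rest args &key &allow-other-keys)
  ;; Keyword handling is left to right, so the prepended :COLOR shadows
  ;; whatever the caller supplied; everything else passes through.
  (apply #'call-next-method object :color "blue" args))
So (render 'foo :label "widget") delegates to the primary method with :color already overridden, while (render "foo" :label "widget") is untouched.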

Try doing this in a more traditional language. It simply does not work anywhere nearly as smoothly.

CLOS?!

I did not like CLOS before I started working on ChangeSafe. It was "too complicated". It has weird things like "effective slots" and "method combinators" and update-instance-for-redefined-class. Most of CLOS seems to consist of circular definitions, and the documentation reflects this as well. When you are a blind man examining an elephant, no matter what you think you found, there seems to be an awful lot of it with no apparent purpose.

Code doesn't spring into life on the first iteration. It often takes many, many attempts at implementing something before you understand the problem you are trying to solve and figure out the best way to approach the solution. With iterative development, you re-approach the problem again and again in different ways trying to break it down into understandable pieces.

During the development of ChangeSafe I found myself "painted into a corner" on more than one occasion. The code had evolved to a point where it was clear that some core abstractions were the foundation of solving the problem, and the implementation of these core abstractions was at the very center of the solution. Sometimes we were wrong.

Sometimes a core, fundamental abstraction is best understood as a combination or special case of even more fundamental concepts that aren't obvious until later (or much later). This happens fairly often, so you re-implement the relevant code. Often the new abstractions are not completely compatible with the old ones, so you write some scaffolding code or an adapter API, or you bite the bullet and walk through your entire code base and fix up each and every call that used to do things the old way, and make it work the new way.

Sometimes, though, there is simply no way to massage the old solution into the new one. You are faced with a tough choice: continue using the old code that doesn't quite do what you want and is becoming increasingly difficult to maintain, discard the entire body of code and start from scratch, or attempt to paper over the problem with some horrific kludge that you say will be temporary (but you know what that means).

Unfortunately, it is hard to tell if you will run into a roadblock like this until you are well down the path. Your new abstraction is making everything so much easier until you realize that in some obscure corner case it is fundamentally incompatible with the old code. Now you have a monstrous pile of code and half of it does things the new way, the other half does it the old way, and the two halves aren't on speaking terms anymore. That's bad.

Now consider that you have a persistent store full of objects that were created and curated by the old code. You cannot delete the old code if you want to access the old objects, but the new code cannot deal with the legacy objects without calling or incorporating the old code. Of course the old code will be confused by the new objects.

On one occasion I ran into a problem of this nature. A certain kind of object needed to be built using some new abstractions, but there were objects in the database that were incompatible with the new world order. I thought of different ways to deal with this. One possible solution was to create yet another abstraction that acted like a bridge between the two models. We'd migrate everything from the legacy system to the bridge system, then once there were no legacy objects, we'd replace the legacy system with the new system and then migrate everything back. "But Bullwinkle, that trick never works!" I can confirm that it does not.

But... in the particular case in point, there was a way it could work if only there were an extra level of indirection. If we could just juggle the underlying physical slot names of the legacy objects without changing the code that manipulates the logical fields, we could fake up a slot that doesn't actually exist in the legacy objects, but does exist in the new ones.

As it turns out, this is easy to do in CLOS by simply adding a couple of methods to the accessors to handle the legacy case. I was delighted to discover this clever escape hatch. It perfectly solved a problem that was starting to look like a depressing amount of very hard work. It almost seemed as if it were designed with this exact problem in mind. And of course, it was. Suddenly, I was very impressed.
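
Roughly, the escape hatch looks like this (a made-up example, not the ChangeSafe code).  The reader generated by :accessor is an ordinary generic function, so the legacy class can answer it with a plain method even though it has no such slot:
(defclass legacy-widget () ())   ; old schema: no COLOR slot

(defclass widget ()
  ((color :initarg :color :accessor widget/color)))

;; Fake the missing slot for legacy instances.
(defmethod widget/color ((widget legacy-widget))
  'unspecified)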

Tuesday, July 16, 2013

A simple persistent store

ChangeSafe contains a simple persistent store.  The code is here.

To get an idea of how this works, start with tests.lsp

First, when open-persistent-store is called, the persistent store is (obviously) opened.  If the store doesn't exist, it is created.  It is easier to write code with the assumption that there is always a persistent state that can be updated rather than having special code deal with a once-only initialization.

All interaction with the contents of the store takes place within an atomic transaction.  When the transaction ends, either all the updates to the store take place, or none do.  This allows the programmer to briefly ignore consistency requirements during computation so long as the end result is consistent. The interface is fairly simple:

call-with-transaction pstore transaction-type reason receiver

pstore - a persistent store
transaction-type - one of :read-only, :read-write, or :read-cons
reason - a human readable string describing the purpose of the transaction
receiver - a procedure of one argument (the transaction object) which is invoked during the transaction

In the usual, expected case, the receiver runs, possibly makes updates to the store, and returns a value. When the receiver returns, the transaction commits (finishes) and the changes to the store are applied atomically.

In the less usual case, the transaction may be aborted.  In this case, no (detectable) changes are made to the store.

The decision to commit or abort is usually made implicitly:  if the receiver procedure returns normally, the transaction commits, if it performs a non-local exit, via a throw or through error recovery, the transaction aborts.  However, the programmer can explicitly cause the transaction to commit or abort through a call to transaction/commit or transaction/abort.
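
For example, assuming store is an open persistent store (persistent-object/save is described below), any non-local exit out of the receiver rolls everything back:
(catch 'give-up
  (call-with-transaction
   store :read-write "Attempt an update."
   (lambda (transaction)
     (declare (ignore transaction))
     (persistent-object/save '(half finished) store)
     ;; Exiting non-locally aborts the transaction, so the save above
     ;; never commits.  TRANSACTION/ABORT is the explicit equivalent.
     (throw 'give-up nil))))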

The transaction-type indicates what kind of transaction is to be performed.  Naturally :read-write is used to update the store and :read-only is used if no update is necessary.  :read-cons is for the case of a transaction that only creates new persistent objects and does not change existing objects.  The implementation ensures that transactions are isolated (transactions in progress are not visible to each other) and serializable (committed transactions can be placed in a consistent order even if executed in parallel).  :read-only transactions only have to be consistent throughout execution, so many can (in theory) be run in parallel.  :read-write transactions can be run in parallel only if they do not interfere with each other.  :read-cons transactions are a special case of :read-write.  Since new objects are not visible outside the transaction until the transaction commits, there can be no interference between multiple :read-cons transactions. (Support for :read-cons is sketchy, though.  Although there should be no interference, each transaction must issue unique object ids and thus they do interfere implicitly.  There are ways to deal with this, but I didn't implement them.)

The reason is simply a string that is recorded along with the transaction.  This is to help the user understand the sequence of transactions when examining the history of the store.

The receiver is invoked on an object representing the transaction.  It is often ignored because the implicit commit or abort behavior is usually used, but the argument is there if desired.

During the transaction, the special variable *current-transaction* is bound to the transaction object.
So let's look at some code:

(let ((store (persistent-store/open pathname))
      id
      vid
      (object1 '(this is a list))
      (object2 #(this is a vector))
      (object3 (list 1.0d0 -1.0d0 0.125d0 -0.125d0 1024.0d0 -1024.0d0 .3d0 -.3d0))
      (object4 (list #x87654321 #x12345678 1 -1 2147483647 -2147483648
                     #*101010101010101010101010101
                     #*100000000000000000000000000
                     #*000000000000000000000000001))
      o1id
      o2id
      o3id
      o4id)
  (call-with-transaction
   store :read-write "Save some objects."
   (lambda (transaction)
     (declare (ignore transaction))
     (setf o1id (persistent-object/save object1 store)
           o2id (persistent-object/save object2 store)
           o3id (persistent-object/save object3 store)
           o4id (persistent-object/save object4 store)))))
This code saves some simple lisp objects to the store.  persistent-object/save writes the object to the store and returns a unique integer (the object-id). Obviously we'll want a :read-write (or :read-cons) transaction.

Later on, we call
       (call-with-transaction
        store :read-only "Verify objects."
        (lambda (transaction)
          (declare (ignore transaction))
          (verify
           (verify-one-return-value (verify-value-equal object1))
           (persistent-object/find store o1id))
          (verify
           (verify-one-return-value (verify-value-equalp object2))
           (persistent-object/find store o2id))
          (verify
           (verify-one-return-value (verify-value-equal object3))
           (persistent-object/find store o3id))
          (verify
           (verify-one-return-value (verify-value-equalp object4))
           (persistent-object/find store o4id))))
persistent-object/find takes the integer returned by persistent-object/save and returns the object.  In this case we use a :read-only transaction, but we would use :read-write if we were going to modify any objects.

One has to be careful about equality.  The object returned by persistent-object/find may not be EQ to the object saved. It would be very difficult to implement this in general, and it isn't needed in most cases. Symbols are treated specially to preserve EQ-ness.

Monday, July 15, 2013

A little lisp

Several years ago I worked on ChangeSafe, a change management and version control system written in Common Lisp.  The CTO of ChangeSafe, LLC (me) has decided to open the source code of ChangeSafe.


Licensing

The code has been placed in my source code repository, which is published under the "new BSD" license.  You may take any of the code and use it under that license.  Do not be intimidated by the very scary words you might find.  Please feel free to contact me about different licensing if that would be useful.