Abstract Heresies: September 2013

Monday, September 23, 2013

Putting it all together

A ChangeSafe repository is implemented as a transient wrapper object around a persistent object. The wrapper object caches some immutable metadata. You'd hate to have to run a transaction in the middle of the print function in order to print the repository name. The wrapper also contains metadata associated with the backing store that the repository is using.

Oh yeah, there is something interesting going on in the wrapper: we keep track of the ongoing transactions by mapping each user-id to a list of transaction contexts (every nested transaction by a user "pushes" a new txn-context).
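The push-per-nested-transaction bookkeeping can be sketched in a few lines. This is a minimal sketch, not ChangeSafe's code: `*transactions-by-user*`, `current-txn-context` and `call-with-txn-context` are hypothetical names, and a context can be any object. The point is that the innermost transaction is always at the head of the user's list, and UNWIND-PROTECT pops it again no matter how the transaction exits.

```lisp
;; Hypothetical wrapper state: user-id -> list of txn contexts, innermost first.
(defvar *transactions-by-user* (make-hash-table :test #'equal))

(defun current-txn-context (user-id)
  "Return the innermost active transaction context for USER-ID, or NIL."
  (first (gethash user-id *transactions-by-user*)))

(defun call-with-txn-context (user-id context thunk)
  "Push CONTEXT for USER-ID, call THUNK, and pop CONTEXT however THUNK exits."
  (push context (gethash user-id *transactions-by-user*))
  (unwind-protect (funcall thunk)
    (pop (gethash user-id *transactions-by-user*))))
```

A nested call shadows the outer context only for its own dynamic extent:

```lisp
(call-with-txn-context "alice" :outer
  (lambda ()
    (call-with-txn-context "alice" :inner
      (lambda () (current-txn-context "alice")))))  ; sees :INNER
```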

Anyway, it's the repository-persistent-information that has the interesting stuff:

(defclass repository-persistent-information ()
  (
   (type  :initarg :type
          :initform (error "Required initarg :type omitted.")
          :reader repository-persistent-information/type
          :type repository-type)

   ;; Database parent is the root extent for an extent database, or the master database for a satellite.
   ;; Root extents or master repositories won't have a parent
   (parent-repository :initarg :parent-repository
                      :initform nil
                      :reader repository-persistent-information/parent
                      :type (optional relative-pathname))

   ;; The satellite-repositories slot is non-nil only for master repositories.
   (satellite-repositories :initform nil
                           :initarg :satellite-repositories
                           :accessor repository-persistent-information/satellite-repositories)

   (canonical-class-dictionary :initform (make-instance 'canonical-class-dictionary)
                               :reader repository-persistent-information/canonical-class-dictionary)
   (cid-master-table :initform (make-instance 'cid-master-table)
                     :reader repository-persistent-information/cid-master-table)
   (root-mapper  :initarg :root-mapper
                 :initform (error "Required initarg :root-mapper omitted.")
                 :reader repository-persistent-information/root-mapper)
   (cid-mapper   :initarg :cid-mapper
                 :initform (error "Required initarg :cid-mapper omitted.")
                 :reader repository-persistent-information/cid-mapper)
   (local-mapper :initarg :local-mapper
                 :initform (error "Required initarg :local-mapper omitted.")
                 :reader repository-persistent-information/local-mapper)
   (locally-named-roots :initarg :locally-named-roots
                        :initform (error "Required initarg :locally-named-roots omitted.")
                        :reader repository-persistent-information/locally-named-roots)
   (anonymous-user :initarg :anonymous-user
                   :initform nil
                   :reader repository-persistent-information/anonymous-user))
  (:default-initargs :node-id +object-id-of-root+)  ;; force this to always be the root object.
  (:documentation "Persistent information describing a repository, and stored in the repository")
  (:metaclass persistent-standard-class)
  (:schema-version 0))

The repository-type is just a keyword:

(defconstant *repository-types* '(:basic :master :satellite :transport :extent :workspace)
  "Type of repositories.  Note that all but :EXTENT types of repositories
   serve as root extents for databases which have multiple extents, and therefore imply extent.")
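The slot declaration uses `:type repository-type`, but the type itself isn't shown. A natural way to get it from the keyword list is a DEFTYPE that expands into a MEMBER type; that is an assumption here, not necessarily how ChangeSafe defines it. The sketch uses hypothetical names (`*demo-repository-types*`, `demo-repository-type`) and DEFPARAMETER inside EVAL-WHEN, because re-evaluating a DEFCONSTANT whose value is a freshly consed list is not portable, and the list must exist at compile time for the type to expand. The same pattern would presumably serve for the `repository-transaction-type` checked in `call-with-repository-transaction`.

```lisp
;; Hypothetical stand-in for *REPOSITORY-TYPES*.  EVAL-WHEN makes the list
;; available when the compiler expands the type below.
(eval-when (:compile-toplevel :load-toplevel :execute)
  (defparameter *demo-repository-types*
    '(:basic :master :satellite :transport :extent :workspace)))

;; A repository type is simply one of the keywords above.
(deftype demo-repository-type ()
  `(member ,@*demo-repository-types*))

(defun check-repository-type (thing)
  "Signal a correctable TYPE-ERROR unless THING is a known repository type."
  (check-type thing demo-repository-type)
  thing)
```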

The parent-repository and satellite-repositories slots are for juggling multiple "satellite" repositories that hold particular subsets of changes (for, say, geographically distributing the servers for different product groups).

The canonical-class-dictionary is an intern table for objects.
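An intern table guarantees a single canonical instance for each distinct key, so that later comparisons can use EQ. The interface of canonical-class-dictionary isn't shown, so this is a minimal sketch with hypothetical names (`intern-table`, `intern-canonical`), assuming keys can be compared with EQUAL.

```lisp
;; Hypothetical intern table: maps a key (compared with EQUAL) to the
;; one canonical object for that key.
(defstruct intern-table
  (entries (make-hash-table :test #'equal) :read-only t))

(defun intern-canonical (table key make-object)
  "Return the canonical object for KEY, calling MAKE-OBJECT on KEY to
create it the first time KEY is seen."
  (let ((entries (intern-table-entries table)))
    (multiple-value-bind (object foundp) (gethash key entries)
      (if foundp
          object
          (setf (gethash key entries) (funcall make-object key))))))
```

Two lookups with EQUAL but distinct keys return the very same object, which is the whole point of interning.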

The cid-master-table is (logically) the collection of audit-records. A CID (after change id) is represented as an integer index into the master table.
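Representing a CID as an integer index makes the master table little more than an append-only vector. A minimal sketch, assuming a hypothetical `audit-record` structure with far fewer fields than the real one; `make-master-table`, `add-audit-record` and `audit-record-for-cid` are illustrative names. VECTOR-PUSH-EXTEND returns the index at which it stored the element, which is exactly the new CID.

```lisp
;; Hypothetical, stripped-down audit record.
(defstruct audit-record
  (reason "" :type string)
  (timestamp 0 :type integer))

(defun make-master-table ()
  "An empty, growable, append-only table of audit records."
  (make-array 0 :adjustable t :fill-pointer 0))

(defun add-audit-record (table record)
  "Append RECORD to TABLE and return its CID, i.e. its index."
  (vector-push-extend record table))

(defun audit-record-for-cid (table cid)
  (aref table cid))
```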

The root-mapper is a mapping table from distributed identifiers to objects.

The cid-mapper is a mapping table from the distributed identifier that represents the CID to the integer index of that CID in the master table. It is a subtable of the local mapper.

The local-mapper is a submapping of the root-mapper and a supermapping of the cid-mapper.
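One way to keep the cid-mapper inside the local-mapper, and the local-mapper inside the root-mapper, is to have every mapper record its entries in its parent as well. This is a hypothetical sketch of that invariant (`mapper`, `mapper-add`, `mapper-lookup` are not ChangeSafe names, and distributed identifiers are modeled as strings); the real mappers may maintain the relationship differently.

```lisp
;; Hypothetical mapper: a table from distributed identifiers to objects,
;; optionally nested inside a larger (parent) mapping.
(defstruct mapper
  (name "" :type string)
  (parent nil)
  (table (make-hash-table :test #'equal) :read-only t))

(defun mapper-add (mapper did object)
  "Record DID -> OBJECT in MAPPER and in every supermapping above it."
  (loop for m = mapper then (mapper-parent m)
        while m
        do (setf (gethash did (mapper-table m)) object))
  object)

(defun mapper-lookup (mapper did)
  (gethash did (mapper-table mapper)))
```

With root, local and cid mappers chained this way, a CID entry is visible from all three, while an ordinary object registered in the local mapper never shows up in the cid-mapper.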

The locally-named-roots slot is a hash table for storing the root objects of the repository.

Finally, there is the anonymous-user slot, which is the user id assigned for bootstrapping.

And all this crap is in support of this procedure:

(defun call-with-repository-transaction (&key repository 
                                              transaction-type
                                              user-id-specifier
                                              reason

                                              ;; generally, you only want to specify these two
                                              meta-cid-set-specifier
                                              cid-set-specifier 
                                              ;; but if you are doing a comparison,
                                              ;; specify these as well
                                              aux-meta-cid-set-specifier
                                              aux-cid-set-specifier 

                                              receiver)
  (check-type user-id-specifier (or keyword distributed-identifier))
  (check-type transaction-type repository-transaction-type)
  (check-type reason string)
  ;; implementation omitted for brevity, ha ha
  )

Naturally, we need to specify the :repository. The :transaction-type is one of

(defconstant *repository-transaction-types* '(:read-only
                                              :read-write
                                              :read-cons
                                              :read-only-compare
                                              :read-cons-nonversioned
                                              :read-only-nonversioned
                                              :read-write-nonversioned))

The :user-id-specifier should be a distributed-identifier of a core-user instance.

The :reason is a human readable string describing the transaction.

The :meta-cid-set-specifier is mumble, mumble... just a sec...

The :cid-set-specifier is how you specify which CIDs will form the basis view for the transaction. We allow this to be a procedure that returns a cid-set object. We call this procedure while setting up the transaction, and the :meta-cid-set-specifier determines which CIDs form the versioned view that the procedure sees.

The :meta-cid-set-specifier can be the symbol :latest-metaversion, a timestamp, or a cid-set. :latest-metaversion means to use all CIDs while resolving the :cid-set-specifier, a timestamp is useful for rewinding the world, and the main use for an explicit cid-set is synchronizing views between master and satellite repositories.
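The two-stage resolution can be sketched as a dispatch on the kind of meta specifier, followed by an optional call to the cid-set procedure. This is a simplified sketch under stated assumptions: `resolve-meta-cid-set` and `resolve-cid-set` are hypothetical names, a cid-set is modeled as a list of integer CIDs, timestamps are integers, and the procedure simply receives the meta CIDs as an argument instead of running inside a versioned view built from them.

```lisp
(defun resolve-meta-cid-set (specifier cid-timestamps)
  "CID-TIMESTAMPS is a vector whose Nth element is the timestamp of CID N."
  (etypecase specifier
    ;; Every CID participates.
    ((eql :latest-metaversion)
     (loop for cid below (length cid-timestamps) collect cid))
    ;; Rewind the world: only CIDs at or before the timestamp.
    (integer
     (loop for cid from 0
           for stamp across cid-timestamps
           when (<= stamp specifier) collect cid))
    ;; An explicit cid-set is used as is.
    (list specifier)))

(defun resolve-cid-set (cid-set-specifier meta-cids)
  "Call a procedure specifier with the meta view; use a cid-set as is."
  (if (functionp cid-set-specifier)
      (funcall cid-set-specifier meta-cids)
      cid-set-specifier))
```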

The :receiver is invoked within the dynamic extent of a transaction. It is passed a core-txn object that contains the metadata associated with the transaction.
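Passing a :receiver is the usual CALL-WITH-... half of the Lisp CALL-WITH/WITH- idiom, and a WITH- macro is typically layered on top so callers can write a body instead of a lambda. Since the real implementation is omitted, this sketch uses a hypothetical stub, `call-with-demo-transaction`, and a hypothetical `demo-txn` structure standing in for core-txn; it only shows how the transaction is bound for exactly the dynamic extent of the receiver.

```lisp
;; Hypothetical stand-in for the core-txn object.
(defstruct demo-txn
  (reason "" :type string)
  (transaction-type :read-only :type keyword))

(defvar *current-demo-txn* nil
  "Bound to the active transaction within its dynamic extent.")

(defun call-with-demo-transaction (&key reason (transaction-type :read-only) receiver)
  (check-type reason string)
  (let* ((txn (make-demo-txn :reason reason :transaction-type transaction-type))
         ;; The binding is undone however RECEIVER exits.
         (*current-demo-txn* txn))
    (funcall receiver txn)))

(defmacro with-demo-transaction ((txn &rest options) &body body)
  "BODY becomes the receiver, with TXN bound to the transaction object."
  `(call-with-demo-transaction ,@options
                               :receiver (lambda (,txn) ,@body)))
```

A caller then reads naturally:

```lisp
(with-demo-transaction (txn :reason "print the name")
  (demo-txn-reason txn))
```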

The ChangeSafe core components are the repository, which holds changes and associated meta-information, and simple versioned CLOS objects. The core is only useful as a foundation layer, though.

Next up, another level of abstraction...

Tuesday, September 10, 2013

Mix in a little CLOS

The obvious idea here is to make CLOS objects where the slots are implemented as versioned value objects. Then we override slot-value-using-class. You might consider this a stupid CLOS trick. You could just as well establish an abstraction layer through other means, but the point is to create an understandable abstraction model. It is easy to understand what is going to happen if we override slot-value-using-class.

We use the MOP to create a new kind of slot so that we can compose values on the fly when the programmer calls slot-value-using-class. We also override (setf slot-value-using-class) so that it calls the "diff" computing code. Again, the point is to make it easy to understand what is happening.

The end result is the versioned-standard-object. An instance of a versioned-standard-object (or of any of its subclasses, naturally) has all its slots implemented as versioned value objects. The programmer should specify versioned-standard-class as the metaclass in the class definition.

(defclass test-class ()
  ((nvi-slot  :version-technique :nonlogged
              :accessor test-class/nvi-slot)
   (lnvi-slot :version-technique :logged
              :accessor test-class/lnvi-slot)
   (svi-slot  :version-technique :scalar
              :accessor test-class/svi-slot))
  (:metaclass versioned-standard-class)
  (:schema-version 0))

In this example, the test class has slots using several of the different kinds of versioned value, each named by its version technique. A :nonlogged slot is the "escape mechanism". It's a fancy name for "Just turn off the versioning, and use this here value."

A :logged slot is less drastic. There's no versioning behavior, it's just a persistent slot, but we'll keep a list of the transactions that modified it.

Finally, with the :scalar version technique, the slot's value is the one set by the chronologically last participating change.

A versioned slot using the :composite-sequence technique represents its value as a set of diffs, and these are composed as described in an earlier post.
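The :scalar rule is simple enough to state as code: among the changes that participate in the current view, the latest one wins. A minimal sketch, assuming each change is a hypothetical `(timestamp . value)` pair and that participation is decided by a caller-supplied predicate; `scalar-value` is an illustrative name, not the ChangeSafe function.

```lisp
(defun scalar-value (changes participating-p &optional default)
  "CHANGES is a list of (TIMESTAMP . VALUE) pairs.  Return the value set by
the chronologically last change satisfying PARTICIPATING-P, and T; or
DEFAULT and NIL if no change participates."
  (let ((latest nil))
    (dolist (change changes)
      (when (and (funcall participating-p change)
                 (or (null latest) (> (car change) (car latest))))
        (setf latest change)))
    (if latest
        (values (cdr latest) t)
        (values default nil))))
```

Restricting the participating changes to those at or before some time yields the slot's value as of that time.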

(defclass test-cvi-class ()
  ((cvi-slot-a :version-technique :composite-sequence
               :accessor test-cvi-class/cvi-slot-a)
   (cvi-slot-b :version-technique :composite-sequence)
   (cvi-slot-c :version-technique :composite-sequence :initform nil))
  (:metaclass versioned-standard-class)
  (:schema-version 0))

Once this is working, we have what we need to bootstrap the rest of the versioned storage.

Monday, September 2, 2013

Putting things together

Ok, so we have audit records, a persistent store, and "diffs". Let's start putting them together. Naturally, we are going to keep the audit records in the persistent store, and we'll put the diffs in the audit records.

A versioned value is the abstraction we're aiming for. We're going to create a versioned value by combining the information held in the audit records. If the information is a set of insertion and deletion records, we combine them as I described in the previous posts.

What makes this interesting is that we can specify a subset of the audit records to participate in the construction of the value. We can extract the versioned value as it appeared at any point in time by specifying only those records that have a timestamp at or before that point. We can also synthesize interesting views by omitting some of the records.

We're going to store a lot of these versioned values, and we'll use many of them every time we access the store. To get any kind of coherent view of the world, we want to use a single set of audit records when we view these values. But programmers, being who they are, won't want to think about this. So here's what we'll do: we already have transactions in order to talk to the store, so we'll add a field to the transaction that specifies the audit records to be used during that transaction. Pretty simple. If you want to look at the world as it was on July 4th, you start the transaction with the audit records dated July 4th or earlier and use that set for every versioned value you look at. It would be crazy to look at some objects as if it were July 4th, but others as if it were December. (Heh, but on occasion....)
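In Lisp, a per-transaction field that every read consults maps naturally onto a special variable bound for the duration of the transaction. This sketch is an illustration, not ChangeSafe's mechanism: `*basis*`, `with-basis` and `read-versioned` are hypothetical, and a versioned value is modeled as a list of `(cid . item)` insertion records kept in CID order, so reading it means keeping the items whose CIDs are in the basis.

```lisp
(defvar *basis* '()
  "List of CIDs visible to the current transaction.")

(defmacro with-basis ((cids) &body body)
  "Evaluate BODY with every versioned read seeing only CIDS."
  `(let ((*basis* ,cids)) ,@body))

(defun read-versioned (entries)
  "ENTRIES is a list of (CID . ITEM) insertion records in CID order.
Return the items whose CID is in the current transaction's basis."
  (loop for (cid . item) in entries
        when (member cid *basis*) collect item))
```

Because every read goes through `*basis*`, all values seen inside one `with-basis` form are consistent with each other; omitting a CID from the basis (say, `'(0 2)`) produces a synthesized view that skips that change everywhere at once.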

There is another reason we want to specify a set of audit records at the beginning of a transaction: we need to know the baseline that we compute our diffs against. When we do a read/write transaction we're going to modify some of our versioned values. When the transaction ends, we need to compute the diffs of the things we modified. We compute the diffs relative to the view we established at the beginning of the transaction.

So we need to modify our audit records to record the basis set of records we use when we begin a transaction. We modify the transaction API to require the programmer to specify which basis set of records is to be used for the transaction, and we use that basis set for computing the diffs at the end of read/write transactions.
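The end-of-transaction step can be sketched for a set-valued versioned value: compare the value the transaction started from (as seen through its basis) with the value it ends with, and store the insertions, deletions and the basis itself in the new record. All names here (`change-record`, `commit-edit`) are hypothetical, and the diff is a plain set difference rather than whatever ChangeSafe actually computes.

```lisp
;; Hypothetical audit-record payload for one read/write transaction.
(defstruct change-record
  (basis '() :type list)     ; CIDs the transaction viewed the world through
  (inserted '() :type list)  ; items added relative to that view
  (deleted '() :type list))  ; items removed relative to that view

(defun commit-edit (basis baseline-items edit-function)
  "BASELINE-ITEMS is the value as seen through BASIS.  Apply EDIT-FUNCTION to
a copy of it and return a change record describing the edit relative to the
baseline, together with the basis it was computed against."
  (let ((new-items (funcall edit-function (copy-list baseline-items))))
    (make-change-record
     :basis (copy-list basis)
     :inserted (set-difference new-items baseline-items :test #'equal)
     :deleted (set-difference baseline-items new-items :test #'equal))))
```

Recording the basis is what makes the diff meaningful later: the insertions and deletions only make sense relative to the view they were computed from.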

There is an interesting side effect of this change. Suppose we have some way of attaching a label to the transactions, and some transactions only use label 'A' and others only use label 'B'. Further transactions using label 'A' only see diffs relative to prior 'A' versions, while the 'B' transactions only see the 'B' diffs. The result is that a single versioned value can hold two completely different histories.
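The two-histories effect is easy to demonstrate with a toy model. This sketch assumes each change is a hypothetical `(cid label value)` list kept in CID order, that a transaction's basis is simply every CID carrying its label, and that the value is scalar (the last change in the basis wins); `basis-for-label` and `value-in-basis` are illustrative names only.

```lisp
(defun basis-for-label (changes label)
  "The CIDs of all changes made under LABEL."
  (loop for change in changes
        when (eq (second change) label) collect (first change)))

(defun value-in-basis (changes basis)
  "Value set by the last change (highest CID) whose CID is in BASIS."
  (let ((result nil))
    (dolist (change changes result)
      (destructuring-bind (cid label value) change
        (declare (ignore label))
        (when (member cid basis)
          (setf result value))))))
```

With interleaved changes `(0 :a "A1") (1 :b "B1") (2 :a "A2") (3 :b "B2")`, a label-A view reads "A2" and a label-B view reads "B2" from the same versioned value, and neither history ever sees the other's diffs.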