Monday, September 2, 2013

Putting things together

Ok, so we have audit records, a persistent store, and "diffs". Let's start putting them together. Naturally, we are going to keep the audit records in the persistent store, and we'll put the diffs in the audit records.

A versioned value is the abstraction we're aiming for. We're going to create a versioned value by combining the information held in the audit records. If the information is a set of insertion and deletion records, we combine them as I described in the previous posts.

What makes this interesting is that we can specify a subset of the audit records to participate in the construction of the value. We can extract the versioned value as it appeared at any point in time by specfying only those records that have a timestamp at or before that point. We can also synthesize interesting views by omitting some of the records.

We're going to store a lot of these versioned values and we'll use many of them every time we access the store. To get any kind of coherent view of the world, we want to use a single set of audit records when we view these values. But programmers, being who they are, won't want to think about this. So here's what we'll do: we already have transactions in order to talk to the store; we'll add a field to the transaction that specifies the audit records to be used during that transaction. Pretty simple. You want to look at the world as it was on July 4th, you start the transaction with those audit records dated July 4th or earlier and use that set for every versioned value that you want to look at. It would be crazy to look at some objects as if it were July 4th, but others as if it were December. (Heh, but on occasion....)

There is another reason we want to specify a set of audit records at the beginning of a transaction: we need to know the baseline that we compute our diffs against. When we do a read/write transaction we're going to modify some of our versioned values. When the transaction ends, we need to compute the diffs of the things we modified. We compute the diffs relative to the view we established at the beginning of the transaction.

So we need to modify our audit records to record the basis set of records we use when we begin a transaction. We modify the transaction API to require the programmer to specify which basis set of records are to be used for the transaction and we use that basis set for computing the diffs at the end of read/write transactions.

There is an interesting side effect of this change. Suppose we have some way of attaching a label to the transactions, and some transactions only use label 'A' and others only use label 'B'. Further transaction using label 'A' only see diffs relative to prior 'A' versions, while the 'B' transactions only see the 'B' diffs. The result is that a single versioned value can hold two completely different histories.

No comments: