Friday, August 16, 2013

Plus ça change

Let's start with a list:
'(a b c)
We change it:
'(a b c d)
We change the result:
'(0 a b c d)
etc.
'(0 a b c)
etc.
'(a b c)
and now a big change:
'(-1 0 1 a a1 b b1 c d e f)
At each step we compute the indels for that step. If we so desire, we can reconstruct any of the intermediate lists by starting at the beginning and applying the appropriate indels in order.

But what if we skip some? We end up with a mess. Each set of indels is computed relative to the application
of the prior indels. If an indel is omitted, the indices of the subsequent indels will be wrong.

To solve this, we have to change the representation of our sequence (that is, we won't be modifying a list).
Instead, we'll represent our sequence as an ordered set of list segments that we'll concatenate. Insertion is easy — just add a segment. Deletion is slightly harder because we don't want to cross segment boundaries.

Reconstruction of an intermediate list requires more work, but we gain flexibility. We can apply a subset of the insertions and deletions and still get a meaningful result. For example, the first change we made was to create a list with the three elements '(a b c). The second change appended a 'd'. What if we apply the second change but omit the first? We append a 'd' to an empty sequence and get '(d).

How about if we apply only change 3? That inserts a '0' at the beginning giving us '(0).

If we apply change 2 and 3, still omitting 1, we get '(0 d).

That last change is tricky. We've deleted the 'a and prepended '(-1 0 1 a a1), and deleted the 'c and inserted '(b1 c d e f). If we omit change 4 and 5 (which delete the leading 0 and trailing 'd) we'll get '(-1 0 1 a a1 0 b b1 c d e f d). We preserve the order between different insertion points, so the inserted 0 is always before the inserted 'd, but we resolve ambiguous insertions by placing the later insertions before the earlier ones if they are in the same place.

Optional Exercise 7 (quite hard): Implement this as well.

Tuesday, August 13, 2013

Let's start with a list:
'(a b c d e f g h i j)

We'll add some elements:
'(a b c d e f g h i j k l m n o)

Delete some:
'(a b c d e i j k l m n o)

Add some more:
'(x y z a b c d e i j k l m n o)

Maybe delete one:
'(x y z b c d e i j k l m n o)

We want to compare the original sequence with the end sequence and
summarize the difference with a set of "indels", which specify the
indexes at where elements were inserted or deleted:

(diff '(a b c d e f g h i j) '(x y z b c d e i j k l m n o))

(#S(INDEL
    :LEFT-SUBSEQ-START 0         ;; 'a'
    :LEFT-SUBSEQ-LIMIT 1
    :RIGHT-SUBSEQ-START 0        ;; 'x y z'
    :RIGHT-SUBSEQ-LIMIT 3)
 #S(INDEL
    :LEFT-SUBSEQ-START 5         ;; 'f g h'
    :LEFT-SUBSEQ-LIMIT 8
    :RIGHT-SUBSEQ-START 7        ;; nothing
    :RIGHT-SUBSEQ-LIMIT 7)
 #S(INDEL
    :LEFT-SUBSEQ-START 10        ;; nothing
    :LEFT-SUBSEQ-LIMIT 10
    :RIGHT-SUBSEQ-START 9        ;; 'k l m n o'
    :RIGHT-SUBSEQ-LIMIT 14))
Exercise 6 (hard): Implement this.

For an extreme challenge, implement this in a way that is not hopelessly inefficient.