Tuesday, April 15, 2008

Faster

In my last entry I had gotten the code working to find the second eigenvector, but it was really slow. The obvious culprit was the matrix multiply, and it was clear that much could be done to speed it up. Ultimately, though, optimizing the multiply itself is a waste of time.

I want to cluster some 58,000 data points. The similarity matrix for a data set of this size would have 58,000 × 58,000 = 3,364,000,000 elements; at eight bytes per element, that is roughly 27 gigabytes. Even if it were reasonable to try to fit this into memory, it would take a fair amount of time to perform the operations. The trick is to avoid creating the matrix in the first place.

The matrix Q is defined like this:
;; M* is matrix multiply

(define Q
  (let ((AT (M* A (transpose A))))
    (M* D R-inverse AT D-inverse)))
and we will be multiplying it by a vector to find the second eigenvector. But instead of composing the matrix prior to multiplication, we'll leave it in its factored form. That is, if a matrix has two factors, X and Y, and you want to multiply it by a vector, this equivalence holds:
(M*V (M* X Y) v)  <-->  (M*V X (M*V Y v))
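For concreteness, a naive M*V might look something like this (a sketch, assuming a matrix is represented as a vector of row vectors; note that it performs height × width multiplies, which is precisely the cost we want to avoid):
;; Dense matrix-vector multiply over a vector-of-row-vectors
;; representation.  Purely illustrative.
(define (M*V matrix v)
  (let* ((height (vector-length matrix))
         (result (make-vector height 0)))
    (do ((row 0 (+ row 1)))
        ((= row height) result)
      (let ((r (vector-ref matrix row)))
        (do ((col 0 (+ col 1)))
            ((= col (vector-length r)))
          (vector-set! result row
                       (+ (vector-ref result row)
                          (* (vector-ref r col)
                             (vector-ref v col)))))))))
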
So if we have a procedure that multiplies by Q:
(define do-q
  (let* ((AT (M* A (transpose A)))
         (Q (M* D R-inverse AT D-inverse)))
    (lambda (v)
      (M*V Q v))))
we can rewrite it as:
(define do-q
  (let* ((TA (transpose A)))
    (lambda (v)
      (M*V D
        (M*V R-inverse
          (M*V A
            (M*V TA
              (M*V D-inverse v))))))))
We've traded a single huge matrix multiply for a series of matrix-vector multiplies. This wouldn't normally be to our advantage, but the matrices D, R-inverse, and D-inverse are diagonal, so we can take advantage of that fact in two ways. First, since the off-diagonal elements are zero, we needn't multiply by them. Second, since the off-diagonal elements are zero, we don't even need to represent them. This is a savings in both memory and time, because a single vector-ref now fetches an element of the diagonal.
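
Concretely, the diagonal multiply might look something like this (a sketch, assuming a diagonal matrix is stored as just the vector of its diagonal entries):
;; Multiplying a diagonal matrix by a vector is elementwise
;; multiplication: one vector-ref per element instead of a whole row.
(define (diagonal-matrix*vector diag v)
  (let* ((n (vector-length diag))
         (result (make-vector n)))
    (do ((i 0 (+ i 1)))
        ((= i n) result)
      (vector-set! result i
                   (* (vector-ref diag i) (vector-ref v i))))))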

Matrix A, however, is still big, and its transpose, TA, isn't any smaller. But multiplying a vector by the transpose on the left gives the same result as multiplying that vector, treated as a row vector, by the original matrix on the right, so we can rewrite the form like this:
(define DM*V  diagonal-matrix*vector)  ; diagonal matrix times vector
(define V*M   vector*matrix)           ; row vector times (general) matrix

(define do-q
  (lambda (v)
    (DM*V D
      (DM*V R-inverse
        (M*V A
          (V*M (DM*V D-inverse v)
               A))))))
And now we don't need to transpose A.
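
For the dense representation, vector*matrix might look like this (again just a sketch). The point to notice is that it traverses A strictly in row order, so the transpose never needs to be built:
;; Row vector times matrix: result[col] accumulates u[row] * A[row][col].
(define (vector*matrix u matrix)
  (let* ((width (vector-length (vector-ref matrix 0)))
         (result (make-vector width 0)))
    (do ((row 0 (+ row 1)))
        ((= row (vector-length matrix)) result)
      (let ((u-row (vector-ref u row))
            (r (vector-ref matrix row)))
        (do ((col 0 (+ col 1)))
            ((= col width))
          (vector-set! result col
                       (+ (vector-ref result col)
                          (* u-row (vector-ref r col)))))))))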

Matrix A is still a big matrix, but it is only filled with zeros and ones, and most of the elements are zero. Rather than represent it in the obvious, naive way, we can represent it as a vector of rows, where each row is a simple list of the columns that contain a one. Not only is this much smaller, it is much faster to multiply because we can skip the zero elements entirely.
;; A sparse binary matrix is represented as a pair: the car is the
;; width (number of columns) and the cdr is the vector of rows, where
;; each row is a list of the column indices that contain a one.
;; The caller supplies a zero-filled result-vector; we accumulate
;; into it.
(define (sparse-binary-matrix-times-vector matrix vector result-vector)
  (unless (= (car matrix) (vector-length vector))
    (error "Cannot multiply"))
  (unless (= (vector-length result-vector) (matrix-height (cdr matrix)))
    (error "Cannot multiply"))
  (dotimes (row (matrix-height (cdr matrix)) result-vector)
    (let ((row-elements (vector-ref (cdr matrix) row)))
      (do ((tail row-elements (cdr tail)))
          ((null? tail))
        (let ((column (car tail)))
          ;; The element is known to be 1, so there is nothing to
          ;; multiply; just add the corresponding vector entry.
          (vector-set! result-vector row
                       (+ (vector-ref result-vector row)
                          (vector-ref vector column))))))))
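
The vector-times-matrix case falls out of the same representation. Here is a sketch in the same style (it assumes the same width-and-rows pair representation and the same unless, dotimes, and matrix-height helpers, and likewise a zero-filled result-vector):
(define (sparse-binary-vector-times-matrix vector matrix result-vector)
  (unless (= (vector-length vector) (matrix-height (cdr matrix)))
    (error "Cannot multiply"))
  (unless (= (vector-length result-vector) (car matrix))
    (error "Cannot multiply"))
  (dotimes (row (matrix-height (cdr matrix)) result-vector)
    ;; Each one in this row contributes vector[row] to result[column].
    (let ((element (vector-ref vector row)))
      (do ((tail (vector-ref (cdr matrix) row) (cdr tail)))
          ((null? tail))
        (vector-set! result-vector (car tail)
                     (+ (vector-ref result-vector (car tail))
                        element))))))
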
With these changes we have completely eliminated the matrix-matrix multiply; Q is never materialized at all.
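
Putting the pieces together, do-q might now look something like this (a sketch; it assumes A uses the sparse representation above and that allocating fresh zero-filled result vectors on each call is acceptable):
(define do-q
  (lambda (v)
    (DM*V D
      (DM*V R-inverse
        (sparse-binary-matrix-times-vector
         A
         (sparse-binary-vector-times-matrix
          (DM*V D-inverse v)
          A
          (make-vector (car A) 0))                    ; one slot per column
         (make-vector (matrix-height (cdr A)) 0)))))) ; one slot per row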
