Sunday, June 28, 2015

Traversing a linearized tree: cache friendliness analysis

We do a formal analysis of the cache-related behavior of traversing a levelorder_vector, a linear array of elements laid out according to the breadth-first order of a binary tree. Ignoring roundoff effects, let us assume that the array has n = 2^d − 1 elements and the CPU provides a cache system of N lines capable of holding L contiguous array elements each. We further assume that N > d.
Under these conditions, the elements of level i, being contiguous, occupy 2^i/L cache lines.
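For concreteness, the following little program (a sketch of mine, not code from levelorder_vector; the values of d and L are arbitrary) prints, for each level i of a tree of depth d laid out in breadth-first order, the contiguous index range it occupies, which starts at position 2^i − 1, together with the approximate number of cache lines it spans:

#include <cstddef>
#include <iostream>

int main()
{
  constexpr std::size_t d = 10; // tree depth, n = 2^d - 1 elements
  constexpr std::size_t L = 16; // elements per cache line
  for (std::size_t i = 0; i < d; ++i) {
    std::size_t start = (std::size_t(1) << i) - 1; // first index of level i
    std::size_t count = std::size_t(1) << i;       // number of elements in level i
    std::size_t lines = (count + L - 1) / L;       // ~2^i/L, ignoring alignment
    std::cout << "level " << i << ": indices [" << start << ", "
              << start + count - 1 << "], ~" << lines << " line(s)\n";
  }
}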
The traversal of the array corresponds to the in-order sequence of the tree implicitly encoded, and it is easy to see that this visits the elements of each level i exactly in the order they are laid out in memory; moreover, at any point of the traversal only the line holding the next element to be visited at each level needs to remain cached. So, given that N > d, i.e. the CPU can hold at least as many lines in cache as there are levels in the tree, an optimal caching scheduler loads each line exactly once, resulting in only 2^i/L misses per level, or n/L overall. The number of cache misses per element during traversal is then constant and equal to 1/L, which is the minimum attainable by any traversal scheme (such as the simplest one, consisting of walking the array linearly from first to last).
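To get a feel for the numbers, here is a small simulation (again a sketch of mine rather than anything shipped with levelorder_vector: the cache is modeled with plain LRU replacement instead of the optimal scheduler assumed above, and the names lru_cache and inorder_walk, as well as the parameter values, are made up for the example). It walks the implicit tree in in-order sequence and counts the misses incurred by a cache of N lines of L elements each:

#include <cstddef>
#include <iostream>
#include <list>
#include <unordered_map>

constexpr std::size_t D = 20;  // tree depth
constexpr std::size_t L = 16;  // elements per cache line
constexpr std::size_t N = 512; // lines held by the cache
constexpr std::size_t n = (std::size_t(1) << D) - 1;

struct lru_cache
{
  std::size_t misses = 0;
  std::list<std::size_t> order; // most recently used line first
  std::unordered_map<std::size_t, std::list<std::size_t>::iterator> pos;

  void access(std::size_t element_index)
  {
    std::size_t line = element_index / L;
    auto it = pos.find(line);
    if (it != pos.end()) { // hit: move line to MRU position
      order.splice(order.begin(), order, it->second);
      return;
    }
    ++misses;              // miss: load line, evicting the LRU one if full
    if (order.size() == N) {
      pos.erase(order.back());
      order.pop_back();
    }
    order.push_front(line);
    pos[line] = order.begin();
  }
};

// in-order walk of the implicitly encoded tree: the children of the node at
// position i live at positions 2i+1 and 2i+2 of the level-order array
void inorder_walk(std::size_t i, lru_cache& c)
{
  if (i >= n) return;
  inorder_walk(2 * i + 1, c);
  c.access(i);
  inorder_walk(2 * i + 2, c);
}

int main()
{
  lru_cache c;
  inorder_walk(0, c);
  std::cout << "misses: " << c.misses << ", n/L: " << n / L << "\n";
}

Since N is much greater than d here, the miss count should land very close to n/L; the small excess comes from lines straddling level boundaries and from LRU occasionally evicting the lines near the root, which the optimal scheduler of the analysis would keep resident.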
The question remains of whether we can justifiably hold the assumption that N > d in modern architectures. In the Intel Core family, for instance, the L1 data cache is 32 KB (per core) and the line size is 64 B, so N = 512. On the other hand, the maximum d possible in 64-bit architectures is precisely 64, so the condition N > d is indeed met, and by a large margin.
