Saturday, January 18, 2014

A better hash table: Clang

After Microsoft Visual Studio 2012 and GCC 4.8.2, we now provide the results of our performance test suite for C++ unordered associative containers in Clang 3.4. All the tests were compiled and run by Thomas Goyne on a Mac OS X 10.9.1 box with an Intel Core i7-3720QM @2.6GHz, Turbo Boost and SpeedStep disabled, release settings -O3 -std=c++11.
Clang can work with two different standard libraries, libstdc++-v3 and libc++: we have tried both libstdc++ 4.8.2 and libc++ 3.4.
libstdc++-v3
Insertion without rehashing, unique elements.
Insertion with rehashing, unique elements.
Insertion without rehashing, duplicate elements, Fmax = 1, G = 5.
Insertion without rehashing, duplicate elements, Fmax = 5, G = 5.
Insertion with rehashing, duplicate elements, Fmax = 1, G = 5.
Insertion with rehashing, duplicate elements, Fmax = 5, G = 5.
Erasure, unique elements.
Erasure, duplicate elements, Fmax = 1, G = 5.
Erasure, duplicate elements, Fmax = 5, G = 5.
Successful lookup, unique elements.
Unsuccessful lookup, unique elements.
Successful lookup, duplicate elements, Fmax = 1, G = 5.
Unsuccessful lookup, duplicate elements, Fmax = 1, G = 5.
Successful lookup, duplicate elements, Fmax = 5, G = 5.
Unsuccessful lookup, duplicate elements, Fmax = 5, G = 5.
As expected, the results are very similar to those obtained with GCC: the only noteworthy difference is Boost.Unordered performance in lookup with unique elements, which is somewhat poorer here —the reasons for this discrepancy are not clear to me.
libc++
Insertion without rehashing, unique elements.
Insertion with rehashing, unique elements.
Insertion without rehashing, duplicate elements, Fmax = 1, G = 5.
Insertion without rehashing, duplicate elements, Fmax = 5, G = 5.
Insertion with rehashing, duplicate elements, Fmax = 1, G = 5.
Insertion with rehashing, duplicate elements, Fmax = 5, G = 5.
Erasure, unique elements.
Erasure, duplicate elements, Fmax = 1, G = 5.
Erasure, duplicate elements, Fmax = 5, G = 5.
Successful lookup, unique elements.
Unsuccessful lookup, unique elements.
Successful lookup, duplicate elements, Fmax = 1, G = 5.
Unsuccessful lookup, duplicate elements, Fmax = 1, G = 5.
Successful lookup, duplicate elements, Fmax = 5, G = 5.
Unsuccessful lookup, duplicate elements, Fmax = 5, G = 5.
In general, the unordered associative containers provided by libc++ are considerably slower than those in libstdc++-v3:
  • When duplicate elements are allowed, insertion of a new element in libc++ takes place at the end of the range of equivalent elements (in libstdc++ -v3 it is at the beginning), which incurs the extra cost of traversing the range.
  • Erasure is unnecessarily expensive: to determine whether a given node p is located at the beginning of its bucket b, libc++ does a hash→bucket mapping on the predecessor prev of x and checks if the resulting bucket is not the same as b, when it suffices to verify if b points to prev.
  • It is harder to explain why lookup (specially with unique elements) is so much slower than in libstdc++-v3: code inspection reveals that the hash→bucket mapping is less efficient in libc++ (it involves one extra logical and operation), but this fact alone seems unlikely to account for the big differences in performance.

3 comments :

daveab said...

I would like the benchmarking code; as I am working on special case (very large data-sets) hash-map implementation.

Dave

Joaquín M López Muñoz said...

The links are already available scattered across the different blog entries I devoted to this. For your convenience:

unique_running_insertion.cpp
non_unique_running_insertion.cpp
unique_scattered_erasure.cpp
non_unique_scattered_erasure.cpp
unique_scattered_lookup.cpp
non_unique_scattered_lookup.cpp

daveab said...

Thanks for these links.
Dave