Re: Bob Jenkins' hashing implementation in Ruby
From:
Mathieu Bouchard <matju@...>
Date:
2003-03-02 21:13:38 UTC
List:
ruby-core #903
On Sat, 1 Mar 2003, Mauricio Fern疣dez wrote: > I am using the Jenkins' implementation from Perl 5.8. Moreover, I got > rid of the modulus operations (and the prime numbers) by using bitwise > AND and ensuring that num_bins is 2^n - 1 (this implies that I've had > to add 1 in a number of places, but I believe it is more important to > make FIND_ENTRY as fast as possible). The point of modulo-big-prime is that you minimize the chance that a bad hashfunction maps to a limited number of slots. With 2**n slots you _almost_ _maximize_ it! For example: (Symbol#hash % 256) == 14 so for Hash sizes up to 2**8, you are causing a 256-fold slowdown on Symbol keys. which does not quite justify the 1.05-fold speed increase... If using plain Objects as hashkeys without redefining #hash, then every result will be multiple of 4, which means a 4-fold slowdown on hashsizes multiple of 4. now theoretically hashfunctions could get the responsibility of producing good results, but this may produce adverse effects on lots of code. BTW the code once was pretty much like you did there, and it got switched to using prime numbers later (early 1.6). > 10000 30000 100000 300000 500000 1000000 > ruby-1.6.8.jenkins 0.0910 0.3198 1.2080 10.0227 7.7709 18.0764 > ruby-1.6.8.jenkins.orig 0.0907 0.3254 1.2281 10.2327 8.0457 18.3917 > ruby.jenkins 0.0857 0.2942 1.3588 4.2214 8.0908 17.9354 > ruby.jenkins.orig 0.0902 0.3133 1.4469 4.5248 8.5991 18.7564 > [BTW it seems 1.8's GC has been improved quite a bit as shown by the > great difference for 300000 iterations; great work! :) ] Because in the 1.6.8 case the result of 300000 is (significantly!) bigger than in the 500000 case, I consider the whole test to be completely bogus. You'd have to find why it fails to produce results that make sense. And THEN you'd have to make the comparison on Symbols, Fixnums, plain Objects, Arrays of whatever, etc. Pick at least three different common cases to make sure you aren't deoptimising too many things. ________________________________________________________________ Mathieu Bouchard http://artengine.ca/matju