Douglas Crockford
2017-02-02
Fash64 is an efficient hashing function. It crunches 64 bits at a time to produce a 64 bit result. It can be used for implementing data structures (hash tables) and checksums.
Fash64 relies on multiplication. In a single instruction, 64 bit multiply can do up to 64 left shifts and 63 additions. On most CPUs, the product of multiplication can be 128 bits, divided over two registers. Fash64 uses the high 64 bits of the product as a 64 bit right shift that can quickly be fed back into the hash. Without that massive right shift, multiply tends to lose information that spills out the left, which would make it unsuitable for hashing. But we get good feedback from that high part, yielding good hashes.
This is an implementation in a mythical language.
def prime_11 := 11111111111111111027 def prime_8 := 8888888888888888881 def prime_3 := 3333333333333333271 # The state of the hash function is kept in two variables. var result: uint64 # running result var sum: uint64 # running sum # The fash64_begin function initializes the result and sum variables. def fash64_begin() { result := prime_8 sum := prime_3 } # The fash64_word function hashes one 64 bit word. def fash64_word(word: uint64) { # The fash64_word function does the work. # It takes a 64 bit word, and scrambles it into the hash. var high: uint64 # The high part of the product var low: uint64 # The low part of the product # Mix the word with the current state of the hash # and multiply it with the big prime. high ; low := (result xor word) * prime_11 # Add the high part to the sum. This is to defend against # result equaling the word, which would cause loss of # all memory of the previously hashed stuff. sum += high # Mix the low part with the sum. result := low xor sum } # The fash64_block function hashes an array of words. def fash64_block(block: array of uint64) { block.each(fash64_word) } # The fash64_end function returns the result. def fash64_end() { return result }
Most CPUs know how to do a multiply that produces a 128 bit product, but most programming languages do not, tossing away the valuable high bits. It is tragic that practical languages do not allow a statement like
high ; low := (result xor word) * prime
that deposits the product of the multiplication into the high
and low
variables.
The sum
variable deals with the possibility of result xor
word
producing zero. If we used
result := low xor high
then result
can become zero when result
equals
word
, which loses the influence of everything that was hashed
up to this point. We mitigate this with
sum += high result := low xor sum
The sum
variable retains the influence of the earlier material,
so the hash will still be good. This borrows an idea from
Fletcher’s Checksum.
The likelihood that a word
will match the current
result
and cause a reset is 1 in 264. The
sum
makes a reset even less likely.
Use of Fash64 is pretty simple. First call fash64_begin
to
initialize the hash function. Call fash64_word
to hash each
individual word, or fash64_block
to hash a block of words.
After all of the material has been hashed, call fash64_end
to
obtain the result.
# Example fash64_begin() fash64_block(packet) fash64_word(session_check_key) packet_check := fash64_end()