Results 1 to 11 of 11

Thread: Interesting Deflate source

  1. #1
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,255
    Look at this source:
    http://cm.bell-labs.com/sources/plan...late/deflate.c

    Especially, I was interested in a hash function. This implementation uses multiplicative hashing:
    <div class="jscript"><pre>
    #define hashit(c) (((ulong)(c)&0xffffff)*0x6b43a9b5)>>(32-HashLog)) </pre></div>

    The value 0x6b43a9b5 (1799596469) is not even a prime, but I tested such constant with BALZ for 3-byte hashing and it is one of the best so far.

  2. #2
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    1,799
    Why don&#039;t you tune your hash to the data, instead of using some random values, prime or not?

  3. #3
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,255
    Quote Originally Posted by Shelwien
    Why dont you tune your hash to the data
    I do. I carefully test the multiplier on the real files, collecting various performance data!

  4. #4
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    2,594
    Quote Originally Posted by encode
    The value 0x6b43a9b5 (1799596469) is not even a prime,
    the value shouldnt be a prime. primeness is just some sign of rather good multipliers i think that this multiplier will be not much different to your one. actually, once ive tested your prime against ideal hashing and found a very little difference

    multiplication scheme has only one serious cabeat - highest bits of hashed value affects only highest bits of resulting hash value. smth like this:

    (x+(x>>16))*123456791

    may improve hashing quality

  5. #5
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    1,799
    > highest bits of hashed value
    > affects only highest bits of resulting hash value. smth like this:
    > (x+(x>>16))*123456791
    > may improve hashing quality

    (x+(x>>16)) ~= (x+x/65536) = x*(65537/65536)
    x*(65537/65536)*123456791 = x*123458674.801132
    So I&#039;m not so sure about that.

    >> Why don&#039;t you tune your hash to the data
    > I do. I carefully test the multiplier on the real files,
    > collecting various performance data!

    Somehow I think that you just manually try out and compare some
    selected hash functions/parameters.

    And I was talking about
    1. Log the memory access pattern for some dataset
    with unhashed contexts for addresses
    (reads/writes; also other parameters like value size
    and offset in hash cell)
    2. Measure the processing speed for all 2^32 multiplier values
    (or including other hash functions too), keeping some number
    of best results.
    3. Test the real compressor performance with these stored
    best hash configurations.

  6. #6
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,255
    Quote Originally Posted by Shelwien
    Somehow I think that you just manually try out and compare some
    selected hash functions/parameters.
    Actually, in my case the hash function is not so extremely important. I hash 24-bit value to 1M hash HEAD. Its more than enough number of hash buckets. Since I use hash chains, hash function should do good enough separation and spread all 16M values to 1M hashes - to minimize hash chain branches. I kept multiplicative hashing. I tested the multiplier in next manner - I keep only HEAD for string searching (i.e. no further traverse on PREV) and measure the compression ratio on various files.

  7. #7
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    1,799
    Well, I was talking about hash speed.
    And that depends mostly not on the distribution of indexes,
    but on the alignment and clustering of accesses.
    Your explanation didn&#039;t contain anything about that.

  8. #8
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    2,594
    Quote Originally Posted by Shelwien
    (x+(x>>16)) ~= (x+x/65536) = x*(65537/65536)
    x*(65537/65536)*123456791 = x*123458674.801132
    So Im not so sure about that.
    why? it solves problem of distributing influence of higher value bits among all hash bits

  9. #9
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    1,799
    I mean it doesn&#039;t look any different from simply changing the multiplier value.

  10. #10
    Programmer toffer's Avatar
    Join Date
    May 2008
    Location
    Erfurt, Germany
    Posts
    554
    Here i really like Matt&#039;s solution, since it obviously does what you want.
    M1, CMM and other resources - www.toffer86.tk

  11. #11
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    2,594
    you may cosider this as an idea of using non-integer multipliers

Similar Threads

  1. Interesting tools
    By lunaris in forum Data Compression
    Replies: 2
    Last Post: 26th August 2009, 00:50
  2. DEFLATE/zlib implementations
    By GerryB in forum Data Compression
    Replies: 10
    Last Post: 7th May 2009, 18:03
  3. deflate model for paq8?
    By kaitz in forum Data Compression
    Replies: 2
    Last Post: 6th February 2009, 22:48
  4. FreeArc is becoming more and more interesting...
    By Vacon in forum Forum Archive
    Replies: 65
    Last Post: 9th December 2007, 22:41
  5. Outdated, but maybe interesting...?
    By Vacon in forum Forum Archive
    Replies: 1
    Last Post: 20th October 2007, 21:13

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts