Thanks! I've got some ideas to improve compression "for free", by using some information from my lzp-parser to improve the compression of short common phrases like "the", etc. Maybe i can finish this today.
Thanks! I've got some ideas to improve compression "for free", by using some information from my lzp-parser to improve the compression of short common phrases like "the", etc. Maybe i can finish this today.
M1, CMM and other resources - www.toffer86.tk
I look forward to it!![]()
Unfortuneatly i had no luck, when trying two things:
1. adaptive weighting of the models: compression improvement was not very impressive, while the speed dropped to about 70%
2. short phrase model, taken with little effort from lzp output: compression improvement was tiny, since the lzp model is tuned to only find _very_ long matches (length >= 5 in case of an order 5 model). I won't add a special lzp engine for short phrases, since compression speed will be affected!
Two more things left to try:
3. adding sse
4. replace / tune the model for lzp match lengths and add some lzp specific stuff like deterministic context handling.
People with lzp expirience please give me some tips. I've read Charls Bloom's lzp papers, looked at some lzp source code and some websites.
Thanks in advance
M1, CMM and other resources - www.toffer86.tk
Hey Christopher,
I am just adding CMM (17.09.07) to my Squeeze Chart.
This will take until sunday evening, since I still have other
benchmarks running. If you want, I can contact you then
by email to discuss results. Can you give me your
email adress?
Thanks, Stephan. I would like to move the discussion to the form, is this ok?
Btw: I'll have to stop development for about two weeksbut i will have a look at the forums meanwhile.
Greets
M1, CMM and other resources - www.toffer86.tk
Good Compressor! the possibilities to put it in OpenSource?
I'll release the source if i think i've done my best and stop developing this.
Greets
M1, CMM and other resources - www.toffer86.tk
Has anybody ever tried context mixing using _fixed_ weights with probabilites instead of counters? When weighting using counters there's implicit information about the "usage" of the context (how many times used), however this thing is missing with probabilities only.
I tired something like the above, but with poor results
I'm playing a bit with some ideas, since using probabilties instead of counters i can make my compressor more memory efficient and faster (since division can be avoided).
M1, CMM and other resources - www.toffer86.tk
most probably you use wrong formula. please give your formula for weigthing (for both counter based and probability based methods).Originally Posted by toffer
try using states (eg. copy state tables from lpaq or paq) instead of probabilities, then obtain real probabilities from states using sse (aka apm) with a small context (eg. three bits or so). finally you can mix those probabilities from all contexts. im using similar method in my tarsalzp to compute lzp match probability (my lzp coder outputs only 1- byte matches).
additionally states (from lpaq/ paq) do contain some information about context usage - for example you have full bit history for contexts seen no more than four times.
In the future i will change my current scheme:
bit counters->mixing (using exponential weights, power of two)->probability
I used polynomal weights, for order n: n^2 (suggested in paq papers). On my testset exponential weights (base 2) work better: 2^n
Let n0_k and n1_k be the counter values of order k in the current context, w_k the weight and b the bit to be coded; my scheme does:
pb = (sum from k=0 to N-1 nb_k*w_k) / (sum from k=0 to N-1 (n0_k+n1_k)*w_k )
This is what my aim is:
bit history->probability->weighting&mixing->sse->proba bility
Note that i won't use adaptive weighting (has proven to be too slow!). As i said i did some experiments using a probability based scheme:
pb = sum from k=0 to N-1 p_k*w_k
with poor results. This works ok for orders 1-3, but moving on higher orders gives nearly no compression improvement. Maybe this is a result of the bitmodels used, p_k is updated based on a hit counter and the error.
Btw: If you use a state-like approach (like in paq >= 7) you are able to represent non stationary behaviour (represent in the means of a bit history) as long, as you maintain some information about the age of zeros and ones (or the pattern you received). Moving on to longer histories discards this possibility, since you only repesent the counts of one and zero (like Matt said at some state only the value of the last bit is represented (along with counts, of course)), ignoring the pattern.
I won't copy anything, since i want this to be my own work![]()
M1, CMM and other resources - www.toffer86.tk
The same Christian saidOriginally Posted by toffer
![]()
any good compressor is a collective work. unless you are unmatched genieOriginally Posted by toffer
![]()
Of course LZP was invented by Charles Bloom, but haven't you written "your own sutff" not just "copy and pasted" his source?! That is what I meant.
Bye, i have to go to work now![]()
M1, CMM and other resources - www.toffer86.tk
me? never. ive used algorithm, written by Ilya Grebnov and Dmitry Shkarin [http://www.compression.ru/ds/lzp.rar]Originally Posted by toffer
Sorry, this was not meant in that explicit way (concerning lzp), what i intended to say is, that if you implement something you write it from scratch and understand everything, nut just copy and paste.
(the lzp thing was just an example)
M1, CMM and other resources - www.toffer86.tk
The first weighting scheme is what I did in paq1 with weights n^2, then later made adaptive through paq6. In paq7-8 I looked at the weights learned by the mixer and they are roughly equal, just a little higher for the longer contexts. The difference is that the predictions are mixed in the logistic domain so you haveOriginally Posted by toffer
pb = squash(sum from k=0..N-1 stretch(p_k)*w_k)
where stretch() and squash() are inverses of each other:
stretch(p) = log(p/(1-p))
squash(x) = 1/(1+exp(-x))
This effectively gives greater weight to predictions near 0 or 1, so extra weighting is less critical.
Also there is no division. stretch() and squash() are 4K lookup tables. Mapping bit histories to probabilities are also table lookups where the entry is adjusted proportional to the error.
In lpaq1 and paq8o5 the new StateMap (that inputs a bit history and outputs p_k) uses a counter so that the initial adjustments are large, so if you have n0 zeros and n1 ones, the output is (n1+d)/(n0+n1+2d) for d about 1/2. However there is again no division because the constants 1/(n0+n1+2d) are stored in a table indexed by a 10 bit counter (but 7-8 bits are enough).
I thought the sigmoid-transform is the output function of an artificial neuron (ok, one of many)? When i experimented with NNs (not in compression), i almost always used this as output function.
"stretch" somehow looks like tan(x), shifted on the x-axis, if i remember correctly:
...............|
............../
............/
--------*---------> x
........./
......./
....|
(of course smoother).
The higher weight result from the high gradient, when y goes to +-infinity? (In other words the nearer you go to the asymptots (? correct word) of x, the heavier the weight is, due to the transform) ?
A simple answer will justify me![]()
M1, CMM and other resources - www.toffer86.tk
That is true. Both tan(p) and log(p/(1-p)) get very large near the extremes. Both inverses are common squashing functions in neural networks. But 1/(1+exp(-x)) has a very nice property. Normally in back propagation the ideal weight update for reducing the root mean square error (RMSE) between expected output y and actual output p with this squashing function is Lx(y-p)(p)(1-p) where L is the learning rate (about .001 to .005) and x is the input. But for compression, you don't want to minimize RMSE. You want to minimize the coding cost of y given probability p, which is log(1/abs(y-(1-p))). When you take the derivative with respect to the weight vector, you get a nice simple weight update of just Lx(y-p) dropping the terms p(1-p).
This doesn't prove it is the best mixing function though. For example, suppose X, A, and B are random events and probability p(X) = 0.5, p(X|A) = 0.9, and p(X|B) = 0.3. What is p(X|A,B)? You might guess around 0.7 or 0.8, but probability theory doesn't say. The neural network gives you an answer close to what you would guess if A and B are inputs and X is the output.
Hi everyone!
Here is cmm's successor:
http://freenet-homepage.de/toffer_86/cmm2_09122007 .7z
I'll add a description tomorrow, since i'll go to bed now. I'm looking forward to discussions.
N8
M1, CMM and other resources - www.toffer86.tk
Intel Core duo 2 E6600
Enwik8-> 23.477.008 (Only 30 MB!!) 757.1 kB/s
SFC TEST->12.030.899 in 67,375 sec.
Comment: slow but great compressor!!
Thanks Chris!
The large speed hit is caused by adaptive weighting, alltogether i need 2 divisions per bit encoding. With static weighting speed nearly dobuels! I will replace the weighting c code with mmx asm, i've "vectorized" lots of stuff, so this will definitely give a descent improvement. Also note, that i will add an lzp layer for very long matches (like cmm1) and something like Matt's match model. This will be very memory efficient, since i store this stuff in the context mixing hashing table, using some tricks! The code is already there, but not complete. This version has one nice feature: full context mixing of an extended 9 bit alphabet, but this is left untouched, till now.
I couldn't get sse to work though(compression slightly dropps)
This version is rewritten from scratch, it uses a home-brewn state machine for bit histories:
context->bit histories->probabilities->weighting [ -> sse, to do... ] -> encoding
My aim is to reach around 20mb for enwik9 and 11mb for SFC.
I'll post the source for the state table and cmm1's source this afternoon.
Can anybody give me some advice about sse. I've read some papers, implemented it, but it doesn't work correctly. I can't see any diffrence to PAQ, i compared my code. Maybe i'll copy and paste PAQ's and see, whats going on?! (just to see, what i missed).
I noticed that this context mixing implementation seems to be _VERY_ memory efficient. Raising memory to ~800mb only improves compression by .05bpc on my teset (a mixture of SFC, enwik and some other stuff).
Thanks in advance
M1, CMM and other resources - www.toffer86.tk
Thanks!![]()
in theory with 2gb of memory cmm2 could be superior to ccm in compression!
Hi!
Sorry, it took me some time to reorder the sources. There's some dead code left, i didn't clean out.
Note that this one isn't compatible to my last cmm1 release, since it uses shared sources, which changed over the time (due to other projects). I tested it with SFC and got:
last release: 12.302.852
compiled sources (this release): 12.299.940
Maybe someone can use the sources/or extend them. I'll continue to work on cmm2.
BTW: I tried PAQ's SSE along with cmm2. My implementation did exactly the same - so in this case SSE HURTS compression! Strange - isn't it?!
Best regards & GN8 again
http://freenet-homepage.de/toffer_86/cmm1-src.7z
M1, CMM and other resources - www.toffer86.tk
Hi again!
I hope your xmas was ok
I spent some spare time on minor mixing optimizations and splitted modelling and coding (a predictor class - like paq). This caused some speed loss but will prevent me from going crazy (writing *about* the same code twice introduces some nice errors :sadand i increased memory to see how well the main compression engine does, when there are few hashing collisions.
http://freenet-homepage.de/toffer_86/cmm2_27122007.7z
M1, CMM and other resources - www.toffer86.tk
Intel Core duo 2 E6600 2GB RAM
max 320 MB of memory
SFC Test -> 11.834.388 in 67,062 sec.
ENWIK8 -> 22.795.730 in 113,891 sec.
Hi Toffer!
Thanks Chris!![]()
Hi!
For short: here is an improved version (have a look at the other thread), probably the last cmm2 release before i'll finish the lzp layer and proceed to version 3.
Awaiting result & comments, thanks in advance! N8
http://freenet-homepage.de/toffer_86/cmm2_09012008.7z
M1, CMM and other resources - www.toffer86.tk