http://shelwien.googlepages.com/fpaq0pv4b.rar
http://shelwien.googlepages.com/fpaq0pv4B.htm
<div class="jscript"><pre> 48.940Mbps 48.862Mbps | 100% 100% | fpaq0p (original)
56.590Mbps 54.367Mbps | 116% 111% | fpaq0p IntelC build
56.987Mbps 49.813Mbps | 116% 102% | fpaq0pv2 (original)
64.992Mbps 53.078Mbps | 133% 109% | fpaq0pv3 (original)
77.834Mbps 67.819Mbps | 159% 139% | fpaq0pv4 GCC 4.3.0 prof-gen build
89.309Mbps 74.677Mbps | 182% 153% | fpaq0pv4nc (blockwise eof encoding)
79.103Mbps 66.729Mbps | 162% 137% | fpaq0pv5 (original)
80.067Mbps 70.099Mbps | 164% 143% | fpaq0pv5 GCC 4.3.0 prof-gen build
81.834Mbps 74.656Mbps | 167% 153% | fpaq0pv4A (IntelC)
95.188Mbps 87.459Mbps | 194% 179% | fpaq0pv4Anc (IntelC)
106.537Mbps 97.165Mbps | 218% 199% | fpaq0pv4B (IntelC, not compatible with fpaq0p)
</pre></div>
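The percentage columns appear to be each speed divided by the matching speed in the first row (fpaq0p original). This is inferred from the numbers, since the table has no header. A worked check for the last row:

```latex
\text{rel} = \frac{v}{v_{\text{fpaq0p}}}\times 100\%,\qquad
\frac{106.537}{48.940} \approx 2.177 \;\Rightarrow\; 218\%,\qquad
\frac{97.165}{48.862} \approx 1.989 \;\Rightarrow\; 199\%
```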
1. Matt's weird carryless rc was replaced with a port of sh_v1m,
which supports single-step renormalization, along with
other optimizations such as 16-bit i/o (IntelC somehow
manages to vectorize it).
2. Classes were made more readable using templates.
Well, actually, not much code is left from the previous version.
3. Direct win32 i/o was implemented (bypassing stdio),
but the only visible effect seems to be a smaller executable size.
4. Preprocessors were implemented to translate the classes into macros;
I couldn't find any other way to make the rc keep its state in local
variables. This gives most of the speedup. Another possible way would be
to pass the rc variables as arguments to all the functions, but that would be
too ugly and wouldn't guarantee that the compiler optimizes it properly.
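Item 1 pairs single-step renormalization with 16-bit i/o. sh_v1m's source isn't shown here, so the following is only a sketch under assumptions. It uses a carry-propagating binary range coder that borrows the cache/carry trick from LZMA's encoder and emits 16-bit words. With a 12-bit probability p in [1, 4095] and range >= 2^16 before each bit, both sub-ranges are at least range>>12 >= 16. A single 16-bit shift therefore brings range back to at least 2^20, and a plain `if` can replace the usual renormalization loop. `Enc16`, `Dec16` and their members are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch, NOT sh_v1m: binary range coder with LZMA-style
// carry handling and 16-bit output words.
// p is a 12-bit P(bit == 1) in [1, 4095]; range stays >= 2^16 between bits.
struct Enc16 {
  uint64_t low = 0;              // 32-bit window plus one carry bit
  uint32_t range = 0xFFFFFFFFu;
  uint16_t cache = 0;            // last word not yet written (may still get a carry)
  uint64_t pending = 1;          // cache plus any 0xFFFF words queued after it
  std::vector<uint16_t> out;

  void shift_low() {
    if ((uint32_t)low < 0xFFFF0000u || (low >> 32) != 0) {
      uint16_t carry = (uint16_t)(low >> 32);
      uint16_t w = cache;
      do {
        out.push_back((uint16_t)(w + carry));  // 0xFFFF + 1 wraps to 0
        w = 0xFFFF;
      } while (--pending != 0);
      cache = (uint16_t)((uint32_t)low >> 16);
    }
    ++pending;
    low = (low & 0xFFFFu) << 16;
  }

  void encode(int bit, uint32_t p) {
    assert(p >= 1 && p <= 4095);
    uint32_t bound = (range >> 12) * p;  // bit 1 takes the lower part
    if (bit) {
      range = bound;
    } else {
      low += bound;
      range -= bound;
    }
    if (range < (1u << 16)) {            // one 16-bit step always suffices
      range <<= 16;
      shift_low();
    }
  }

  void flush() {
    for (int i = 0; i < 3; ++i) shift_low();  // pending words + 32 bits of low
  }
};

struct Dec16 {
  const std::vector<uint16_t>& in;
  size_t pos = 0;
  uint32_t range = 0xFFFFFFFFu;
  uint32_t code = 0;             // coded value minus the encoder's low

  explicit Dec16(const std::vector<uint16_t>& words) : in(words) {
    // Word 0 is the encoder's initial (always zero) cache word.
    for (int i = 0; i < 3; ++i) code = (code << 16) | next();
  }
  uint32_t next() { return pos < in.size() ? in[pos++] : 0u; }

  int decode(uint32_t p) {
    uint32_t bound = (range >> 12) * p;
    int bit;
    if (code < bound) {
      range = bound;
      bit = 1;
    } else {
      code -= bound;
      range -= bound;
      bit = 0;
    }
    if (range < (1u << 16)) {            // mirrors the encoder's single step
      range <<= 16;
      code = (code << 16) | next();
    }
    return bit;
  }
};
```

This sketch has a cost. Between shifts, range can be as low as 2^16, where range>>12 is only 16, so probabilities are quantized coarsely. The post doesn't say whether sh_v1m makes the same trade-off.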
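A small example makes the idea in item 4 easier to see. The code below is a hypothetical illustration, not the output of the author's preprocessor. The coder step is written as macros that read and write `x1` and `x2` (plus `x`, `in` and `pos` in the decoder), all declared as locals in the calling function. This gives the compiler a better chance to keep the coder state in registers instead of reloading it through `this` after every call. The update follows the fpaq0-style carryless scheme as I remember it, so treat the details as assumptions. The flush writes all four bytes of `x1`, a conservative choice that may differ from Matt's. The model is an assumed fpaq0p-like order-0 one, and the decoder is given the byte count instead of decoding an EOF flag.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical: coder steps as macros over caller-owned locals.
// RC_ENC needs locals x1, x2 and std::vector<uint8_t> out;
// RC_DEC needs x1, x2, x and an input vector in with read index pos.
#define RC_ENC(bit, p12)                                        \
  do {                                                          \
    uint32_t xmid_ = x1 + ((x2 - x1) >> 12) * (uint32_t)(p12);  \
    if (bit) x2 = xmid_; else x1 = xmid_ + 1;                   \
    while (((x1 ^ x2) & 0xFF000000u) == 0) {                    \
      out.push_back((uint8_t)(x2 >> 24));                       \
      x1 <<= 8;                                                 \
      x2 = (x2 << 8) | 255u;                                    \
    }                                                           \
  } while (0)

#define RC_DEC(bit, p12)                                        \
  do {                                                          \
    uint32_t xmid_ = x1 + ((x2 - x1) >> 12) * (uint32_t)(p12);  \
    (bit) = (x <= xmid_);                                       \
    if (bit) x2 = xmid_; else x1 = xmid_ + 1;                   \
    while (((x1 ^ x2) & 0xFF000000u) == 0) {                    \
      x1 <<= 8;                                                 \
      x2 = (x2 << 8) | 255u;                                    \
      x = (x << 8) | (pos < in.size() ? in[pos++] : 0u);        \
    }                                                           \
  } while (0)

// fpaq0p-like order-0 model step (assumed): 16-bit P(1) counters, rate 1/32.
static inline void model_update(uint16_t* t, int& ctx, int bit) {
  if (bit) t[ctx] += (65536 - t[ctx]) >> 5;
  else     t[ctx] -= t[ctx] >> 5;
  ctx += ctx + bit;                // partial byte with a leading 1
  if (ctx >= 256) ctx = 1;
}

std::vector<uint8_t> encode(const std::vector<uint8_t>& data) {
  std::vector<uint8_t> out;
  uint32_t x1 = 0, x2 = 0xFFFFFFFFu;  // coder state: plain locals
  uint16_t t[256];
  for (auto& v : t) v = 32768;
  int ctx = 1;
  for (uint8_t c : data)
    for (int i = 7; i >= 0; --i) {
      int bit = (c >> i) & 1;
      RC_ENC(bit, t[ctx] >> 4);
      model_update(t, ctx, bit);
    }
  for (int i = 0; i < 4; ++i) {       // conservative flush: all of x1
    out.push_back((uint8_t)(x1 >> 24));
    x1 <<= 8;
  }
  return out;
}

std::vector<uint8_t> decode(const std::vector<uint8_t>& in, size_t n) {
  assert(in.size() >= 4);             // encode() always emits the 4 flush bytes
  std::vector<uint8_t> out;
  size_t pos = 0;
  uint32_t x1 = 0, x2 = 0xFFFFFFFFu, x = 0;
  for (int i = 0; i < 4; ++i) x = (x << 8) | in[pos++];
  uint16_t t[256];
  for (auto& v : t) v = 32768;
  int ctx = 1;
  for (size_t k = 0; k < n; ++k) {
    int c = 0;
    for (int i = 0; i < 8; ++i) {
      int bit;
      RC_DEC(bit, t[ctx] >> 4);
      model_update(t, ctx, bit);
      c = (c << 1) | bit;
    }
    out.push_back((uint8_t)c);
  }
  return out;
}
```

The downside shows here too. The macros silently depend on names in the enclosing scope, and that is the price of avoiding explicit state arguments.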

Matt's coder is *VERY* simple, and it's also efficient. I'm not sure it's possible to make something fundamentally simpler and more efficient. Anyway, the catch is still in the modeling, and an FPAQ0P-like model is very fast. Maybe with ideas like a multi-threaded entropy coder we're only chasing marginal gains...
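The point that the cost lies in the model can be illustrated with a sketch of an fpaq0p-like order-0 bit model. The layout is recalled from fpaq0p and should be treated as an assumption: 256 contexts that hold the partial byte with a leading 1, 16-bit P(1) counters starting at 1/2, an update rate of 1/32, a 12-bit output for the coder, and no EOF flag bit. `ideal_bits` is a made-up helper that sums -log2 of the predicted probabilities. That sum is the size an exact arithmetic coder would reach with these predictions, so the compressed size is set by the model. Per bit, the model costs one table lookup and a shift-based update.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// fpaq0p-like order-0 bit model (assumed layout, see lead-in).
struct Order0Model {
  uint16_t t[256];  // P(bit == 1) scaled to 16 bits, per context
  int ctx = 1;      // bits of the current byte seen so far, with a leading 1
  Order0Model() { for (auto& v : t) v = 32768; }
  int p() const { return t[ctx] >> 4; }  // 12-bit probability for the coder
  void update(int bit) {
    if (bit) t[ctx] += (65536 - t[ctx]) >> 5;
    else     t[ctx] -= t[ctx] >> 5;
    ctx += ctx + bit;
    if (ctx >= 256) ctx = 1;  // byte complete: restart the context
  }
};

// Sum of -log2(predicted probability of the actual bit), in bits.
double ideal_bits(const std::vector<uint8_t>& data) {
  Order0Model m;
  double bits = 0;
  for (uint8_t c : data)
    for (int i = 7; i >= 0; --i) {
      int bit = (c >> i) & 1;
      double p1 = m.p() / 4096.0;
      assert(p1 > 0 && p1 < 1);  // counters never reach 0 or 4096
      bits -= std::log2(bit ? p1 : 1 - p1);
      m.update(bit);
    }
  return bits;
}
```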
I tested memory-mapped i/o on Windows: it's faster when reading from the disk cache (because we avoid a memcpy), but when reading from disk it's slower! Probably because Windows optimizes read calls with read-ahead, while memory-mapped files are assumed to be accessed in random order.
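The Windows test code isn't shown, so the read-ahead explanation is illustrated here with a POSIX analogue. On Windows, the closest hint I know of for ordinary reads is `FILE_FLAG_SEQUENTIAL_SCAN` in `CreateFile`. Both functions sum a file's bytes. `sum_file_mmap` reads the mapped pages in place, with no copy into a user buffer, and passes `MADV_SEQUENTIAL` to ask for aggressive read-ahead. `sum_file_read` copies through a 64 KiB buffer. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Sums all bytes of a file through a read-only mapping; returns -1 on error.
long long sum_file_mmap(const char* path) {
  assert(path != nullptr);
  int fd = open(path, O_RDONLY);
  if (fd < 0) return -1;
  struct stat st;
  if (fstat(fd, &st) != 0) { close(fd); return -1; }
  size_t n = (size_t)st.st_size;
  if (n == 0) { close(fd); return 0; }     // mmap rejects a zero length
  void* m = mmap(nullptr, n, PROT_READ, MAP_PRIVATE, fd, 0);
  close(fd);                               // the mapping outlives the fd
  if (m == MAP_FAILED) return -1;
  madvise(m, n, MADV_SEQUENTIAL);          // hint only; failure is harmless
  const unsigned char* p = static_cast<const unsigned char*>(m);
  long long s = 0;
  for (size_t i = 0; i < n; ++i) s += p[i];  // pages read in place
  munmap(m, n);
  return s;
}

// Same sum via read() into a 64 KiB buffer (one copy per chunk).
long long sum_file_read(const char* path) {
  int fd = open(path, O_RDONLY);
  if (fd < 0) return -1;
  unsigned char buf[1 << 16];
  long long s = 0;
  ssize_t r;
  while ((r = read(fd, buf, sizeof buf)) > 0)
    for (ssize_t i = 0; i < r; ++i) s += buf[i];
  close(fd);
  return r < 0 ? -1 : s;
}
```

To probe the observation in the post, one would time both functions on a freshly written file (warm cache) and again after the cache has been dropped. I haven't measured this, and the outcome depends heavily on the OS.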