In particular, what I'd like is a family of cumulative probability distribution functions for arithmetic coding that can be inverted in integers.

Even more specifically, the main cases are semi-Laplacian or semi-Gaussian functions. Precisely, the goal is something like this:

Probability of a symbol is modeled like P(x) ~= e^(-lambda * x) = k^(-x) (or something like that; P(0) is large, P(255) is small; x in [0,255]).

Cumulative probability is the sum of the probabilities of all lower symbols: C(x) = Sum{ y <= x } P(y).

To encode x we send something in [ C(x-1), C(x) ).

We want C(x) to be scaled such that C(255) = 16384 or some other power-of-2 constant.

We want C(x) to be integers, and for decodability we must have C(x) >= C(x-1) + 1. That is, the integer math truncation must never make a C(x) = C(x-1).

Now, to decode we get back some target number T that's in [ C(x-1), C(x) ) for some x. We'd like to have an analytic function that gives us x directly from T: x = D(T).

Now before you say "that's impossible" ; it's obviously not. You can certainly trivially find solutions such as :

C(x) = (C(255)/256) * (x+1)

D(T) = 256 * T / C(255)
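Spelled out as code, with the constants from the post (C(255) = 16384, so each of the 256 symbols gets 64 counts):

```c
/* the trivial flat solution: every symbol gets TOT/256 == 64 counts */
#define TOT 16384

static int C_flat(int x) { return (TOT / 256) * (x + 1); }  /* C_flat(-1) == 0 */
static int D_flat(int T) { return 256 * T / TOT; }          /* == T / 64 */
```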

The question is: are there more useful solutions, and in particular can you construct a whole family of solutions, parameterized to give you different shapes?

Obviously you can precompute table lookups, but I'd rather not have a whole mess of tables.

I don't really know anything about integer->integer function theory; it seems to me there are two possible approaches to this. One is "constructive": start with simple functions that you know you can invert; then there are many operations you can do on them to compose or distort them and still have them invertible.
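A tiny illustration of the constructive idea (the particular functions here are my own toy picks, not from the post): if f and g are strictly increasing integer maps with known inverses, then g(f(x)) is too, and it inverts by chaining the inverses in reverse order.

```c
/* two strictly increasing integer maps with exact integer inverses
   (valid for x >= 0); toy picks for illustration only */
static int f(int x)     { return 3 * x + 1; }
static int f_inv(int y) { return (y - 1) / 3; }

static int g(int x)     { return x + (x >> 1); }  /* floor(x*3/2), monotone */
static int g_inv(int y) /* only defined on actual outputs of g */
{
    return (y % 3 == 0) ? 2 * (y / 3) : 2 * ((y - 1) / 3) + 1;
}

/* the composite and its inverse: undo g first, then f */
static int h(int x)     { return g(f(x)); }
static int h_inv(int y) { return f_inv(g_inv(y)); }
```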

## 4 comments:

I was just thinking about invertible integer operations for hashing, so I will brain dump everything I know of from that, but I don't think any of it is useful.

It turns out that if you have multiple variables, then you can do all sorts of things where you update one variable from the others, and it's trivial to reverse:

foo += [...complicated-thing...]

or

foo ^= [...complicated-thing...]

e.g. like this:

a += b >> 5;

b ^= a << 7;

b -= (c << 4) * (a >> 11);

can easily be reversed (by subtracting or re-xoring); if you have a large basic block full of instructions like this, you can just walk backward through them one at a time applying this logic.
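Here is that reversal spelled out for the three lines above, assuming `uint32_t` variables so the wraparound arithmetic is well defined:

```c
#include <stdint.h>

/* forward mix: each line updates one variable from the *others* only,
   so each line is individually invertible */
static void mix(uint32_t *a, uint32_t *b, uint32_t *c)
{
    *a += *b >> 5;
    *b ^= *a << 7;
    *b -= (*c << 4) * (*a >> 11);
}

/* inverse: walk the lines backward, undoing one at a time; note each
   undo sees exactly the values the forward line used */
static void unmix(uint32_t *a, uint32_t *b, uint32_t *c)
{
    *b += (*c << 4) * (*a >> 11);
    *b ^= *a << 7;
    *a -= *b >> 5;
}
```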

But, if "complicated-thing" involves *foo*, then all bets are off (and this is where cryptographic hashes go).

Consider this operation:

x += (x << 1)

This is obviously the same as 'x = x*3', so it's not surprising that it's invertible, but note the mismatch between the cost of computing it forward and the cost of computing it backward. Now consider this operation:
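One caveat on the cost mismatch, which the comment doesn't mention: since 3 is odd it has a multiplicative inverse mod 2^32, namely 0xAAAAAAAB (3 * 0xAAAAAAAB == 1 mod 2^32), so the backward direction is actually just one multiply too. A sketch:

```c
#include <stdint.h>

/* x += x<<1 is x *= 3 in uint32 arithmetic ... */
static uint32_t fwd3(uint32_t x) { return x + (x << 1); }

/* ... and undoing it is a multiply by 3's inverse mod 2^32 */
static uint32_t inv3(uint32_t x) { return x * 0xAAAAAAABu; }
```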

x ^= (x << 1)

This can still be inverted; the bottom bit of x is the original value that it used to be; the next bit up, x1, can be computed as: x1_new = x0_old ^ x1_old, so x1_old = x1_new ^ x0_old. Then each successive bit can be computed the same way. But there's no closed-form way to compute it.

And the thing is, it's not even a priori obvious that 'x ^= x << 1' is a 1-to-1 function, never mind how hard it turns out to be to invert.
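The bit-at-a-time recovery described above, in code, plus a shortcut: over GF(2), `x ^= x << 1` is multiplication of the bit-polynomial by (1+z), and xoring in shifts by 1, 2, 4, 8, 16 multiplies by (1+z)(1+z^2)(1+z^4)(1+z^8)(1+z^16) = (1+z)^-1 mod z^32, so a log number of xor-shifts also undoes it:

```c
#include <stdint.h>

static uint32_t xorshl(uint32_t x) { return x ^ (x << 1); }

/* invert bit by bit: bit 0 is unchanged, and each original higher bit
   is the new bit xored with the already-recovered bit below it */
static uint32_t un_xorshl(uint32_t y)
{
    uint32_t x = y & 1;
    for (int i = 1; i < 32; i++) {
        uint32_t below = (x >> (i - 1)) & 1;
        x |= (((y >> i) & 1) ^ below) << i;
    }
    return x;
}

/* same inverse in 5 steps via the GF(2) identity noted above */
static uint32_t un_xorshl_fast(uint32_t y)
{
    y ^= y << 1;
    y ^= y << 2;
    y ^= y << 4;
    y ^= y << 8;
    y ^= y << 16;
    return y;
}
```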

Yeah the first thing you describe is "lifting". It's how we make the integer reversible rotations.

The second bit is more interesting to me. For example

x *= x;

is obviously invertible in ints (for x >= 1).

Hm, say you want C : [0,n] -> [0,m] with C(x+1) >= C(x) + 1.

Recasting this as a continuous problem: is there a unit interval remapper f : [0,1] -> [0,1] such that a) f' >= n/m everywhere (which guarantees C(x+1) >= C(x) + 1) and b) both f and f^(-1) have "nice" analytic forms. There's more you need to ensure that f^(-1) always reconstructs the right value, but b) is already a pretty tough requirement.

Scaled+translated exponential functions might work, but polynomials of higher order than quadratic are a bad idea. OTOH, the set of such functions f is convex, so once you have a few of them, you also have a whole spectrum in between.

Oh, and f : x |-> x^2 and any other analytic function that maps ints to ints is extra-nice in that regard since it guarantees that f^(-1) will always dequantize values back into the right bucket without any fudging around, provided the calculation is done exactly (or close enough to exact).
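The squaring example made concrete (my own sketch; the naive `isqrt` is just for illustration): with C(x) = x^2, a target T falls in [ (x-1)^2, x^2 ), so an exact integer square root dequantizes it straight back to the right bucket.

```c
/* floor(sqrt(v)) by simple search -- a sketch, not a fast isqrt */
static int isqrt(int v)
{
    int r = 0;
    while ((r + 1) * (r + 1) <= v) r++;
    return r;
}

/* C(x) = x^2 maps ints to ints exactly, so decode needs no fudging:
   T in [(x-1)^2, x^2) means isqrt(T) == x-1 */
static int C_sq(int x) { return x * x; }
static int D_sq(int T) { return isqrt(T) + 1; }
```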
