Number Complexity

Mon, 22 Oct 2007

One of the ways of measuring the amount of information in something is by finding the shortest description of that something. For example, the sequence "183736353" has more information than "111111111" by most definitions. The reason that I say "most definitions" is because the language you use to express the description matters. If this language had a really short word for "the sequence 183736353" and a medium length word for "1", then it is the other way around. So to talk about complexity we need to have a definition of the language we are allowed to use. I believe this is called Kolmogorov complexity.

So I thought that it would be good to see what the complexity of a few numbers was, to see how it grows as the numbers get bigger. The description language I used was simple arithmetic (I think most people doing Kolmogorov complexity use simple programming languages, which are much more powerful than arithmetic) and the number "1". So we have the operations plus, minus, multiply and divide. For simplicity, all divisions must have integer results (though I never reached a stage where division was useful). The operations can be done in any order (so you can have free brackets), but you can't ram two 1s together to form an 11. I wrote a program which would try lots of these expressions, in order of increasing length, and found the shortest expression for each number.

Unfortunately, when working in a garbage collected language, you can write memory leaks without realising. So after I had left the program running for a while and came back, the program was using 500MB of memory, and wanting more, so lots of things were being pushed into swap space and basically it was really inefficient. So I didn't get quite as much of the results I was hoping for, but I got enough to draw a graph (after filling in a few blanks with what seem like the right numbers):

Kolomogorov complexity of integers <= 40

(the vertical axis is just the number of 1s needed, since each operator is binary, each expression must have exactly one more '1' than it does operators, so this number easily maps to the length of the whole expression)

There is also a line of best fit in there, which I think fits the data pretty well, and it feels like the right type of function (anything to do with information content tends to have logs in it): 3.233 * log(n) - 0.8701. So if I was to estimate the number of 1s needed to write out my phone number (an 8 digit number), it would be 59. But this estimate might get a bit wrong as we get higher, since all of the expressions my program found just used addition and multiplication. Once you get high enough that subtraction becomes more useful, the curve might take a different shape. I'd be interested to see if there is a number where the shortest (or even one of the shortest) expression which yields it uses division. I suspect that there is (with the expression of the form: ((some large but 'simple' number) + (some simple number)) / (some simple number). I suspect that it would be quite a large number if it exists.

This is an interesting subject, but it is very context-dependent. You can't accurately measure the complexity of something without either choosing an encoding scheme or viewing it in the context of a complete data set.

For example, phone numbers are generally not uniformly distributed across the entire range (which in your case is .00000000-99999999). This means that phone numbers are easier to predict than an arbitrary set of 8-digit numbers.

Effectively, what you've measured is the compactness of a lossless encoding scheme. It would be interesting to see how that compares to various other encoding schemes for the same dataset. Check out Shannon Entropy and Information Theory for more info.
13 days after story
Name & email are optional. Email will not be obfuscated.
HTML tags will be removed except hyperlinks.
 

About

I'm a nerd living in Sydney. This is a place where I can write stuff about my interests and not care that no one else is reading.

I like music, maths, programming, pretty pictures, filters and other good things.

(more info)

It should be fairly obvious that this isn't connected to my employer at all.

Email me (not a catchpa)

Email policy

Subscribe

RSS Feed RSS

Get an aggregator

Liferea (Linux)

Vienna (OSX)

Feedreader (Windows)

Google Reader (Web based)

I've only used Liferea, so I can't vouch for the other ones.

About this site

This site runs a (modified) version of blosxom.

The host is GeekISP, and they seem to do an excellent job.