Hacker News new | ask | show | jobs
by strangestchild 4922 days ago
Presumably it rather depends on what you're storing such large numbers for. If you want to store a single number n (for the sake of argument, let's say a positive integer) for later reference, and have no knowledge in advance of any special properties of the number, you can't do better than ceil(log_2(n)) bits of information, whatever scheme you use for storing it.

On the other hand, if you know your number has particular properties (for example, say you knew it was an even number between 10,000 and 1,000,000 digits long) you would be able to use that information to store it more efficiently. See http://en.wikipedia.org/wiki/Information_theory for the mathematical background for treating this sort of problem.

However, many space-efficient ways of storing data are not time-efficient when it comes to retrieving it or performing operations on it. I can refer to huge numbers using very dense mathematical notation - for example, I could mention A(100,100) (see Petrushka's comment on the Ackermann function) - which would allow me to refer to an astronomically huge number in ten characters - but this is useless for most practical purposes as it would take an inordinate amount of time to calculate. Similarly, I could 'store' the quadrillionth prime in memory by simply writing "the quadrillionth prime" - but this is probably not what you wanted.

The data structure you use will depend enormously on what you want to use it for. For phone numbers, it is helpful to be able to auto-complete numbers a use is typing, which would be difficult to achieve if you simply concatenated them with separators and compressed the data. For storing words in a dictionary, you might use a trie, but that becomes a poor choice of structure if what you actually want is to be able to easily identify words that rhyme with each other.

In summary, it is important to identify what sort of numbers you are storing, what operations you intend to be able to support on your dataset, and how you intend to access the elements. In principle, for arbitrary positive integers, you can't do better than the bound I provided - and if you want more than to just store the data without using it, you may end up using rather more than that.

1 comments

Say I was trying to find the next mersenne prime what would you suggest?