by whalesalad 811 days ago
I tend to end up encoding everything as an integer (multiply by 1000, 10000, etc.) and then turn it back into a float/decimal on decode. For instance, if I am building a system dealing with dollar amounts, I will store cent amounts everywhere, communicate cent amounts over the wire, etc., then treat rendering them as dollar amounts as a presentation concern.
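A minimal Python sketch of the integer-cents approach. It assumes a single currency (USD) and a fixed scale of 100. The function names `to_cents` and `format_dollars` are illustrative, not from any library.

```python
from decimal import Decimal

SCALE = 100  # assumption: USD, stored as integer cents

def to_cents(amount: str) -> int:
    """Parse a decimal dollar string such as "12.34" into integer cents."""
    cents = Decimal(amount) * SCALE
    if cents != cents.to_integral_value():
        raise ValueError(f"{amount!r} has more precision than cents")
    return int(cents)

def format_dollars(cents: int) -> str:
    """Presentation concern: render integer cents as a dollar string."""
    sign = "-" if cents < 0 else ""
    dollars, rem = divmod(abs(cents), SCALE)
    return f"{sign}${dollars}.{rem:02d}"
```

Parsing goes through `Decimal` rather than `float`, so a string like "12.34" never passes through a binary approximation on its way to 1234.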

It's worth bearing in mind, when you do that, that the largest integer that is "generally safe" in JSON is 2^53-1, so if you scale by a factor of 10000 you're taking 13-14 more bits off that maximum. That leaves you about 2^40, or about a trillion, before you may start losing precision or seeing systems disagree about the decoded values. Whether that's a problem depends on your domain.
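The arithmetic above can be checked with a short Python sketch. `max_safe_amount` is a hypothetical helper. 2^53-1 is the same value as JavaScript's `Number.MAX_SAFE_INTEGER`.

```python
MAX_SAFE_INTEGER = 2**53 - 1  # largest integer a double represents with no gaps below it

def max_safe_amount(scale: int) -> int:
    """Largest whole-unit amount whose scaled integer stays within the safe range."""
    return MAX_SAFE_INTEGER // scale

limit = max_safe_amount(10_000)   # a bit over 900 billion units at a scale of 10000

# Past 2**53 a double can no longer tell neighbouring integers apart, so a
# consumer that decodes JSON numbers as doubles silently merges them.
collides = float(2**53 + 1) == float(2**53)
```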
For money, that's a sane setup.

But do note that there are multiple actively used currencies with zero, three, five (rare) or even eight (BTC) decimals. Also, in some currencies the fractional unit can only be divided in specific steps (e.g. only 0.5).

Point being: floats are dangerously naive for currency. But integers are naive too. You'll most probably want a "currency" or "money" type: a Value Object, or even a Domain Model.

XML offered all this, but in JSON there's little to convey it, other than some nested "object" with at least the decimal amount (as an int) and the ISO 4217 currency code. And maybe (depending on how HATEOAS you wanna be) a formatting string to be used in locales, a rule on divisibility, and/or how many decimal places your int or decimal has.
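A rough sketch of such a nested object in Python. The field names `amount`, `currency` and `exponent` are hypothetical, and the three-entry minor-unit table is a hand-copied assumption (BHD uses 3 decimals and JPY uses 0 in ISO 4217). A real system would load that table from an authoritative, versioned source.

```python
import json
from decimal import Decimal

# Assumption: hand-maintained ISO 4217 minor units for a few currencies.
MINOR_UNITS = {"USD": 2, "JPY": 0, "BHD": 3}

def money_to_json(amount: Decimal, currency: str) -> str:
    """Encode as integer minor units plus the ISO 4217 code and the exponent used."""
    exponent = MINOR_UNITS[currency]
    minor = amount.scaleb(exponent)  # multiply by 10**exponent exactly
    if minor != minor.to_integral_value():
        raise ValueError(f"{amount} is too precise for {currency}")
    return json.dumps({"amount": int(minor), "currency": currency, "exponent": exponent})

def money_from_json(text: str) -> tuple[Decimal, str]:
    """Decode using the exponent carried in the payload, not a local table."""
    obj = json.loads(text)
    return Decimal(obj["amount"]).scaleb(-obj["exponent"]), obj["currency"]
```

Shipping the exponent alongside the integer means the reader does not have to share the writer's table to decode the value correctly.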

(FWIW, I built backends for financial systems and apps. It gets worse than this if you do math on the currencies. Some legislation or bookkeeping rules state that calculations use more or fewer decimals, e.g. whether ($10/3)*3 == $10 or == $9.99, or that $0.03/2 == $0.01 + $0.02 when splitting a bill. This stuff is complex, but it's real domain logic.)
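The ($10/3)*3 case can be reproduced with Python's `decimal` module. The round-half-up rule here is an assumption. Which result is "correct" is a bookkeeping rule, not a programming choice.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")

def to_cents(x: Decimal) -> Decimal:
    # Assumption: the applicable rule rounds half up to whole cents.
    return x.quantize(CENT, rounding=ROUND_HALF_UP)

ten = Decimal("10.00")
round_each_step = to_cents(to_cents(ten / 3) * 3)  # round the third first: 3.33 * 3
round_at_end = to_cents(ten / 3 * 3)               # keep full precision until the end
```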

When I say dangerously naive, I mean in a way that people can go to jail¹ for "losing" or "inventing" cents. Which your software will do if you use floats.

¹IANAL. But this is what we were told when legal people looked at our architecture.
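A minimal Python illustration of how floats "lose" cents. Ten 10-cent items are added up one at a time as binary floats and as integer cents.

```python
float_total = 0.0
cents_total = 0
for _ in range(10):
    float_total += 0.10   # 0.1 has no exact binary representation
    cents_total += 10     # integer cents are exact

print(float_total == 1.0)  # False
print(cents_total == 100)  # True
```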

Your software will still "lose" cents if you use integers, for operations such as dividing a bill (e.g. divide by 3), or applying 3% APR in monthly increments.

The goal is not to avoid rounding errors (which would be quite difficult when the true account value can be an irrational number, as with 3% APR compounding monthly), but to have the exact same rounding errors that are prescribed by the accounting practices. Which may vary depending on legislation.

A decimal floating point is usually a better starting point than integers are.
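A sketch of "the exact same rounding errors that are prescribed", using Python's `decimal` module as the decimal floating point. The rule "round each month's interest to the cent, half to even" is an assumption. Swap the `rounding` argument for whatever your accounting rules actually say.

```python
from decimal import Decimal, ROUND_HALF_EVEN

CENT = Decimal("0.01")

def compound_monthly(balance: Decimal, apr: Decimal, months: int,
                     rounding: str = ROUND_HALF_EVEN) -> Decimal:
    """Apply apr/12 each month, rounding that month's interest to the cent."""
    monthly_rate = apr / 12
    for _ in range(months):
        interest = (balance * monthly_rate).quantize(CENT, rounding=rounding)
        balance += interest
    return balance
```

The rounding happens at a named step with a named mode. That is what makes it auditable against the rule, rather than an accident of the number representation.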

> for operations such as dividing a bill (e.g. divide by 3), or applying 3% APR in monthly increments.

Which is why passing around ints is not the solution. And why I specifically mention Domain Models and/or Value Object.

A domain model would throw an exception or otherwise disallow certain divisions, for example. What I often do is something like `expense.amount.divide_over(3, leftover_to_last)` or `savings.balance_at(today).percentage_of(3.1337)`.
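A guess at what a `divide_over` like the one above might look like in Python. The `Money` class, its fields and the boolean `leftover_to_last` flag are all hypothetical. The point is that the shares always sum back to the original, and the leftover goes somewhere explicit.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Money:
    """Hypothetical value object: integer minor units plus an ISO 4217 code."""
    minor_units: int
    currency: str

    def divide_over(self, parts: int, leftover_to_last: bool = True) -> list["Money"]:
        """Split into `parts` shares that always sum back to the original amount."""
        if parts <= 0:
            raise ValueError("parts must be positive")
        share, leftover = divmod(self.minor_units, parts)
        shares = [share] * parts
        # Assign the indivisible remainder to one party instead of dropping it.
        shares[-1 if leftover_to_last else 0] += leftover
        return [Money(s, self.currency) for s in shares]
```

For example, splitting $0.03 over two gives shares of 1 and 2 cents, and $10.00 over three gives 333, 333 and 334 cents.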

Sometimes, in simpler setups and when the language allows, I'll re-implement operators like *, / and even + and -. But when actual business logic is needed, I'll avoid these and implement actual domain methods that use the language the business uses.

But never, ever, do I allow just math-ing over the inner values.
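A separate sketch of the "re-implement operators" option, again with a hypothetical `Money` class. Addition is only allowed within one currency. No `__mul__` or `__truediv__` is defined, so `money * 1.03` raises `TypeError` and callers are pushed toward domain methods instead of math on the inner values.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Money:
    """Hypothetical value object; deliberately exposes only + and -."""
    minor_units: int
    currency: str

    def _check(self, other: object) -> None:
        if other.currency != self.currency:
            raise ValueError(f"cannot combine {self.currency} and {other.currency}")

    def __add__(self, other: object) -> "Money":
        if not isinstance(other, Money):
            return NotImplemented
        self._check(other)
        return Money(self.minor_units + other.minor_units, self.currency)

    def __sub__(self, other: object) -> "Money":
        if not isinstance(other, Money):
            return NotImplemented
        self._check(other)
        return Money(self.minor_units - other.minor_units, self.currency)
```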

So, I disagree: decimal floating point and integers are both just as "bad". For the inner values in the domain model or value object they may be fine, but there integers are often a slightly better starting point, because they make rounding and leftovers very explicit.

The problem with that (which I have seen in practice) is that you are essentially hard coding the maximum precision you will accept for every client that needs to interpret your JSON.

For example, you say you store monetary amounts as cents. What if you needed to store US gas prices, which are normally priced in amounts ending in 9/10ths of a cent? If you want to keep your values as integers you need to change your precision, which will likely mess up a lot of your code.

And different currencies have different default precisions. So if you're dealing with multiple currencies, client and server now both need a map of all currency precisions for formatting purposes, and they have to agree on it.

What's worse is that these things can also change over time and there is sometimes disagreement over what the canonical value is.

E.g. ISO 4217 (used by Safari, Firefox and NodeJS) will say that the Indonesian Rupiah (IDR) uses 2 decimal digits, while Unicode CLDR (used by Chrome) will say that they use 0 decimal digits. The former is the more "legalistic" definition, while the latter matches how people use the currency in reality.

This is not a real issue if you transfer amounts as decimal strings and then pass those to the Intl API for formatting (the formatting will just be different but still correct), but it's catastrophic if you use scaled-up integers (all amounts will be off by magnitudes).
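A small Python sketch of the failure mode described above. Two hypothetical consumers disagree on IDR's minor units, mirroring the ISO 4217 (2 digits) vs. CLDR (0 digits) split. A scaled integer is misread by a factor of 100, while a decimal string is unaffected by either table.

```python
from decimal import Decimal

SERVER_DIGITS = {"IDR": 2}  # hypothetical server table (ISO 4217 style)
CLIENT_DIGITS = {"IDR": 0}  # hypothetical client table (CLDR style)

def encode_scaled(amount: Decimal, digits: int) -> int:
    return int(amount.scaleb(digits))

def decode_scaled(value: int, digits: int) -> Decimal:
    return Decimal(value).scaleb(-digits)

amount = Decimal("15000")                                # Rp 15,000
wire_int = encode_scaled(amount, SERVER_DIGITS["IDR"])   # 1500000
misread = decode_scaled(wire_int, CLIENT_DIGITS["IDR"])  # 100x the intended amount
wire_str = str(amount)                                   # "15000"
safe = Decimal(wire_str)                                 # same value regardless of tables
```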

For this reason I would always store currency amounts in an appropriate DECIMAL type in the DB and send currency amounts as strings over the wire.
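A minimal sketch of the wire side in Python. The payload field names are hypothetical. `json.dumps` cannot serialize a `Decimal` by default, so the amount is converted with `str`, which also preserves trailing zeros such as "19.90". If you receive bare JSON numbers instead, `json.loads(text, parse_float=Decimal)` avoids the binary float on the way in.

```python
import json
from decimal import Decimal

def encode_payment(amount: Decimal, currency: str) -> str:
    # Amount travels as a decimal string, so no consumer parses it into a binary float.
    return json.dumps({"amount": str(amount), "currency": currency})

def decode_payment(text: str) -> tuple[Decimal, str]:
    obj = json.loads(text)
    return Decimal(obj["amount"]), obj["currency"]
```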

This is a good point.

It's not widely known, but US gasoline prices are actually in a defined currency unit, the mill (https://en.m.wikipedia.org/wiki/Mill_(currency)).

For most purposes, using mills as the base unit would be sufficient resolution.
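A sketch of mills as the base unit, using a gas price of $3.459 per gallon. Rounding the final charge to cents half up is an assumption, and this sketch only handles non-negative amounts.

```python
from decimal import Decimal

MILLS_PER_DOLLAR = 1000  # one mill is 1/1000 of a US dollar

def dollars_to_mills(price: str) -> int:
    """Parse a dollar price such as "3.459" into integer mills."""
    mills = Decimal(price) * MILLS_PER_DOLLAR
    if mills != mills.to_integral_value():
        raise ValueError(f"{price!r} is finer than a mill")
    return int(mills)

def mills_to_cents(mills: int) -> int:
    """Round a non-negative mill amount to whole cents, half up (assumed rule)."""
    if mills < 0:
        raise ValueError("this sketch only handles non-negative amounts")
    return (mills + 5) // 10

price = dollars_to_mills("3.459")    # 3459 mills per gallon
charge = mills_to_cents(price * 10)  # 10 gallons: 34590 mills, 3459 cents
```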

So basically you use fixed-point numbers. Especially for currency, that's a very good idea anyway because of rounding errors, which are even worse in IEEE 754.
Pedantically, IEEE 754 defines decimal floating point formats (like decimal128) which are appropriate for representing currency. Representing currency in non-integer values in any of the binary floating point formats is indeed a recipe for disaster though.
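Python's `decimal` module can approximate the decimal128 behaviour. The context below uses decimal128's parameters: 34 significant digits, Emax 6144, Emin -6143. Treat it as an approximation of the IEEE format, not a bit-exact implementation.

```python
from decimal import Context, Decimal

# Assumption: decimal128-like parameters via Python's decimal module.
DECIMAL128 = Context(prec=34, Emax=6144, Emin=-6143)

binary_tenth = Decimal(0.1)  # exact value of the double nearest to 0.1 (not 0.1)
binary_sum_ok = 0.1 + 0.2 == 0.3
decimal_sum_ok = DECIMAL128.add(Decimal("0.1"), Decimal("0.2")) == Decimal("0.3")
```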
I have tried to encode all non-trivial numbers as strings. Otherwise, if a value turns out to be too big (or too small), or needs to be a float, I'll have to change my JSON schema. I bake the need to decode numbers into the transforms, for consistency.
This is great as long as you always make clear which values are pre-encoding and which are post-encoding. I remember one of my first production bugs was giving users 100 times the credit they actually bought. Oops.
Makes sense for dollars, but for anything like graphics or physics I'd consider a power of two like 1,024 as the fixed-point factor instead.

My intuition tells me that "x * 1000 / 1000 == x" might not be true for all numbers if you're using floats.
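A quick Python experiment on both points. Multiplying a double by a power of two such as 1024 only changes its exponent, so `x * 1024 / 1024 == x` holds exactly, barring overflow or underflow. Scaling by 1000 rounds twice, so a round trip is not guaranteed. The sample range and seed are arbitrary choices.

```python
import random

def round_trips(x: float, factor: float) -> bool:
    return x * factor / factor == x

random.seed(0)
samples = [random.uniform(-1e6, 1e6) for _ in range(100_000)]
pow2_failures = sum(not round_trips(x, 1024.0) for x in samples)  # always 0
dec_failures = sum(not round_trips(x, 1000.0) for x in samples)   # may be nonzero
print(pow2_failures, dec_failures)
```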

A sure sign of an inexperienced programmer in numerical computing is when they check for equality to zero of a floating-point number as

    if (x == 0) ...

instead of something like

    if (abs(x) < eps) ...

where eps is a suitably defined small number.
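Python's `math.isclose` shows why "suitably defined" carries a lot of weight. Its default relative tolerance is useless against zero, because `abs_tol` defaults to 0.0. An absolute tolerance has to be picked explicitly, and 1e-12 below is an arbitrary assumption.

```python
import math

x = 0.1 + 0.2 - 0.3  # tiny nonzero residue from binary rounding

exact = x == 0                                 # False
relative = math.isclose(x, 0.0)                # False: rel_tol scales with |x|, which is tiny
absolute = math.isclose(x, 0.0, abs_tol=1e-12) # True, but 1e-12 is a choice you must justify
```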

Sometimes it is fine. For example, reference BLAS will check if the input scalars in DGEMM are exactly zero, for

    C <- alpha*AB + beta*C 
If beta is exactly 0, you don’t have to read C, just write to it.

The key here is that beta is likely to be an exact value that is entered as a constant, and detecting it allows for a worthwhile optimization.
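A naive Python sketch of the same idea, not the reference BLAS code itself. When beta is exactly zero, C is treated as write-only. A side effect is that garbage in C, even NaN, cannot leak into the result. A plain `0.0 * NaN` would otherwise be NaN. The reference DGEMM documentation notes that C need not be set on entry when beta is zero.

```python
def gemm(alpha, A, B, beta, C):
    """Naive C <- alpha*A@B + beta*C on lists of lists, updating C in place."""
    n, k, m = len(A), len(B), len(B[0])
    for i in range(n):
        for j in range(m):
            acc = sum(A[i][p] * B[p][j] for p in range(k))
            if beta == 0.0:
                C[i][j] = alpha * acc  # exact-zero check: never read C
            else:
                C[i][j] = alpha * acc + beta * C[i][j]
    return C
```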

I would guess that even most of the time, people using epsilon don't understand it. It's not like there is a universal constant error with floating-point numbers. I feel that saying "just use epsilon" is not much better than x == 0, and it could lead to harder-to-find bugs if it sometimes works and other times does not.
Funnily enough, I think a sure sign of an inexperienced programmer in bigco application programming is the other way around: they wrongly learned a mental model of "floating point is approximate, never ever do ==" in school.
I often store it as smaller than cents, because anything with division or a basket of summed parts with taxes can start to get funky if you round down (and some places have laws about that.)
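A small Python illustration of why sub-cent precision matters for a taxed basket. It assumes an 8.25% tax rate and half-up rounding, both hypothetical. Rounding each line's tax to cents loses a cent compared with keeping full precision and rounding once at the end. Which one the law requires depends on the jurisdiction.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")
TAX_RATE = Decimal("0.0825")  # assumption: an 8.25% sales tax

def tax_rounded_per_line(prices):
    return sum((p * TAX_RATE).quantize(CENT, rounding=ROUND_HALF_UP) for p in prices)

def tax_rounded_on_total(prices):
    return (sum(prices) * TAX_RATE).quantize(CENT, rounding=ROUND_HALF_UP)

basket = [Decimal("1.00")] * 3
per_line = tax_rounded_per_line(basket)  # 3 x 0.08
on_total = tax_rounded_on_total(basket)  # 0.2475 rounded once
```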