
The idea of a type / class / object here is equivalent in that they provide locality of interpretation. There is one place that may contain the code that manipulates these values; in other words, there is one place that contains your understanding - your interpretation - of the integer.

You (i.e., all commenters in this thread somehow disputing the AccountId object version of the story) are completely right: it is difficult to decide up front what you need. Of course, depending on your domain, it may be perfectly valid to have -5 years or 1800, but I am not asking you to never implement years like this. I am merely suggesting that it is useful to have one place where such decisions, invariants, roles - your understanding of a year, perhaps specific to your domain - are located.

How your understanding is structured is a different concern, because of course, a class with 10000 LOC is horrible to work with. But you can always use means of abstraction, and compose independent modules. An int does everything; maybe in one context a year shouldn't be 1800, or should be formatted as 4 digits. A year class / type / object is the place to store these decisions. As I outlined, such decisions are partly technical (how many bits?) and partly domain specific (year must be > 1800). Maybe you have an int internally, but abstract it with a non-zero int.
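To make that concrete, here is a minimal sketch of such a "one place", assuming Python and borrowing the "> 1800" and "4 digits" rules from the paragraph above. The names `Year` and `MIN_YEAR` are made up for illustration; your domain may well want different rules.

```python
from dataclasses import dataclass

# Hypothetical domain rule, taken from the example in the text above.
MIN_YEAR = 1800


@dataclass(frozen=True)
class Year:
    """A year as this (hypothetical) domain understands it."""

    value: int

    def __post_init__(self) -> None:
        # Technical decision: only plain ints are accepted (bool is an int subclass).
        if not isinstance(self.value, int) or isinstance(self.value, bool):
            raise TypeError("year must be an int")
        # Domain decision: a year must be greater than MIN_YEAR.
        if self.value <= MIN_YEAR:
            raise ValueError(f"year must be > {MIN_YEAR}, got {self.value}")

    def __str__(self) -> str:
        # Formatting decision: always render as 4 digits.
        return f"{self.value:04d}"
```

If the rule changes (say, -5 becomes valid after all), there is exactly one class to edit, and every caller that passes a `Year` around picks up the new understanding.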

It boils down to interpretation. An int allows for many operations, but they may be invalid (in the sense of "implausible", or "undefined"). For example, an account ID of 1 can be multiplied by 2 and then by 2 again, and so on. Integers form a relation this way, but not all extensions of this relation might be "meaningful".

Integers as IDs are a great example of this. Technically, they are great, because they are simple numbers, can be typed on a keyboard, and are easy to read. But they should not be interpreted like integers.

Six people are twice as many as three people. In Germany, grades are marked 1 - 6 (1 = very good, 6 = very poor), but that relationship is merely ordinal. A grade of 4 is not "twice as bad" as a 2, although the numbers can be related that way. An account ID of 6 is not twice as "good" as an ID of 3. For IDs, you want them to identify, in that they are exclusive and exhaustive, but integer IDs are not supposed to be ordinal (that is, comparable by size; for IDs, 6 > 5 is not a meaningful statement). Of course, you can retain this interpretation, e.g. an auto-increment ID of 6451 would indicate that it was created WAY later than ID 6, but this is difficult to interpret: it doesn't really tell you how much later, and IDs deleted from within a range may be reissued (IDs 1, 2, 3, 4; delete 3; now 3 is a missing rank and could be reissued). So IDs aren't ints: they are exclusive and exhaustive, but they aren't ordinal, or at least interpreting them ordinally is dangerous, and some operations aren't meaningful even if you do interpret them as ordinal. For example, it would not make sense to calculate an arithmetic mean from integer IDs, although mathematically this operation is allowed. In statistics, this concern is discussed as the Skalenniveau (German, meaning "level of scale"); in English it's called level of measurement [1].
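As a rough illustration of "identifies, but is neither ordinal nor arithmetic", here is a small Python sketch. `AccountId` and the helper `is_rejected` are hypothetical names, and this is only one of several ways to express the idea: a frozen dataclass gets equality and hashing for free, but no ordering and no arithmetic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccountId:
    """Nominal-scale value: it can be compared for identity, nothing else."""

    value: int  # hidden representation; callers should not do math on it


def is_rejected(operation) -> bool:
    """Return True if calling `operation` raises TypeError."""
    try:
        operation()
    except TypeError:
        return True
    return False


a, b = AccountId(3), AccountId(6)

# Identity works: equality and use as a set member or dict key.
same = a == AccountId(3)
unique = {a, b, AccountId(3)}

# Ordering and arithmetic are not defined, so these calls fail loudly.
no_ordering = is_rejected(lambda: a < b)
no_doubling = is_rejected(lambda: a * 2)
no_mean = is_rejected(lambda: sum([a, b]) / 2)
```

The next developer who tries to average account IDs gets a `TypeError` instead of a plausible-looking number.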

In sum, it does not matter what your use case is: an int is an int. But depending on your use case, it might be worthwhile not to pass an uninterpreted integer around, but to wrap it in some kind of object where you localize all your decisions about how to interpret the integer.

None of this should be dismissed as some form of modern hipster JavaScript, where none of us high-level kids know how to bit-bang a set difference from some account IDs; rather, these ideas are really old. Even in C, data abstraction is useful, in that you don't fiddle with integers but define a set of methods, possibly in a module, that interact with a hidden internal representation. Deciding on where to cut these modules appears to be a difficult task, but we have all known this for a while now [2].

As I said, nothing wrong with bitbanging, but encapsulating interpretation in types, classes, objects, modules or functions is a useful strategy to reduce the complexity, and you lose all of it when you just pass integers around and cross your fingers and hope the next developer won't calculate a mean from your integer ids.

And finally: The performance argument. Tell me how many requests/ops per second you need and let's find out why your program can't do them. Make it work, make it fast, not the other way around.

I have had too many arguments where people talked about "performance" without stating numbers. I had a colleague who argued that joins were bad, because they were slow (which conceptually, they are), but then your database is a highly optimized processor to do exactly these operations. Their fear of table-joining yielded a database with few tables, each of which had very long columns, each of which contained values separated by two semicolons, which they would then manipulate using string manipulations. Also the table grew, because the lack of normalization caused a lot of redundancy. I have seen many sins in the name of performance, and I will kindly ask about some numbers. Performance without numbers is not a good argument against data abstraction.

Also, it can usually be handled. If your high-level Python program becomes too slow for some reason, feel free to implement the slow parts in a really fast language and add a clever algorithm in assembler; see for example NumPy and SciPy.

These approaches aren't mutually exclusive. "Please don't do it" and "considered harmful" don't get us anywhere.

Anyway, coming back to the topic of the thread: How does code become unmaintainable? By following rules of thumb without thinking about their contextual requirements, and most often by passing around integers in the name of performance, while ignoring locality of decision, locality of code, and ease of comprehension. An int is memory, and an account ID is an intentional interpretation of this memory (an int also being an interpretation of memory already, I get it, this is about abstraction - and finally, true comprehension comes from world-reference. The int doesn't care whether it is 5. You do, see above.)

RAM and CPU power are less expensive than two developers wasting their time trying to understand some low-level code and identifying which functions expect an int64 and which need a size_t, and whether they are equivalent, and so on, while they could just be passing around some thing with a stable interface, and a localized world-reference (you know, a name).

I wouldn't argue that all of this object/type stuff is THE way to go, but these treatments were all invented specifically to solve the issues of raw integers. And certainly I am quite OK with the fact that not everything uses Java; however, the ideas we're talking about here are not specific to Java, and Java in particular often provides a very poor version of the story.

I would, however, argue that it is important to be congruent with the unit of expression of your programming language. Treating C as if it were object-oriented will give you a bad time, and not using objects in C# and Java will also give you a bad time. If your language natively provides an optimized iterator pattern, as Python does, coding in the C-for-loop-index idiom will give you a bad time. Most unmaintainable Java comes from not understanding Java, because you mistake it for C without & and *, but with Objects.
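For the Python case, the difference looks roughly like this. Both functions and the sample data are made up for illustration; the point is only the contrast between the index idiom and the iterator idiom.

```python
# Hypothetical sample data.
account_names = ["alice", "bob", "carol"]


def labels_index_style(names):
    """C-style: loop over indices, then look each element up again."""
    result = []
    for i in range(len(names)):
        result.append(f"{i + 1}: {names[i]}")
    return result


def labels_iterator_style(names):
    """Pythonic: let enumerate() hand out position and element together."""
    return [f"{position}: {name}" for position, name in enumerate(names, start=1)]


index_labels = labels_index_style(account_names)
iterator_labels = labels_iterator_style(account_names)
```

Both produce the same list here, but the iterator version also accepts any iterable (a generator, a file, a database cursor), while the index version needs `len()` and random access.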

Unmaintainable code is about people. Code doesn't maintain itself.

</rant rel="sorry">

[1] https://en.wikipedia.org/wiki/Level_of_measurement

[2] https://dl.acm.org/citation.cfm?id=361623