by tmyklebu 2859 days ago
> What else are you going to do in C? But even most real world implementations of C have all sorts of structs and typedefs that you shouldn’t make any assumptions about and you should treat as an opaque type.

Subtract two times? (BTW, struct tm isn't an opaque type.)

> > Who are you to say I won't do math on account ids?

> Is that really what you are going to argue? That in a real world use case you’re going to be doing math on an accountId?

You seem to think that's absurd? Maybe I want to store a set of accountids. So I sort them (bit-extraction or comparison!), take deltas (subtraction!), and encode the deltas sensibly. Maybe I have two lists of accountids and I want to check that none of the things in the first list show up in the second list. Some sort of hashing scheme (bit-fiddling!) is going to be my friend here.
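A minimal sketch of the sort / delta / disjointness idea described above, assuming account ids are plain non-negative Python ints. The function names `delta_encode`, `delta_decode` and `disjoint` are made up for illustration, and the built-in `set` stands in for "some sort of hashing scheme":

```python
def delta_encode(ids):
    """Sort the ids (comparison) and store the first id plus successive gaps (subtraction)."""
    ordered = sorted(set(ids))
    if not ordered:
        return []
    deltas = [ordered[0]]
    for prev, cur in zip(ordered, ordered[1:]):
        deltas.append(cur - prev)
    return deltas


def delta_decode(deltas):
    """Rebuild the sorted ids by taking a running sum of the gaps."""
    ids = []
    total = 0
    for d in deltas:
        total += d
        ids.append(total)
    return ids


def disjoint(first, second):
    """True if nothing in `first` shows up in `second` (hash-based lookup)."""
    return set(first).isdisjoint(second)


encoded = delta_encode([1007, 1000, 1003])   # [1000, 3, 4]
decoded = delta_decode(encoded)              # [1000, 1003, 1007]
```

The small gaps are what a variable-length integer encoding would then compress; that step is left out here.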

> > What ORM? What database? If I'm using an ORM and a database, high-performance time operations are likely out of the question.

> So you’re going to pass a raw key/value record set around your code and program like it was 2006? So now you’re going back to mapping your sql resultset to an object that makes sense anyway. Going right back to my point that you’re going to have a mapping between your raw results -> domain model -> view model (where the view model is serialized for external use) and back again anyway.

What SQL? What resultset? What object? I have an integer. Why do you assume it doesn't make sense?

1 comments

> You seem to think that's absurd? Maybe I want to store a set of accountids. So I sort them (bit-extraction or comparison!),

You’re going to do “bit extraction” to sort? If you treat accountId as an opaque type, you are going to have a comparison operator as part of the type and the rest of your code is going to just use a < or > symbol.

> take deltas (subtraction!), and encode the deltas sensibly. Maybe I have two lists of accountids and I want to check that none of the things in the first list show up in the second list. Some sort of hashing scheme (bit-fiddling!) is going to be my friend here.

Or, in a modern language, since you have already defined equality for the type and overridden the GetHashCode() function....

var deltas = accountList1.Except(accountList2);

Why are you doing that all through your code instead of encapsulating the concept of equality, less than and greater than in one place?
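For readers who don't use C#, roughly the same idea sketched in Python. The `AccountId` class here is hypothetical; the assumption is that equality, hashing and ordering are all defined once on the type (here generated by `dataclass`) and the rest of the code only uses `<`, `==` and set operations:

```python
from dataclasses import dataclass


# frozen=True plus the default eq=True generates __eq__ and __hash__;
# order=True generates <, <=, >, >= based on the wrapped value.
@dataclass(frozen=True, order=True)
class AccountId:
    value: int


list1 = [AccountId(3), AccountId(1), AccountId(2)]
list2 = [AccountId(2), AccountId(5)]

# Analogue of accountList1.Except(accountList2): ids only in the first list.
only_in_first = sorted(set(list1) - set(list2))

# "None of the things in the first list show up in the second list":
no_overlap = set(list1).isdisjoint(list2)
```

Callers never touch `.value`; if the representation changes later, only the class changes.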

> What SQL? What resultset? What object? I have an integer. Why do you assume it doesn't make sense?

You were referring to CRUD apps and having to serialize the Account object. Either you are using an ORM, or you are getting results from a database as a record set (usually represented as a dictionary of key/value pairs) and mapping it to your Account class, or you are sending back a raw record set.

More than likely, you are mapping from your domain model to your externally exposed view model anyway.

Types, classes and objects are equivalent here in that they all provide locality of interpretation. There is one place that may contain the code that manipulates these values; in other words, there is one place that contains your understanding - your interpretation - of the integer.

You (i.e., all commenters in this thread somehow disputing the AccountId object version of the story) are completely right: it is difficult to decide up front what you need. Of course, depending on your domain, it may be perfectly valid to have a year of -5 or of 1800. I am not asking you to never implement years like this; I am merely suggesting that it is useful to have one place where such decisions, invariants, roles - your understanding of a year, perhaps specific to your domain - are located.

How your understanding is structured is a different concern, because of course a class with 10000 LOC is horrible to work with. But you can always use means of abstraction and compose independent modules. An int does everything; maybe in one context a year shouldn't be 1800, or maybe it should be formatted as 4 digits. A year class / type / object is the place to store these decisions. As I outlined, such decisions are technical (how many bits?) but may also be domain specific (year must be > 1800). Maybe you have an int internally, but abstract it as a non-zero int.
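As a rough sketch of what "one place for these decisions" could look like in Python: the `Year` class below is hypothetical, and both the "> 1800" rule and the 4-digit formatting are simply the examples from the paragraph above, not rules anyone should adopt as-is.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Year:
    value: int  # internal representation: still just an int

    def __post_init__(self):
        # Domain decision (assumed for this example): years must be > 1800.
        if self.value <= 1800:
            raise ValueError(f"year must be > 1800, got {self.value}")

    def __str__(self):
        # Presentation decision: always format as 4 digits.
        return f"{self.value:04d}"


print(Year(1999))  # prints "1999"
```

In a different domain the check would be different or absent; the point is only that it lives in one spot.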

It boils down to interpretation. An int allows for many operations, but they may be invalid (in the sense of "implausible", or "undefined"). For example, an account ID of 1 can be multiplied by 2 and then by 2 again, and so on. Integers form a relation this way, but not all extensions of this relation might be "meaningful".

Integers as ids are a great example of this. Technically, they are great, because they are simple numbers, can be typed on a keyboard, and are readily interpreted, just not as integers.

6 people are twice as many as 3 people. In Germany, grades are marked 1 - 6 (1 = very good, 6 = very poor), but the relationship is merely ordinal. A grade of 4 is not "twice as bad" as a 2, although the numbers are relatable that way. An account ID of 6 is not twice as "good" as an ID of 3. For ids, you want them to identify, in that they are exclusive and exhaustive, but integer IDs are not supposed to be ordinal (that is, their sizes are not meant to be compared with each other; 6 > 5 is an invalid statement).

Of course, you can retain the ordinal interpretation, e.g. an auto-increment id would indicate that ID 6451 was created WAY later than ID 6, but this is difficult to interpret: it doesn't really tell you how much later, and ids deleted from the middle of a range may be reissued (ids 1, 2, 3, 4; delete 3; now 3 is a missing rank and could be reissued). So ids aren't ints: they are exhaustive, but they aren't ordinal, or at least interpreting them ordinally is dangerous, and some operations aren't allowed even if you do interpret them as ordinal. For example, it would not make sense to calculate an arithmetic mean from integer ids, although mathematically this operation is allowed. In statistics, this concern is discussed as the Skalenniveau (German, meaning "level of scale"); in English it's called level of measurement [1].
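To make the "identifies, but is neither ordinal nor arithmetic" reading concrete, here is a small Python sketch. `NominalId` and `try_mean` are hypothetical names; the assumption is that the wrapper deliberately defines only equality and hashing, so ordering and arithmetic fail instead of silently producing a number:

```python
from dataclasses import dataclass


# Only equality and hashing are defined: the id can identify, nothing more.
@dataclass(frozen=True)
class NominalId:
    value: int


def try_mean(values):
    """Return the arithmetic mean, or None if the values don't support it."""
    try:
        return sum(values) / len(values)
    except TypeError:
        return None


raw_ids = [3, 6]
wrapped_ids = [NominalId(3), NominalId(6)]

raw_mean = try_mean(raw_ids)          # 4.5, mathematically allowed but meaningless
wrapped_mean = try_mean(wrapped_ids)  # None, the type refuses the operation
```

Comparing `NominalId(6) > NominalId(5)` raises a `TypeError` for the same reason. If you do want the auto-increment ordering, you would have to opt in explicitly.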

In sum, it does not matter what your use case is, an int is an int, but depending on your use case, it might be worthwhile not to pass an uninterpreted integer around, but actually wrap it in some kind of object, where you localize all your decisions about how to interpret the integer.

None of this should be dismissed as some form of modern hipster JavaScript, where we high-level kids don't know how to bit-bang a set difference from some account ids; rather, these ideas are really old. Even in C, data abstraction is useful, in that you don't fiddle with integers but define a set of methods, possibly in a module, that interact with a hidden internal representation. Deciding where to cut these modules appears to be a difficult task, but we have all known this for a while now [2].

As I said, nothing wrong with bitbanging, but encapsulating interpretation in types, classes, objects, modules or functions is a useful strategy to reduce the complexity, and you lose all of it when you just pass integers around and cross your fingers and hope the next developer won't calculate a mean from your integer ids.

And finally: The performance argument. Tell me how many requests/ops per second you need and let's find out why your program can't do them. Make it work, make it fast, not the other way around.

I have had too many arguments where people talked about "performance" without stating numbers. I had a colleague who argued that joins were bad because they were slow (which, conceptually, they are), but then your database is a highly optimized processor built to do exactly these operations. Their fear of table-joining yielded a database with few tables, each of which had very long columns, each of which contained values separated by two semicolons, which they would then manipulate using string manipulations. The tables also grew, because the lack of normalization caused a lot of redundancy. I have seen many sins committed in the name of performance, so I will kindly ask for some numbers. Performance without numbers is not a good argument against data abstraction.
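Getting such numbers is usually cheap. A minimal sketch using Python's standard `timeit` module, assuming a hypothetical `AccountId` wrapper and an arbitrary workload (building a set of 10,000 ids); the results depend entirely on your machine and workload, so none are claimed here:

```python
import timeit
from dataclasses import dataclass


@dataclass(frozen=True)
class AccountId:
    value: int


raw = list(range(10_000))
wrapped = [AccountId(v) for v in raw]


def measure(fn, repeats=5, number=20):
    """Best-of-`repeats` average time per call, in seconds."""
    return min(timeit.repeat(fn, repeat=repeats, number=number)) / number


raw_seconds = measure(lambda: set(raw))
wrapped_seconds = measure(lambda: set(wrapped))

print(f"raw ints:   {raw_seconds * 1e6:.1f} us per set build")
print(f"AccountIds: {wrapped_seconds * 1e6:.1f} us per set build")
```

Whatever the ratio turns out to be, the question that matters is whether the absolute number fits the requests/ops per second you actually need.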

Also, it can usually be handled. If your high-level Python program becomes too slow for some reason, feel free to implement the slow parts in a really fast language and add a clever algorithm in assembler; see for example NumPy and SciPy.

These approaches aren't mutually exclusive. "Please don't do it" and "considered harmful" don't get us anywhere.

Anyway, coming back to the topic of the thread: How does code become unmaintainable? By following rules of thumb without thinking about their contextual requirements, and most often by passing around integers in the name of performance, while ignoring locality of decision, locality of code, and ease of comprehension. An int is memory, and an account id is an intentional interpretation of this memory. (An int is also an interpretation of memory already, I get it; this is about abstraction. And finally, true comprehension comes from world-reference. The int doesn't care whether it is 5. You do, see above.)

RAM and CPU power are less expensive than two developers wasting their time trying to understand some low-level code and identifying which functions expect an int64 and which need a size_t, and whether they are equivalent, and so on, while they could just be passing around some thing with a stable interface, and a localized world-reference (you know, a name).

I wouldn't argue that all of this object/type stuff is THE way to go, but these treatments were all invented specifically to solve the issues of raw integers. And certainly I am quite OK with the fact that not everything uses Java; however, the ideas we're talking about here are not specific to Java, and Java in particular often provides a very poor version of the story.

I would, however, argue that it is important to be congruent with the unit of expression of your programming language. Treating C as if it were object-oriented will give you a bad time, and not using objects in C# and Java will also give you a bad time. If your language natively provides an optimized iterator pattern, as does Python, coding in the C-for-loop-index idiom will give you a bad time. Most unmaintainable Java comes from not understanding Java, because you mistake it for C without & and *, but with Objects.
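A tiny illustration of the Python case, with made-up data. Both loops produce the same result; the second uses the language's own iteration protocol instead of the C-style index:

```python
account_ids = [17, 42, 99]

# C-for-loop-index idiom transplanted into Python.
labels_indexed = []
for i in range(len(account_ids)):
    labels_indexed.append(f"{i}: {account_ids[i]}")

# Idiomatic Python: let the iterator hand you index and element.
labels_iterated = [f"{i}: {account_id}" for i, account_id in enumerate(account_ids)]
```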

Unmaintainable code is about people. Code doesn't maintain itself.

</rant rel="sorry">

[1] https://en.wikipedia.org/wiki/Level_of_measurement

[2] https://dl.acm.org/citation.cfm?id=361623