by DemocracyFTW 1634 days ago
> The proliferation of field types has made data more difficult to transfer or share data between different applications and generates confusion. ITOP has only two fundamental data types: numeric and character, and perhaps a byte type for conversion purposes. (I have been kicking around ideas for having only one type.) The pre- and post-validators give any special handling needed by the field. A format string can be provided for various items like dates ("99/99/99"), Social-Security-Numbers ("999-99-9999"), and so forth. (Input formats are not shown in our sample DD.) Types like dates and SSN's can be internally represented (stored) just fine with characters or possibly integers. For example, December 31, 1998 could be represented as "19981231". This provides a natural sort order.

This is very nineties and I must disagree. The datetime-as-string example shows it most clearly: sorting by full date is only one thing you want to do with calendar data; often you will want to compare, say, things that happened on Mondays vs things that happened over the weekend, or things that happened within so-and-so many hours of a given point in time, and so on, not to mention the complexities of DST and timezones. You can do all that with text-based strings, but you'd have to write quite a bit of logic that gets applied to strings over and over again, or else store the results of parsing a date string in separate fields. Dates expressed as text also don't let you reject impossible values like "19990229" or "20020631" in a straightforward manner.
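To make the validation point concrete, here is a minimal Python sketch of what a real date type gives you for free. The helper names `parse_yyyymmdd` and `is_weekend` are made up for illustration; only the standard `datetime` module is assumed.

```python
from datetime import datetime


def parse_yyyymmdd(text):
    """Return a date for a valid YYYYMMDD string, or None if no such day exists."""
    # Insist on exactly eight digits so strptime cannot guess at short input.
    if len(text) != 8 or not text.isdigit():
        return None
    try:
        return datetime.strptime(text, "%Y%m%d").date()
    except ValueError:
        # Raised for impossible calendar days such as February 29 in 1999.
        return None


def is_weekend(day):
    """True for Saturday or Sunday (weekday() counts Monday as 0)."""
    return day.weekday() >= 5


if __name__ == "__main__":
    for raw in ("19981231", "19990229", "20020631"):
        print(raw, parse_yyyymmdd(raw))
```

With plain strings, both the leap-year rule and the weekday question would have to be reimplemented by hand wherever they are needed.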

I think our collective and by now decades-old experience with duck-typed or weakly typed languages like Python, JavaScript, Ruby and so on clearly shows that what you gain in simplicity you lose in terms of assured correctness.

1 comments

The way to deal with dates is not by having separate fields. It's by having a single value represent the time (in Linux it's time_t). Every other format gets translated to time_t, all processing is done with time_t, and then the time_t gets translated to the desired output format.

Any other scheme is doomed to working 99% of the time, and that last 1% will be impossible to fix.
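A rough sketch of that pipeline in Python, for illustration only: every input format is converted to seconds since the Unix epoch, the comparison happens on those integers, and formatting is the last step. The function names are hypothetical, and treating every input as UTC is an assumption made here to keep the example short.

```python
from datetime import datetime, timezone


def to_epoch(text, fmt):
    """Parse text with the given format into integer epoch seconds (UTC assumed)."""
    parsed = datetime.strptime(text, fmt).replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())


def from_epoch(seconds, fmt):
    """Render epoch seconds in the requested output format (UTC assumed)."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime(fmt)


def within_hours(a, b, hours):
    """True if two epoch values lie at most `hours` apart."""
    return abs(a - b) <= hours * 3600


if __name__ == "__main__":
    start = to_epoch("19981231", "%Y%m%d")
    later = to_epoch("1999-01-01 06:00", "%Y-%m-%d %H:%M")
    print(within_hours(start, later, 36))
    print(from_epoch(later, "%m/%d/%y"))
```

Once the values are plain numbers, "within N hours" is a subtraction, whatever format the input arrived in.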

> It's by having a single value represent the time

I have limited experience dealing with human-originated time references, but from my encounters, the various idiosyncratic forms of date storage often seem to arise out of an aversion to committing to well-defined intervals of uncertainty / margins of error. Coercing people of limited mental bandwidth or interest beyond immediate gratification to go through the pain of constricting their mental models to time_t levels of precision seems to basically be a non-starter.

This is of course exactly what I've been meaning to say here. We want more specific datatypes (e.g. a true date(time) ADT) with better functionality (say interval computation) and constraint checking (AKA domains, such as 'let n be an even, positive integer gt zero').
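As a small illustration of the domain idea, here is one way the 'even, positive integer' constraint could be sketched in Python. The class name `EvenPositiveInt` is invented for this example, and a real domain system would check the constraint in the type system or the database rather than at construction time.

```python
class EvenPositiveInt(int):
    """An int restricted to the domain {2, 4, 6, ...}; checked once, at construction."""

    def __new__(cls, value):
        # bool is a subclass of int, so it is rejected explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError("expected an integer")
        if value <= 0 or value % 2 != 0:
            raise ValueError(f"{value} is not an even, positive integer")
        return super().__new__(cls, value)


if __name__ == "__main__":
    print(EvenPositiveInt(4))
    for bad in (3, 0, -2):
        try:
            EvenPositiveInt(bad)
        except ValueError as err:
            print("rejected:", err)
```

Note that arithmetic on such a value returns a plain `int` again, so the constraint only holds where the value is constructed; that gap is part of why built-in domain support would be nicer.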