by paperwork
2114 days ago
I really like the idea of being more thoughtful about naming columns and being more explicit about the “type” of data contained in them. Is this idea already known among data modelers or data engineers? I’d love to read any other references, if available.
I would say yes, this idea is well known and very common, because data architecture is as much about the descriptive language we use as about anything else. Business glossaries, taxonomies, even plain naming conventions in code [2]: these are all related.
If you build enough databases and tables, or even just write enough code yourself, you inevitably run into the "how to name things" problem [3]. If the only thing you have to sort on for the meaning of a thing (column, table, file, etc.) is a single string value, then encoding meaning into that string is quite common; that way, a plain sort produces a kind of grouping. Many database vendors follow standard naming conventions, Oracle for example [4]. When designing and building the metadata for a large system, establishing a naming convention is considered a best practice. Among other things, it makes things easier to find, and it opens up a lot of potential for automation.
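To make the automation point concrete, here is a minimal sketch of a naming-convention linter. The convention itself (UPPER_SNAKE_CASE names ending in one of a few type suffixes such as `_ID`, `_IND`, `_DT`, `_AMT`, `_NM`) is a hypothetical one chosen for illustration, not Oracle's or any other vendor's published standard.

```python
import re

# Hypothetical convention: UPPER_SNAKE_CASE names that end in a known type suffix.
ALLOWED_SUFFIXES = ("ID", "IND", "DT", "AMT", "NM")
NAME_PATTERN = re.compile(
    r"^[A-Z][A-Z0-9]*(_[A-Z0-9]+)*_(" + "|".join(ALLOWED_SUFFIXES) + r")$"
)

def check_column_names(names):
    """Return the names that do not follow the (assumed) convention."""
    return [name for name in names if not NAME_PATTERN.match(name)]

columns = ["DRIVER_ID", "DRIVER_IND", "TRIP_START_DT", "fare", "TRIP_FARE_AMT"]
print(check_column_names(columns))  # ['fare']
```

Once a rule like this exists, it can run in a CI job or a schema-migration review instead of relying on people to remember it.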
You get all kinds of variations on this, such as whether "ID_" should come as a prefix or as a suffix ("_ID"). One's initial thought is to use a prefix so that all columns of the same type group together, but that makes it much harder to sort items by their functional area (e.g. DRIVER_ID, DRIVER_IND, etc.).
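The prefix/suffix trade-off is easy to see by sorting the same columns named both ways. The column names below are hypothetical, extending the DRIVER_ID / DRIVER_IND example.

```python
# The same five columns named under two hypothetical conventions.
suffix_style = ["DRIVER_ID", "DRIVER_IND", "TRIP_ID", "TRIP_START_DT", "DRIVER_START_DT"]
prefix_style = ["ID_DRIVER", "IND_DRIVER", "ID_TRIP", "DT_TRIP_START", "DT_DRIVER_START"]

# Type as suffix: a plain sort groups columns by functional area (DRIVER_*, TRIP_*).
by_area = sorted(suffix_style)

# Type as prefix: a plain sort groups columns by type (DT_*, ID_*, IND_*).
by_type = sorted(prefix_style)

# With a suffix convention, a custom key can still regroup by type, but only
# because every name reliably ends in "_<TYPE>".
by_type_from_suffix = sorted(suffix_style, key=lambda name: (name.rsplit("_", 1)[1], name))

print(by_area)
print(by_type)
print(by_type_from_suffix)
```

A plain sort only ever gives you one of the two groupings for free; whichever part of the name comes first wins.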
One other place you see something similar is "smart numbers", which are the subject of an eternal argument: should I use a "dumb" identifier (GUID, integer) or a "smart" one that encodes additional meaning [5]?
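As a rough illustration of the difference, the sketch below contrasts a meaningless UUID with a smart identifier. The REGION-YEAR-SEQUENCE format (e.g. "EU-2019-000123") and the `SmartId`/`parse_smart_id` names are hypothetical, invented for this example.

```python
import uuid
from dataclasses import dataclass

# "Dumb" identifier: carries no meaning, so nothing about the record can make it wrong.
dumb_id = uuid.uuid4()

# "Smart" identifier in a hypothetical REGION-YEAR-SEQUENCE format.
@dataclass(frozen=True)
class SmartId:
    region: str
    year: int
    sequence: int

def parse_smart_id(value: str) -> SmartId:
    """Decode the meaning embedded in a smart identifier string."""
    region, year, sequence = value.split("-")
    return SmartId(region, int(year), int(sequence))

parsed = parse_smart_id("EU-2019-000123")
print(parsed.region, parsed.year, parsed.sequence)  # EU 2019 123
```

The smart form lets you read the region without a lookup, but if a record's region ever changes, the identifier is either wrong or has to change too; the dumb form avoids that at the cost of needing a join to learn anything.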
Basically, any time you can encode information in the metadata of your data, I think you can then operate on it by following "convention over configuration" (as mentioned elsewhere in the discussion).
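Here is one way "convention over configuration" might look in practice: a loader that decides how to convert each raw value purely from the column name's suffix, with no per-column configuration. The suffix meanings (_IND as a Y/N flag, _DT as an ISO date, _AMT as a decimal amount) are assumptions for the sketch, not a standard.

```python
from datetime import date
from decimal import Decimal

# Hypothetical suffix convention: _IND = Y/N flag, _DT = ISO date, _AMT = decimal amount.
PARSERS = {
    "IND": lambda raw: raw == "Y",
    "DT": date.fromisoformat,
    "AMT": Decimal,
}

def parse_row(row: dict) -> dict:
    """Convert raw string values using only each column's name suffix."""
    parsed = {}
    for column, raw in row.items():
        suffix = column.rsplit("_", 1)[-1]
        parser = PARSERS.get(suffix, str)  # unknown suffix: keep the value as text
        parsed[column] = parser(raw)
    return parsed

raw = {"DRIVER_ID": "D-42", "DRIVER_IND": "Y", "TRIP_START_DT": "2019-03-01", "TRIP_FARE_AMT": "12.50"}
print(parse_row(raw))
```

Adding a new column needs no code change as long as its name follows the convention; the flip side is that a misnamed column is silently treated as text.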
The only problem I see is that such conventions can at times be limiting, depending on how long your names are allowed to be and how much variability you are trying to capture. That is why I believe metadata is generally better kept separate and linked to the data it describes: this decoupling allows for much more descriptive metadata than you could encode in a single string value. Certainly, you can get a long way with an approach like this, but I suspect you would run into 80/20-rule limitations, where the convention handles most cases easily and the remainder become awkward.
Using names in this way is a form of tight coupling between data and metadata, which in some cases could be seen as an anti-pattern as far as metadata flexibility is concerned.
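For contrast, a minimal sketch of the decoupled alternative: column names stay plain, and the descriptive metadata lives in a separate structure linked to the data by (table, column). The `ColumnMetadata` fields and catalog entries are hypothetical; in practice this role is usually played by a metadata repository or data catalog (see [1]).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ColumnMetadata:
    data_type: str
    description: str
    owner: str
    contains_pii: bool

# Hypothetical catalog kept apart from the table itself, keyed by (table, column).
CATALOG = {
    ("trips", "driver"): ColumnMetadata("identifier", "Driver who completed the trip", "ops-team", True),
    ("trips", "fare"): ColumnMetadata("amount", "Fare charged, tax included", "finance", False),
}

def pii_columns(table: str) -> list:
    """Answer a metadata question without parsing any column names."""
    return sorted(col for (tbl, col), meta in CATALOG.items() if tbl == table and meta.contains_pii)

print(pii_columns("trips"))  # ['driver']
```

Adding a new attribute (say, a retention period) means adding a field here rather than renaming columns, which is the flexibility a pure naming convention gives up.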
[1] https://en.wikipedia.org/wiki/Metadata_management
[2] https://en.wikipedia.org/wiki/Naming_convention_(programming...
[3] https://martinfowler.com/bliki/TwoHardThings.html
[4] https://oracle-base.com/articles/misc/naming-conventions
[5] https://en.wikipedia.org/wiki/Smart_number