Hacker News
by Silhouette 2555 days ago
> You don't lose anything by strictly enforcing the encapsulation because you can always explicitly provide low level methods into the data structures.

But then you're not really gaining anything either, unless perhaps you have some mechanism to enforce that the low-level access should only be used in specific circumstances when it is deliberately intended. It's like writing classes that have a few data members, but then writing direct get and set accessors for each of them anyway. There's no more complicated invariant that you're enforcing at that point, and without any non-trivial invariants to be enforced, the whole argument for encapsulation and data hiding becomes moot.
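To make the distinction concrete, here is a minimal Java sketch; the class names `Point` and `Range` are hypothetical. `Point` hides its fields behind plain get/set pairs that enforce nothing, so the accessors add nothing over public fields. `Range` keeps its fields private because every mutation must preserve `lo <= hi`.

```java
// Hypothetical example: accessors that add nothing over public fields,
// because any combination of x and y is valid (no invariant to protect).
final class Point {
    private double x;
    private double y;

    Point(double x, double y) { this.x = x; this.y = y; }

    double getX() { return x; }
    void setX(double x) { this.x = x; }
    double getY() { return y; }
    void setY(double y) { this.y = y; }
}

// Hypothetical example: here hiding the fields earns its keep,
// because the class guarantees lo <= hi at all times.
final class Range {
    private int lo;
    private int hi;

    Range(int lo, int hi) {
        if (lo > hi) throw new IllegalArgumentException("lo must not exceed hi");
        this.lo = lo;
        this.hi = hi;
    }

    int lo() { return lo; }
    int hi() { return hi; }

    // Widens the range so that it includes n; lo <= hi still holds afterwards.
    void include(int n) {
        if (n < lo) lo = n;
        if (n > hi) hi = n;
    }

    int length() { return hi - lo; }
}
```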

> Conversely, you lose all safety when you open up encapsulation. The method API of an object is the contract it provides.

Do you always need that safety, though? If your data is defined to be stored in a certain representation, and direct access to that underlying representation in that format is available if needed, isn't that representation now just another part of the contract? Again, you're balancing two competing priorities: is there some invariant to be enforced that is sufficiently complicated for data hiding to be a useful safeguard, and is it useful to interact efficiently with, or make safe assumptions about performance based on, the true data representation?

Which one is the more important consideration must surely depend on how complicated your representation and any related invariants are. Being able to pattern match against values of some relatively simple algebraic data type can be very useful, for example. On the other hand, it's not much fun to accidentally corrupt the super-efficient look-up structure at the heart of your whole system that now uses a complicated set of hash tables and the odd Bloom filter internally after someone spent three weeks optimising it for a 50% speed boost.
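A minimal sketch of the simple algebraic data type case, written in Java since that is the language the thread turns to later. It assumes Java 21 or later (sealed interfaces, records, and pattern matching for `switch`); `Shape`, `Circle`, `Rect` and `Shapes.area` are hypothetical names. The representation is fully public, and the compiler checks that the `switch` covers every permitted case.

```java
// Assumes Java 21+ for record patterns and exhaustive switch over a sealed type.
sealed interface Shape permits Circle, Rect {}

record Circle(double radius) implements Shape {}

record Rect(double width, double height) implements Shape {}

final class Shapes {
    // No default branch is needed: Shape is sealed, so these cases are exhaustive.
    static double area(Shape s) {
        return switch (s) {
            case Circle c -> Math.PI * c.radius() * c.radius();
            case Rect(double w, double h) -> w * h; // destructure the record directly
        };
    }
}
```

Adding a new permitted shape would make `area` fail to compile until it handles that case, which is the kind of safety pattern matching offers without hiding the representation.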


You can change the internal representation while doing transformations in the getter to preserve compatibility.
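A minimal sketch of that technique, using a hypothetical `Price` class: suppose an earlier version stored a `double` number of dollars, and a later one switches to a `long` count of cents. The getter converts back, so callers written against the old signature keep working.

```java
// Hypothetical example: the internal representation changed from
// double dollars to long cents, but the public getter is unchanged.
final class Price {
    private final long cents; // new internal representation

    Price(double dollars) {
        // Round to the nearest cent when converting from the old unit.
        this.cents = Math.round(dollars * 100);
    }

    // Same public contract as before: callers still receive dollars.
    double getDollars() {
        return cents / 100.0;
    }
}
```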

With collections, you can export contents through an Iterable, or as an array, or any number of other strategies, without coupling the consumer to your internal representation.
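As a rough illustration, assuming a hypothetical `TagSet` class: the backing `TreeSet` stays private, and consumers only get read-only iteration and a defensive array copy. The sorted iteration order is still observable through the `Iterable`, though, so a consumer could come to rely on it even though nothing promises it.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical class: the backing TreeSet is an implementation detail.
final class TagSet implements Iterable<String> {
    private final Set<String> tags = new TreeSet<>();

    void add(String tag) { tags.add(tag); }

    // Read-only iteration: remove() on this iterator throws UnsupportedOperationException.
    @Override
    public Iterator<String> iterator() {
        return Collections.unmodifiableSet(tags).iterator();
    }

    // Returns a fresh copy, so mutating the array cannot corrupt internal state.
    String[] toArray() {
        return tags.toArray(new String[0]);
    }
}
```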

Of course, but what I am questioning here is how often we really do change internal representations of simple data structures. The theoretical benefit of hiding every representation behind an interface is clear, but any abstraction also has a potential cost if it creates a barrier to doing something useful and/or becomes leaky.

You can still provide standardised interfaces for things like iteration along with a data structure even if you choose to expose the specific representation, so I am not sure how strong an argument your second point makes. Depending on the situation, you may find your consumer is implicitly coupled to the true representation anyway, perhaps because it inadvertently relies on values being iterated in sorted order or insertion order or because it assumes certain performance characteristics even if these things are not strictly part of the documented interface. More than once in programming history, even standard libraries of popular programming languages have been updated to guarantee some behaviour that had been reliable in practice but was never actually part of the original specification.
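A small Java sketch of that kind of implicit coupling; the `OrderCoupling.first` helper is hypothetical. Its signature only promises to accept a `Collection`, but what it returns depends entirely on the caller's iteration order.

```java
import java.util.Collection;

// Hypothetical helper: the documented contract says nothing about order,
// yet the result depends on how the caller's collection iterates.
final class OrderCoupling {
    // Returns the first element in iteration order.
    // Throws NoSuchElementException if items is empty.
    static String first(Collection<String> items) {
        return items.iterator().next();
    }
}
```

Given a `LinkedHashSet`, whose documentation guarantees insertion order, it returns the first element added; given a `TreeSet`, it returns the smallest element; given a `HashSet`, whose documentation makes no ordering guarantee, it returns whatever the current implementation happens to produce.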

My perspective comes from a heavy Java background.

By using the standardised interfaces for everything you can change the implementation without changing how you work on it.

For example, take my service that stores a collection in a database. I could write my service so that it takes in a Collection rather than an ArrayList, because then anyone using it can pass in a Set (no duplicates), a CopyOnWriteArrayList, or a TreeSet (sorted set).
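A minimal sketch of that design, assuming a hypothetical `ItemService` whose "database" is simulated by an in-memory list. Because `saveAll` accepts `Collection<String>` rather than `ArrayList<String>`, callers can hand it a `TreeSet`, a `CopyOnWriteArrayList`, or anything else that implements the interface.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical service; a real one would write to a database instead of a list.
final class ItemService {
    private final List<String> stored = new ArrayList<>();

    // Accepting the Collection interface lets callers choose the implementation
    // (and its semantics, such as de-duplication or sorting).
    int saveAll(Collection<String> items) {
        stored.addAll(items);
        return items.size();
    }

    // Read-only snapshot of everything saved so far.
    List<String> stored() {
        return List.copyOf(stored);
    }
}
```

For example, passing `new TreeSet<>(List.of("b", "a", "b"))` stores two items, "a" then "b", because the set has already dropped the duplicate and sorted what remains before the service ever sees it.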

By hiding internals and using the interface, I can pick the abstraction I need and give more freedom to those using my classes.

Another example: once Project Valhalla and value types arrive, LinkedList might be changed to use value types for its nodes. Let's say I've tightly coupled my code to the implementation of LinkedList. That could potentially break my code.

And there is nothing wrong with writing generic code like that! Just because you can access the specific representation of the underlying data, that does not mean you have to or would do so routinely. In languages that do tend to expose simple data structures directly, it is still normal to provide standardised tools for accessing and manipulating them, and most of the time that is probably still how you would interact with them.

Regarding your second example, if you are working with an explicit representation then you simply would not make a breaking change like that. Instead you would create a new data structure with the new representation, which other code can then choose to use instead if it wants to. Again, nothing about this prevents both versions from also providing equivalent functions to access them in the same way where that makes sense or writing other code in terms of those functions rather than tied directly to the specific implementation.

I think to some degree it's a matter of opinion, but I couldn't disagree more.

The argument for abstraction at the procedure level seems obvious if you've ever used an interface or private members.

I think you have some picture in your head where everything has to be abstracted in a needlessly obtuse way to be an object. If you have some DTO, the fields are the contract, so it's fine if they're public. (In Java in particular it's easier to refactor if you use the getter/setter pattern, but that's not an OOP issue.)
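For comparison, two ways to write such a DTO in Java; the names are hypothetical. The first uses plain public fields. The second is a record (Java 16 or later), where the components are the contract and the compiler generates the accessors `name()` and `email()` along with `equals`, `hashCode` and `toString`.

```java
// Hypothetical DTO as a plain class: the public fields are the contract.
final class UserDto {
    public String name;
    public String email;
}

// The same shape as an immutable record (Java 16+); accessors and
// value-based equals/hashCode/toString are generated by the compiler.
record UserRecord(String name, String email) {}
```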

I mean, I really don't understand what you're advocating here. Everything should be public?

I am simply suggesting that religiously hiding implementation of simple data structures behind formal interfaces may be excessive.

You seem to be assuming a certain style of programming or type of programming language here, along the lines of mainstream OOP languages. There are other options, from low level languages like C to languages where algebraic data types and pattern matching are bread and butter, that do not attempt to conceal true representations of simple data structures in that way. The sky does not fall! :-)

Sometimes it is useful to be able to drop down to that level if the provided set of standard functions to work with that data structure does not meet your needs. You might want to support some new access pattern that is not offered in a convenient form by the tools available in the provided interface, for example.

Sometimes you might want to know, say, which order the axes go in for some multi-dimensional data structure, because writing your algorithms to match can have a profound effect on real world performance. And in this sort of situation, the advantages of being explicit about that can far outweigh the theoretical benefits of being able to replace the internal representation transparently, which is something you are extremely unlikely to be doing in an established system where this kind of issue arises anyway.
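A minimal sketch of the axis-order point, assuming a hypothetical `RowMajorMatrix` that stores its elements in one flat array and documents that element (r, c) lives at index `r * cols + c`. Both sum methods compute the same total (up to floating-point rounding from summing in a different order), but only the first walks memory sequentially. On large matrices the column-order loop tends to be noticeably slower because of poor cache locality, though the size of the gap depends on hardware and JIT behaviour.

```java
// Hypothetical matrix with an explicit, documented layout:
// row-major, so element (r, c) is stored at data[r * cols + c].
final class RowMajorMatrix {
    final int rows;
    final int cols;
    final double[] data;

    RowMajorMatrix(int rows, int cols) {
        this.rows = rows;
        this.cols = cols;
        this.data = new double[rows * cols];
    }

    double get(int r, int c) { return data[r * cols + c]; }

    void set(int r, int c, double v) { data[r * cols + c] = v; }

    // Inner loop visits consecutive array elements, matching the layout.
    double sumRowOrder() {
        double s = 0;
        for (int r = 0; r < rows; r++)
            for (int c = 0; c < cols; c++)
                s += data[r * cols + c];
        return s;
    }

    // Inner loop jumps `cols` elements per step, working against the layout.
    double sumColumnOrder() {
        double s = 0;
        for (int c = 0; c < cols; c++)
            for (int r = 0; r < rows; r++)
                s += data[r * cols + c];
        return s;
    }
}
```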