by danmaz74 2555 days ago
In a purely OOP language, data structures exist as an implementation detail, but you can never access them directly (as in, bypassing the object interface). That's by design.

I get that in your application, you may want to keep a linked list behind its interface 90% of the time. However, considering your system as a whole, at some point you may want to take that linked list data and write it to a database, in which case the cleanest thing is to bypass the interface and extract the "data structure object" so to speak and deal with it in a database-related object, rather than encumbering your LinkedList object with database behaviors.

This is a violation of OOP. Consider methods that produce and consume a serialized representation of the data instead.
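
A minimal sketch of what "produce and consume a serialized representation" could look like in Python. The `LinkedList` class and its `to_json` / `from_json` methods are hypothetical names made up for illustration; JSON is just one possible neutral format.

```python
import json


class LinkedList:
    """Hypothetical singly linked list; the node chain stays private."""

    class _Node:
        def __init__(self, value):
            self.value = value
            self.next = None

    def __init__(self) -> None:
        self._head = None
        self._tail = None

    def append(self, value) -> None:
        node = LinkedList._Node(value)
        if self._tail is None:
            self._head = self._tail = node
        else:
            self._tail.next = node
            self._tail = node

    def to_json(self) -> str:
        """Produce a neutral form that says nothing about nodes or pointers."""
        values = []
        node = self._head
        while node is not None:
            values.append(node.value)
            node = node.next
        return json.dumps(values)

    @classmethod
    def from_json(cls, text: str) -> "LinkedList":
        """Consume the same neutral form to rebuild an equivalent list."""
        result = cls()
        for value in json.loads(text):
            result.append(value)
        return result
```

A database-related object would then only ever see the JSON text, so the list could switch to a doubly linked or array-backed layout without that code noticing.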

Things like Java serialization and Python pickle attempt to do what you say and are considered failures (or at least security risks) because they allow a third party to act on object implementation internals.

Security aside, a denormalized representation of data could be different than the implementation specific representation that's encapsulated inside an object.

But this is partly a self-made problem, because in this OOP model you have decided that the internal representation of your data is to be hidden and therefore the data is only accessible via the provided interface.

In practice, it is debatable how often that is helpful when you're implementing generic data structures. An alternative is to specify the representation explicitly and provide a set of functions designed to work with it, but also to allow direct access by other functions when that is useful.

You can still build a layer of more abstract interfaces on top and write more generic algorithms in terms of those interfaces rather than any specific concrete representation, as for example Haskell's typeclass system does.
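
A rough Python sketch of that layering, under stated assumptions: `Cons`, `prepend`, `second`, and `total` are hypothetical names, and Python's `Iterable` protocol stands in only loosely for a Haskell typeclass.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass(frozen=True)
class Cons:
    """The representation is public and documented: a value plus the rest."""
    head: int
    tail: Optional["Cons"] = None

    def __iter__(self) -> Iterator[int]:
        node = self
        while node is not None:
            yield node.head
            node = node.tail


def prepend(value: int, rest: Optional[Cons]) -> Cons:
    """One of the provided functions designed to work with the representation."""
    return Cons(value, rest)


def second(cell: Cons) -> int:
    """Some other function that reaches into the fields directly."""
    if cell.tail is None:
        raise ValueError("list has fewer than two elements")
    return cell.tail.head


def total(values: Iterable[int]) -> int:
    """Generic algorithm: depends only on the Iterable interface."""
    result = 0
    for value in values:
        result += value
    return result
```

Here `total` works on a `Cons` chain or a built-in list alike, while `second` deliberately depends on the concrete layout.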

> An alternative is to specify the representation explicitly and provide a set of functions designed to work with it, but also to allow direct access by other functions when that is useful.

I don't see that as an alternative. I consider this natural OOP. You don't lose anything by strictly enforcing the encapsulation because you can always explicitly provide low level methods into the data structures. Conversely, you lose all safety when you open up encapsulation. The method API of an object is the contract it provides. If you go around that contract it's much harder to make safe implementation changes.

Because of this, low-level access should be opt-in, in the way you describe.

> You don't lose anything by strictly enforcing the encapsulation because you can always explicitly provide low level methods into the data structures.

But then you're not really gaining anything either, unless perhaps you have some mechanism to enforce that the low-level access should only be used in specific circumstances when it is deliberately intended. It's like writing classes that have a few data members, but then writing direct get and set accessors for each of them anyway. There's no more complicated invariant that you're enforcing at that point, and without any non-trivial invariants to be enforced, the whole argument for encapsulation and data hiding becomes moot.
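
To make the contrast concrete, here is a small Python sketch. Both classes are hypothetical: `Point` has accessors that enforce nothing, while `Range` hides its fields because it has an actual invariant (`low <= high`) to protect.

```python
class Point:
    """Getters and setters that check nothing: public fields in disguise."""

    def __init__(self, x: float, y: float) -> None:
        self._x = x
        self._y = y

    def get_x(self) -> float:
        return self._x

    def set_x(self, value: float) -> None:
        self._x = value  # accepts anything, so nothing is being protected


class Range:
    """Data hiding earns its keep here: low must never exceed high."""

    def __init__(self, low: float, high: float) -> None:
        if low > high:
            raise ValueError("low must not exceed high")
        self._low = low
        self._high = high

    @property
    def low(self) -> float:
        return self._low

    @property
    def high(self) -> float:
        return self._high

    def widen_to(self, value: float) -> None:
        """The only mutation offered, and it cannot break the invariant."""
        self._low = min(self._low, value)
        self._high = max(self._high, value)
```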

> Conversely, you lose all safety when you open up encapsulation. The method API of an object is the contract it provides.

Do you always need that safety, though? If your data is defined to be stored in a certain representation, and direct access to that underlying representation in that format is available if needed, isn't that representation now just another part of the contract? Again, you're balancing two competing priorities: is there some invariant to be enforced that is sufficiently complicated for data hiding to be a useful safeguard, and is it useful to interact efficiently with, or make safe assumptions about performance based on, the true data representation?

Which one is the more important consideration must surely depend on how complicated your representation and any related invariants are. Being able to pattern match against values of some relatively simple algebraic data type can be very useful, for example. On the other hand, it's not much fun to accidentally corrupt the super-efficient look-up structure at the heart of your whole system that now uses a complicated set of hash tables and the odd Bloom filter internally after someone spent three weeks optimising it for a 50% speed boost.

You can change the internal representation while doing transformations in the getter to preserve compatibility.
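
A minimal sketch of that getter trick in Python, assuming a hypothetical `Duration` class that used to store minutes and was later changed to store seconds:

```python
class Duration:
    """Hypothetical class whose storage changed from minutes to seconds."""

    def __init__(self, minutes: int) -> None:
        # An earlier version is assumed to have kept self._minutes directly.
        self._seconds = minutes * 60

    @property
    def minutes(self) -> int:
        # The getter converts back, so existing callers keep working.
        return self._seconds // 60

    @property
    def seconds(self) -> int:
        # New, finer-grained view made possible by the new representation.
        return self._seconds

    def add_seconds(self, amount: int) -> None:
        self._seconds += amount
```

Callers that only ever read `minutes` cannot tell that the stored field changed.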

With collections, you can export contents through an Iterable, or to Array, or any number of other strategies, without coupling the consumer to your internal representation.
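
In Python terms, that might look like the sketch below. The `LinkedList` class and the `write_rows` consumer are hypothetical; the point is only that the consumer sees an iterable and never a node.

```python
from typing import Iterable, Iterator


class LinkedList:
    """Hypothetical linked list that exports contents without exposing nodes."""

    class _Node:
        __slots__ = ("value", "next")

        def __init__(self, value, next=None):
            self.value = value
            self.next = next

    def __init__(self, *values) -> None:
        self._head = None
        # Build back to front so iteration yields values in the given order.
        for value in reversed(values):
            self._head = LinkedList._Node(value, self._head)

    def __iter__(self) -> Iterator[object]:
        node = self._head
        while node is not None:
            yield node.value
            node = node.next

    def to_list(self) -> list:
        """Export as a plain array-like copy."""
        return list(self)


def write_rows(values: Iterable[object], sink: list) -> int:
    """Hypothetical consumer: accepts any iterable, knows nothing about nodes."""
    count = 0
    for value in values:
        sink.append(("row", value))
        count += 1
    return count
```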

I think to some degree it's a matter of opinion, but I couldn't disagree more.

The argument for abstraction at the procedure level seems obvious if you've ever used an interface or private members.

I think you have some picture in your head where everything has to be abstracted in a needlessly obtuse way to be an object. If you have some DTO, the fields are the contract, so it's fine if they're public. (In Java in particular it's easier to refactor if you use the getter/setter pattern, but that's not an OOP issue.)

I mean, I really don't understand what you're advocating here. Everything should be public?

> ... are considered failures (or at least security risks) because they allow a third party to act on object implementation internals.

Well, not really. In Java you had to opt in, and you could implement the readObject and writeObject methods to override the default behavior; similarly in Python, you can override __reduce__ and friends as needed.
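
As a sketch of the Python side, `__getstate__` and `__setstate__` are two of those "friends": they let a class decide what its pickled form contains. The `Cache` class and the `round_trip` helper below are hypothetical.

```python
import pickle


class Cache:
    """Hypothetical class whose pickled form differs from its in-memory form."""

    def __init__(self, entries) -> None:
        self._entries = dict(entries)
        self._hits = 0  # runtime-only bookkeeping

    @property
    def hits(self) -> int:
        return self._hits

    def get(self, key):
        self._hits += 1
        return self._entries[key]

    def __getstate__(self):
        # Only the entries are written; the hit counter is deliberately left out.
        return {"entries": sorted(self._entries.items())}

    def __setstate__(self, state) -> None:
        # Rebuild the internal dict and start the counter from zero again.
        self._entries = dict(state["entries"])
        self._hits = 0


def round_trip(obj):
    """Pickle and immediately unpickle, going through the two hooks above."""
    return pickle.loads(pickle.dumps(obj))
```

So the stored form is something the class chose to publish, not a raw dump of its attributes.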

The security problems are more that these protocols are essentially executing arbitrary code when reading data. You'd think, "I can specify a root object and then it's going to specify what its attributes can be, and so forth."
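
A deliberately harmless Python sketch of that point: the pickle stream itself names a callable to run at load time. The `Sneaky` class is hypothetical, and `sorted` stands in for whatever a hostile stream might name instead.

```python
import pickle


class Sneaky:
    """Hypothetical class; __reduce__ tells pickle how to 'rebuild' it."""

    def __reduce__(self):
        # The stream records "call this callable with these arguments".
        # Here it is the harmless builtin sorted(); a hostile stream could
        # just as well name something like os.system.
        return (sorted, ([3, 1, 2],))


payload = pickle.dumps(Sneaky())

# Loading calls sorted([3, 1, 2]); no Sneaky instance is ever created.
result = pickle.loads(payload)
```

Whatever the reader expected to get back, what actually comes out is the return value of a call chosen by whoever wrote the bytes, which is why unpickling untrusted data is unsafe.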

But Python is a dynamically typed language, so given a class Foo, its attributes can be anything. (Though now with annotations, it is possible to lock down precisely the types you allow; I wrote a module that does this.[1])

Java could have fixed this if they hadn't gone with type erasure. `List<Foo>` becomes a `List` at runtime, so there's no way to determine what it ought to contain, short of kludges[2] that other libraries use.

> Security aside, a denormalized representation of data could be different than the implementation specific representation that's encapsulated inside an object.

Yup, as soon as your process has to interact with other processes, even writing to a file and reading it later, it's possible to disagree on how data is represented or what it means. I'll plug another project of mine: I try to make the client smart[3] (enough) to allow versioning[4], as well as avoiding a class hierarchy.

[1]: https://pypi.org/project/json-syntax/

[2]: https://static.javadoc.io/com.google.code.gson/gson/2.8.5/co...

[3]: https://tenet-lang.org/contrast.html

[4]: https://tenet-lang.org/versioning.html

> This is a violation of OOP.

And it is because of this that OOP has a bad name.

It's ok to violate "great principles" when writing software. What I have never seen pan out is dogmatic application of principles.

> consider methods that produce and consume a serialized representation of the data instead

That sounds appallingly inefficient for no design win. This is precisely the problem solved by Java's Iterator interface, and .NET's IEnumerable, isn't it? I can't imagine why we'd want to serialise and deserialise. Or am I misreading things?

It's one of the most efficient ways.

But it's not the cleanest way by any measure. It has tighter internal coupling than any other easily imagined way. Most of the time, people will use the list's normal interface to take the data.

If performance is a requirement, most OOP people will create a database interface that the linked list class can write to without exposing anything outside its own interface. Most FP people would do it the other way around and create an interface for applying a function to the data, which the list can implement. Either way, it can tie with the dirty introspection in performance.
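
Both flavours can be sketched in a few lines of Python. Everything here is hypothetical: `RowSink` and `InMemorySink` stand in for a real database writer, and `write_to` / `fold` are made-up method names.

```python
from typing import Callable, List, Protocol, TypeVar

T = TypeVar("T")


class RowSink(Protocol):
    """The 'database interface' the list is allowed to write to."""

    def write_row(self, value: object) -> None: ...


class InMemorySink:
    """Stand-in for a real database writer."""

    def __init__(self) -> None:
        self.rows: List[object] = []

    def write_row(self, value: object) -> None:
        self.rows.append(value)


class LinkedList:
    def __init__(self, *values: object) -> None:
        self._head = None
        for value in reversed(values):
            self._head = (value, self._head)  # private (value, rest) cells

    def write_to(self, sink: RowSink) -> None:
        """OOP flavour: the list pushes its contents into a sink it is handed."""
        cell = self._head
        while cell is not None:
            value, cell = cell
            sink.write_row(value)

    def fold(self, step: Callable[[T, object], T], initial: T) -> T:
        """FP flavour: the caller supplies a function and the list applies it."""
        acc = initial
        cell = self._head
        while cell is not None:
            value, cell = cell
            acc = step(acc, value)
        return acc
```

In both cases the traversal runs inside the list, at full speed over its own cells, and nothing outside ever touches them.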

> in which case the cleanest thing is to bypass the interface and extract the "data structure object"

That doesn't seem like the cleanest to me. Instead, you'd use the interface to read/write to the list as you write/read it to storage. At no time do you want to reach into the LinkedList object and access its internal mechanism.

I have these kinds of arguments with my C# co-workers all the time, and they usually have some annoyingly reasonable solution like "create a database serialization class with a linked list consumer". :)

This reminded me of the design antipattern “Public Morozov”, named after Soviet pioneer Pavlik Morozov, who denounced his father to the secret police. In this antipattern, implementation details are exposed via convenient public methods.

> In a purely OOP language, data structures exist as an implementation detail, but you can never access them directly

In Smalltalk, you can have a TreeNode class and instantiate TreeNode instances. This is often how binary trees are implemented in Smalltalk. In that case, data structures exist as a design of interacting objects, and you can manipulate them fairly directly using the object interface.

The same goes for Java and C#. Basically, you can design in such a way that you essentially have C or Fortran in any language, even a pure OO language like Smalltalk. The right way to do this in Smalltalk is to use a Facade, which can be used to hide the guts of your data structure.
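
A rough sketch of the same idea, written in Python rather than Smalltalk. `TreeNode` mirrors the class named above; the `SortedSet` facade and its methods are hypothetical names chosen for illustration.

```python
from typing import Iterator, Optional


class TreeNode:
    """Plain node object: fields are public, so callers could rewire it freely."""

    def __init__(self, key: int) -> None:
        self.key = key
        self.left: Optional["TreeNode"] = None
        self.right: Optional["TreeNode"] = None


class SortedSet:
    """Facade: owns the nodes and exposes only operations that keep them ordered."""

    def __init__(self) -> None:
        self._root: Optional[TreeNode] = None

    def add(self, key: int) -> None:
        if self._root is None:
            self._root = TreeNode(key)
            return
        node = self._root
        while True:
            if key == node.key:
                return  # already present
            side = "left" if key < node.key else "right"
            child = getattr(node, side)
            if child is None:
                setattr(node, side, TreeNode(key))
                return
            node = child

    def __contains__(self, key: int) -> bool:
        node = self._root
        while node is not None:
            if key == node.key:
                return True
            node = node.left if key < node.key else node.right
        return False

    def __iter__(self) -> Iterator[int]:
        # Iterative in-order traversal: yields keys in ascending order.
        stack, node = [], self._root
        while stack or node is not None:
            while node is not None:
                stack.append(node)
                node = node.left
            node = stack.pop()
            yield node.key
            node = node.right
```

Clients talk to `SortedSet` only, so the ordering invariant holds even though each `TreeNode` on its own is wide open.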