Hacker News
by hudon 2555 days ago
The claim that "an object is a set of functions that operate on implied data elements" has a strange corollary because of how in modern OO languages like Java or C#, there is no syntax or popular naming convention to tell the difference between a data structure and an Object. For example, in Java, a LinkedList object is actually not a linked list, it is a set of functions that operate on an implied linked list. If the system needed direct access to the data for whatever reason, we'd need to explicitly have a LinkedList data structure object that only contained the data (the values and their pointers), as well as a second class, the LinkedListOperator, that contains all the functions (add, first, etc.). Likewise, in the author's examples, there'd be a Square class and a SquareOperator class.
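A minimal Python sketch of that split, offered as an illustration only and not taken from the article. `Node` is an assumed name for the data-only structure; `LinkedListOperator` follows the hypothetical name above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Plain data: a value and a pointer to the next node, no behavior."""
    value: int
    next: Optional["Node"] = None


class LinkedListOperator:
    """Hypothetical class: functions only, no stored data of its own."""

    @staticmethod
    def add(head: Optional[Node], value: int) -> Node:
        """Return a new head with `value` prepended."""
        return Node(value, head)

    @staticmethod
    def first(head: Optional[Node]) -> int:
        """Return the first value, or raise if the list is empty."""
        if head is None:
            raise ValueError("empty list")
        return head.value


head = LinkedListOperator.add(LinkedListOperator.add(None, 2), 1)

# Other code can walk the raw structure directly, bypassing the operator.
values = []
node = head
while node is not None:
    values.append(node.value)
    node = node.next
```

Because `Node` carries no behavior, any part of the system can read it directly, which is exactly the access a typical `LinkedList` class withholds.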

I was going to say that Haskell addresses this by putting values in data types and behaviors in "type classes", but then I remembered that functions can be values... which is now making me think that the reality is probably more abstract or complex than the author here is letting on.


Haskell does address it exactly as you say. Functions and values both being expressions is much less complex. As an example, imagine if you couldn't freely substitute 2+2 and 4. Functional languages say those are the same, imperative languages say one is a value and one is a function call.

It's not particularly intuitive if you're not used to high level math or functional programming, but it really is a lot simpler (not that it doesn't have downsides/leaks in the abstraction, but that's another discussion).

I'm not sure that functions alone are sufficient. That's why Haskell supports existential types, and there isn't a convention for distinguishing them so the critique applies.
Existential types are types of values. There is no distinction.
In a purely OOP language, data structures exist as an implementation detail, but you can never access them directly (as in, bypassing the object interface). That's by design.
I get that in your application, you may want to keep a linked list behind its interface 90% of the time. However, considering your system as a whole, at some point you may want to take that linked list data and write it to a database, in which case the cleanest thing is to bypass the interface and extract the "data structure object" so to speak and deal with it in a database-related object, rather than encumbering your LinkedList object with database behaviors.
This is a violation of OOP. Instead, consider methods that produce and consume a serialized representation of the data instead.
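One way to read that suggestion, as a rough Python sketch. The `to_json` and `from_json` method names and the tuple-based node layout are assumptions made for illustration; the point is only that the serialized form need not match the internal one.

```python
import json


class LinkedList:
    """Toy list that hides its nodes; all names here are illustrative."""

    def __init__(self):
        self._head = None  # internal layout: nested (value, next) tuples

    def add(self, value):
        """Prepend a value."""
        self._head = (value, self._head)

    def __iter__(self):
        node = self._head
        while node is not None:
            yield node[0]
            node = node[1]

    def to_json(self) -> str:
        """Produce a representation that does not expose the node layout."""
        return json.dumps(list(self))

    @classmethod
    def from_json(cls, text: str) -> "LinkedList":
        """Rebuild a list through the public interface only."""
        result = cls()
        for value in reversed(json.loads(text)):
            result.add(value)
        return result


original = LinkedList()
for v in (3, 2, 1):
    original.add(v)
payload = original.to_json()
restored = LinkedList.from_json(payload)
```

A database-related object would only ever see `payload`, so the list can change its internal layout without breaking that consumer.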

Things like Java serialization and Python pickle attempt to do what you say and are considered failures (or at least security risks) because they allow a third party to act on object implementation internals.

Security aside, a denormalized representation of data could be different than the implementation specific representation that's encapsulated inside an object.

But this is partly a self-made problem, because in this OOP model you have decided that the internal representation of your data is to be hidden and therefore the data is only accessible via the provided interface.

In practice, it is debatable how often that is helpful when you're implementing generic data structures. An alternative is to specify the representation explicitly and provide a set of functions designed to work with it, but also to allow direct access by other functions when that is useful.

You can still build a layer of more abstract interfaces on top and write more generic algorithms in terms of those interfaces rather than any specific concrete representation, as for example Haskell's typeclass system does.

>An alternative is to specify the representation explicitly and provide a set of functions designed to work with it, but also to allow direct access by other functions when that is useful.

I don't see that as an alternative. I consider this natural OOP. You don't lose anything by strictly enforcing the encapsulation because you can always explicitly provide low level methods into the data structures. Conversely, you lose all safety when you open up encapsulation. The method API of an object is the contract it provides. If you go around that contract it's much harder to make safe implementation changes.

Because of this, low level access should be opt in, in the way you describe.

> You don't lose anything by strictly enforcing the encapsulation because you can always explicitly provide low level methods into the data structures.

But then you're not really gaining anything either, unless perhaps you have some mechanism to enforce that the low-level access should only be used in specific circumstances when it is deliberately intended. It's like writing classes that have a few data members, but then writing direct get and set accessors for each of them anyway. There's no more complicated invariant that you're enforcing at that point, and without any non-trivial invariants to be enforced, the whole argument for encapsulation and data hiding becomes moot.
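To illustrate the contrast, here is a small Python sketch with made-up classes. `Point` wraps a field in accessors that enforce nothing, while `SortedBag` hides a list whose sortedness is a real invariant that its lookup depends on.

```python
import bisect


class Point:
    """Accessors that enforce nothing: hiding _x buys no safety here."""

    def __init__(self, x):
        self._x = x

    def get_x(self):
        return self._x

    def set_x(self, x):
        self._x = x


class SortedBag:
    """Hiding matters here: _items must stay sorted for contains() to work."""

    def __init__(self):
        self._items = []

    def add(self, value):
        # insort keeps the list sorted, preserving the invariant.
        bisect.insort(self._items, value)

    def contains(self, value):
        # Binary search is only correct while _items is sorted.
        i = bisect.bisect_left(self._items, value)
        return i < len(self._items) and self._items[i] == value


bag = SortedBag()
for v in (5, 1, 3):
    bag.add(v)
```

Appending to `bag._items` directly could silently break `contains`, whereas nothing comparable can go wrong with `Point`.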

> Conversely, you lose all safety when you open up encapsulation. The method API of an object is the contract it provides.

Do you always need that safety, though? If your data is defined to be stored in a certain representation, and direct access to that underlying representation in that format is available if needed, isn't that representation now just another part of the contract? Again, you're balancing two competing priorities: is there some invariant to be enforced that is sufficiently complicated for data hiding to be a useful safeguard, and is it useful to interact efficiently with, or make safe assumptions about performance based on, the true data representation?

Which one is the more important consideration must surely depend on how complicated your representation and any related invariants are. Being able to pattern match against values of some relatively simple algebraic data type can be very useful, for example. On the other hand, it's not much fun to accidentally corrupt the super-efficient look-up structure at the heart of your whole system that now uses a complicated set of hash tables and the odd Bloom filter internally after someone spent three weeks optimising it for a 50% speed boost.

> ... are considered failures (or at least security risks) because they allow a third party to act on object implementation internals.

Well, not really. In Java you had to opt in, and you could implement the readObject and writeObject methods to override the default behavior; similarly in Python, you can override __reduce__ and friends as needed.

The security problems are more that these protocols essentially execute arbitrary code when reading data. You'd think, "I can specify a root object and then it's going to specify what its attributes can be, and so forth."
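A harmless Python demonstration of that point. The class name `Innocent` is made up, and `sorted` stands in for whatever callable an attacker would actually choose.

```python
import pickle


class Innocent:
    """Looks like data, but __reduce__ tells pickle what to call on load."""

    def __reduce__(self):
        # Harmless stand-in: the payload could name any importable callable.
        return (sorted, ([3, 1, 2],))


payload = pickle.dumps(Innocent())

# Loading calls sorted([3, 1, 2]) and returns its result; no Innocent
# instance is ever reconstructed.
result = pickle.loads(payload)
```

The reader of the payload gets back whatever the named callable returns, which is why unpickling untrusted bytes amounts to running untrusted code.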

But Python is a dynamically typed language, so given a class Foo, its attributes can be anything. (Though now with annotations, it is possible to lock down precisely the types you allow; I wrote a module that does this.[1])

Java could have fixed this if they hadn't gone with type erasure. `List<Foo>` becomes a `List` at runtime, so there's no way to determine what it ought to contain, short of kludges[2] that other libraries use.

> Security aside, a denormalized representation of data could be different than the implementation specific representation that's encapsulated inside an object.

Yup, as soon as your process has to interact with other processes, even writing to a file and reading it later, it's possible to disagree on how data is represented or what it means. I'll plug another project of mine: I try to make the client smart[3] (enough) to allow versioning[4], as well as avoiding a class hierarchy.

[1]: https://pypi.org/project/json-syntax/

[2]: https://static.javadoc.io/com.google.code.gson/gson/2.8.5/co...

[3]: https://tenet-lang.org/contrast.html

[4]: https://tenet-lang.org/versioning.html

> This is a violation of OOP.

And it is because of this that OOP has a bad name.

It's ok to violate "great principles" when writing software. What I have never seen pan out is dogmatic application of principles.

> consider methods that produce and consume a serialized representation of the data instead

That sounds appallingly inefficient for no design win. This is precisely the problem solved by Java's Iterator interface, and .Net's IEnumerable, isn't it? I can't imagine why we'd want to serialise and deserialise. Or am I misreading things?

It's one of the most efficient ways.

But it's not the cleanest way by any measure. It has tighter internal coupling than any other easy-to-imagine way. Most of the time, people will use the list's normal interface to get the data.

If performance is a requirement, most OOP people will create a database interface that the linked list class can write to without taking the data outside of its interface. Most FP people would do it the other way around, and create an interface for applying a function to the data, which the list can implement. Either way, it can tie with the dirty introspection in performance.
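Both directions can be sketched in a few lines of Python. Everything here is hypothetical: `write_to`, `fold`, and `ListSink` are illustrative names, and the sink just collects rows where a real one would talk to a database.

```python
class LinkedList:
    """Toy list whose node layout stays private in both styles."""

    def __init__(self, values=()):
        self._head = None
        for v in reversed(list(values)):
            self._head = (v, self._head)

    def write_to(self, sink):
        """OOP style: the list pushes its values into a sink it is handed."""
        node = self._head
        while node is not None:
            sink.write(node[0])
            node = node[1]

    def fold(self, fn, initial):
        """FP style: the list applies a caller-supplied function to its data."""
        acc, node = initial, self._head
        while node is not None:
            acc = fn(acc, node[0])
            node = node[1]
        return acc


class ListSink:
    """Stand-in for a database writer; it only records what it receives."""

    def __init__(self):
        self.rows = []

    def write(self, value):
        self.rows.append(value)


items = LinkedList([1, 2, 3])
sink = ListSink()
items.write_to(sink)
total = items.fold(lambda acc, v: acc + v, 0)
```

In both cases the traversal runs inside the list, in a single pass over its own nodes, and the caller never sees the node layout.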

> in which case the cleanest thing is to bypass the interface and extract the "data structure object"

That doesn't seem like the cleanest to me. Instead, you'd use the interface to read/write to the list as you write/read it to storage. At no time do you want to reach into the LinkedList object and access its internal mechanism.

I have these kinds of arguments with my C# co-workers all the time, and they usually have some annoyingly reasonable solution like "create a database serialization class with a linked list consumer". :)
This reminded me of the design antipattern “Public Morozov”, named after the Soviet pioneer Pavlik Morozov, who denounced his father to the secret police. In this antipattern, implementation details are exposed via convenient public methods.
> In a purely oop language, data structures exist as an implementation detail, but you can never access them directly

In Smalltalk, you can have a TreeNode class and instantiate TreeNode instances. This is often the way that binary trees are implemented in Smalltalk. In that case, data structures exist as a design of interacting objects, and you can manipulate them fairly directly using the object interface.

The same goes for Java and C#. Basically, you can design in such a way that you essentially have C or Fortran in any language, even a pure OO language like Smalltalk. The right way to do this in Smalltalk is to use a Facade, which can be used to hide the guts of your data structure.

In Haskell this can be pushed further by making existential types, i.e. an opaque type that's only known to implement some type class. This doesn't seem to be that useful in Haskell (I think it's because of the immutability), but trait objects in Rust are pretty much the same and are used more often.

As a side note, I hope that the situation with pre- and postcondition checking improves (runtime verification like in Racket seems too heavyweight for me, dependent types alone are kind of clunky, type refinements are either bolted onto existing languages (LiquidHaskell, F7) and don't always fit, or exist in very obscure ones (F*)), and we can drop information hiding for the sake of safety alone. I don't like how today the "safe" option is to severely limit what you can do with a data type (want to safely index an array? just use an iterator! oh, you wanted to index two arrays at once? just zip them and hope the compiler makes the tuples go away! you wanted O(1) indexing? too bad, no can do).

> This doesn't seem to be that useful in Haskell (I think it's because of the immutability)

I suspect it's because of the polymorphism. Why write a function that operates on some unknown a for which a Foo implementation exists when it's so easy to write a function that operates on any a for which a Foo implementation exists?

The only time I've seen existentials used is for safety - in particular ST-style monads where a fancy type is used to ensure that you can't "leak" state out of the monadic context.

It indeed makes no sense for passing arguments, but it does for storing things. For example, both `[Int]` and `[Float]` can be passed to a function of type `Num a => [a] -> a`, but these are distinct types, and neither can store a mix of ints and floats.

It ultimately boils down to the difference between static and dynamic polymorphism. This is even more visible in Rust, where regular polymorphism works exactly like C++ templates, while trait objects pack a "v-table" together with a structure. In Haskell it's a little more blurry, since "static" polymorphism is already implemented in a way that doesn't easily translate to templates, for example allowing polymorphic recursion.

Optparse-applicative has a very nice use of existential types to break a definition loop.
> ...because of how in modern OO languages like Java or C#, there is no syntax or popular naming convention to tell the difference between a data structure and an Object.

In C#, you create a class when it's an object, and a struct when it's a data structure, or am I missing something?

i.e.:

  // Data Structure
  public struct Foo
  {
      public string Bar { get; set; }
  }
    
  // Object
  public class Bar
  {
      public string Foo { get; set; }
  }

You can check to see if something is a Data Structure like so:

  typeof(Foo).IsValueType  // true
  typeof(Bar).IsValueType  // false
> In C#, you create a class when it's an object, and a struct when it's a data structure, or am I missing something?

Yes and no. Due to the technical limitation that (unlike in C++[0]) structs in C# cannot derive from other structs, DTOs in C# are often implemented (or rather: generated) as classes to allow derivation from a base class that has certain behavioral hooks (say, for misc. custom serializers).

[0] See https://www.fluentcpp.com/2017/06/13/the-real-difference-bet... (the real technical difference between struct and class in C++ is just that the default visibility modifier for a struct is public and the default visibility modifier for a class is private; both can use inheritance and have virtual methods)

Cool thanks
*DTO is the closest to such a convention, as it's implied that data representation is the goal.