by barrkel 3280 days ago
This design puts entities in the middle - everything else pivots around entities. Additionally, it uses the OO approach of implementing logic as methods on the entities.

After a couple of decades of experience, I've come to the conclusion that this isn't right. Most business rules involve processes, which are inherently procedural, or, from another perspective, functional - functions of the whole state of the system to a new state of the system. Most processes don't logically belong on an object, and when you stick them to one end, you create lots of problems for yourself.

Just take that example Student entity class; what stops you from writing:

    student.RegisteredCourses.Add(course);
Moreover, when you have a relation between two entities, it's not uncommon for the relation to be modelled at both ends; that is, a student has a list of courses, and a course has a list of students. Do you implement the same validation at both ends? Make one end (choose one) read-only? Do edits done on one end automatically turn up on the other end? How do you protect the visibility of methods that keep either end in sync, without exposing invariant-violating APIs to other code?
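To make the questions above concrete, here is a minimal Python sketch (the thread's snippet reads like C#; every class and method name here is hypothetical). It assumes a relation modelled at both ends with a publicly mutable collection, and shows the validated path and the bypass sitting side by side.

```python
class Course:
    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity
        self.students = []            # the course's end of the relation


class Student:
    def __init__(self, name):
        self.name = name
        self.registered_courses = []  # publicly mutable, like RegisteredCourses

    def register_for(self, course):
        # The "official" path: check the rule, then update both ends.
        if len(course.students) >= course.capacity:
            raise ValueError(f"{course.name} is full")
        self.registered_courses.append(course)
        course.students.append(self)


algebra = Course("Algebra", capacity=1)
alice, bob = Student("Alice"), Student("Bob")

alice.register_for(algebra)

# Nothing stops a caller from bypassing the rule entirely:
bob.registered_courses.append(algebra)

# The two ends of the relation now disagree, and the capacity rule never ran.
print(algebra in bob.registered_courses)  # True
print(bob in algebra.students)            # False
```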

Invariants and validations for fields are trivial with an OO approach; for entities, they're reasonably easy, but sometimes you need partially invalid versions while editing or constructing an entity; but once you bring in multiple entities, and relations, everything starts getting hairy pretty quickly. Assertions and validations that would be trivial to write [1] in a relational language like SQL aren't possible in an object-oriented language without a lot of system-building.
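As an illustration of the kind of cross-entity assertion meant here, the following sketch uses Python's standard `sqlite3` module. The schema, the sample rows and the `MAX_COURSES` limit are all assumptions made up for the example; the point is only that a rule spanning students and registrations fits in one query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE course  (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE registration (
        student_id INTEGER NOT NULL REFERENCES student(id),
        course_id  INTEGER NOT NULL REFERENCES course(id),
        PRIMARY KEY (student_id, course_id)  -- no duplicate registrations
    );
    INSERT INTO student VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO course  VALUES (10, 'Algebra'), (11, 'Biology'), (12, 'Chemistry');
    INSERT INTO registration VALUES (1, 10), (1, 11), (1, 12), (2, 10);
""")

MAX_COURSES = 2  # hypothetical limit, chosen only for this example

# One query states the cross-entity rule "no student exceeds the limit"
# and returns every violation.
violations = conn.execute(
    """
    SELECT s.name, COUNT(*) AS n
    FROM registration r JOIN student s ON s.id = r.student_id
    GROUP BY s.id
    HAVING COUNT(*) > ?
    """,
    (MAX_COURSES,),
).fetchall()

print(violations)  # [('Alice', 3)]
```

Running such a check over the whole table on every write may be expensive, which is the caveat raised in footnote [1].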

So I've come around to the idea that the database is a better thing to put at the centre; that encapsulation and hiding of the main fact store is harmful to the architecture of a system, especially in a heterogeneous environment where your entities are represented in different languages, all backed by the same fact store.

[1] Trivial to write, but not necessarily cheap to evaluate. I'm not advocating writing all global validations in SQL.


This is solved by DDD and aggregates. Here's how I would do it.

   Course - Aggregate Root

   Student - Aggregate Root

       RegisteredCourses (RegisteredCourse[]) - Sub Entity Collection - with id reference to course, plus metadata such as the date and time when the registration occurred.

       RegisterForCourse Method
There should be no mutable public properties, and internal methods should be private. Everything should go through an aggregate root method.
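A rough Python sketch of that outline, under a few assumptions: the names are adapted from the outline, duplicate registration is the only rule checked, and "private" is by underscore convention only, since Python does not enforce access modifiers.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class RegisteredCourse:
    """Sub-entity: refers to the Course aggregate by id only."""
    course_id: int
    registered_at: datetime


class Student:
    """Aggregate root: all changes go through its methods."""

    def __init__(self, student_id):
        self._id = student_id
        self._registered_courses = []  # private by convention

    @property
    def registered_courses(self):
        # Hand out an immutable snapshot, not the internal list.
        return tuple(self._registered_courses)

    def register_for_course(self, course_id):
        if any(rc.course_id == course_id for rc in self._registered_courses):
            raise ValueError(f"already registered for course {course_id}")
        self._registered_courses.append(
            RegisteredCourse(course_id, datetime.now(timezone.utc))
        )


student = Student(student_id=1)
student.register_for_course(101)
print(len(student.registered_courses))  # 1
```

Because callers only ever see a tuple, the equivalent of `student.RegisteredCourses.Add(course)` is not available to them.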
CourseRegistration.Register(course, student);

CourseRegistration.ViewCoursesFor(student);

I think giving either entity "ownership" of the relation is a disaster waiting to happen. I just have this vision of someone wanting data about students accidentally loading all the course information. In the DAL, of course, you end up modelling it however you need to, but seeing courses in a student just makes me ill.

In this case it isn't that bad, since it's only loading registered courses for that specific student, which for a student would be limited (10?). In fact it should probably be a business rule inside RegisterForCourse that a student can only register for a specific number of courses per semester.

You normally need to load an entire aggregate, so that you can maintain your invariants. If you separate into 3 aggregates such as Course, Student, CourseRegistration, then it starts to become harder to maintain business rules. For example, if you wanted the RegisterForCourse method to limit student course registration to 10, it would now have to perform another query.

If you start doing queries that don't match up with your domain model, then you should probably do CQRS, which would allow you to build a model optimized for queries.

I just think it's bad because it's asking the student to be involved in the process of registration. I don't think that way of thinking scales. Unless you're making a util (which enables any architectural practice tbh) you're gonna eventually get stung if you pollute your currency with logic. Currencies, just like graphical interfaces, should be as dumb as possible.

One day someone is going to want slightly different rules for student course registration and then they'll realise this rule is baked into the core and that's sad.

"One day someone is going to want slightly different rules for student course registration and then they'll realise this rule is baked into the core and that's sad."

This is a desirable trait, and one of the main points of DDD. If you want to change registration, you change the core business logic. It then applies to all applications using that business logic. If you have a real business case for different registration methods, then you simply model that on your entities.

If you don't do this, you're going to get business rules applied inconsistently, as programmers will interpret requirements slightly differently.

It's already been deployed and some customers are relying on the old behaviour. Old clients might need to call new server code. If you've baked your logic into the objects you pass around you've done a stupid, as the client definition could give a different answer than the server definition. ENJOY YOUR "CONSISTENCY"!

So what's easier to change? Giving customers versions that give them all different types of the core base type Student or JUST changing the type of CourseRegistration that they visit? (CourseRegistration is a service as opposed to a currency).

You keep your currency CLEAN.
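A sketch of what this could look like in Python, assuming the `CourseRegistration.Register` / `ViewCoursesFor` calls mentioned earlier in the thread. The `max_courses` parameter and the in-memory storage are hypothetical, added only to show a rule varying without touching the data types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Student:          # "currency": plain data, no rules
    id: int
    name: str


@dataclass(frozen=True)
class Course:
    id: int
    title: str


class CourseRegistration:
    """Service that owns the relation and the rules around it."""

    def __init__(self, max_courses):
        self._max_courses = max_courses
        self._by_student = {}  # student id -> list of Course

    def register(self, course, student):
        courses = self._by_student.setdefault(student.id, [])
        if len(courses) >= self._max_courses:
            raise ValueError("course limit reached")
        courses.append(course)

    def view_courses_for(self, student):
        return tuple(self._by_student.get(student.id, ()))


# Two deployments with different rules; Student and Course are unchanged.
standard = CourseRegistration(max_courses=10)
part_time = CourseRegistration(max_courses=1)

alice = Student(1, "Alice")
part_time.register(Course(10, "Algebra"), alice)
print(part_time.view_courses_for(alice))
```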

How did you decide that the registration is a property of the student and not the course?
Registration is definitely not a property of the student.

It's always a good idea in these situations to appeal to real life. The actual business will point the way of the business simulation. In the real world we don't ask students to register for a course. Instead students ask the Registrar to register for a course. When the Registrar makes her decision she considers far more than just the internal state of the Student; she considers (1) does the course have any available seats? (2) has the Student met all the course pre-requisites and, most importantly, (3) has the student paid his tuition and is he even a valid member of the university community.

What the Student does have is a history and a context -- that is, a state -- that must be considered when registering for courses. The student may also have preferences -- courses he wants to register for.

The language of the business should guide these decisions always. A student submits a request for a course, and it is the office of the Registrar that accepts or denies this request.
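The Registrar's decision described above could be sketched in Python like this. The field names, the order of the checks and the returned reasons are assumptions for illustration; only the three considerations (seats, prerequisites, tuition) come from the comment.

```python
from dataclasses import dataclass, field


@dataclass
class Student:
    name: str
    tuition_paid: bool
    completed: set = field(default_factory=set)   # history: course codes passed


@dataclass
class Course:
    code: str
    seats: int
    prerequisites: set = field(default_factory=set)
    enrolled: list = field(default_factory=list)


class Registrar:
    def request_registration(self, student, course):
        """Return (accepted, reason) for a student's request."""
        if not student.tuition_paid:
            return False, "tuition unpaid"
        if len(course.enrolled) >= course.seats:
            return False, "no seats available"
        missing = course.prerequisites - student.completed
        if missing:
            return False, f"missing prerequisites: {sorted(missing)}"
        course.enrolled.append(student.name)
        return True, "registered"


registrar = Registrar()
calculus = Course("MATH201", seats=30, prerequisites={"MATH101"})
alice = Student("Alice", tuition_paid=True, completed={"MATH101"})
bob = Student("Bob", tuition_paid=True)

print(registrar.request_registration(alice, calculus))  # (True, 'registered')
print(registrar.request_registration(bob, calculus))
# (False, "missing prerequisites: ['MATH101']")
```

The Student carries state (history, payment status) but makes no decision; the decision sits with the Registrar, which can see both the student and the course.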

This guy has been thinking about this properly.
Take this thinking to the end and realize that it leads to freestanding functions. In general all the context of the program is needed to execute a functionality. It's not like the registration office owns all the students. It's not like a registration wouldn't change the student's context. Students are both an independent and related concept. The proper object to call most things on is a "Global" object. Now instead of

    Global.do_some_thing(foo, bar)
just

    do_some_thing(foo, bar)
There you have it. OO is an unsound approach which survived so long mainly due to the perceived real world "analogy" and because the Object-Verb-Predicate syntax simplifies code completion.
I like the simplicity of this. If the app is of appreciable size, do_some_thing depends on databases, webservers, external processes, filesystems and configuration. How do you test/debug/explore its functionality without setting all of that up?
I believe OO is good for exactly two things: abstract data containers and state machines.

In the former, access hiding cleans up the API and prevents unsafe usages of the container. In the latter, OO enforces a protocol to keep the state machine sealed off and only aware of key inputs and outputs.
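For the state machine case, a small Python sketch of what "sealed off" might mean in practice. The states, events and the `RegistrationRequest` name are invented for the example.

```python
class RegistrationRequest:
    """State machine: callers send inputs; the state itself stays sealed."""

    _TRANSITIONS = {
        ("pending", "approve"): "approved",
        ("pending", "reject"): "rejected",
        ("approved", "withdraw"): "withdrawn",
    }

    def __init__(self):
        self._state = "pending"

    @property
    def state(self):
        return self._state      # read-only view of the current state

    def handle(self, event):
        key = (self._state, event)
        if key not in self._TRANSITIONS:
            raise ValueError(f"cannot {event!r} while {self._state}")
        self._state = self._TRANSITIONS[key]
        return self._state


request = RegistrationRequest()
print(request.handle("approve"))   # approved
print(request.handle("withdraw"))  # withdrawn
```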

And that's it. I've found nothing else. Data itself is better off when strategized to fit in a database, whether off-the-shelf or a custom-tuned, in-memory design. The state machines may need to query a part or all of the database, as well, so their ability to restrict scope only goes so far.

You nailed why I think many object oriented designs fall flat. People presuppose both that the objects within their domain encompass all objects (just students and courses) as well as that the objects within a domain will not change.

When #1 is missed, I usually see that there's a design that doesn't mimic its domain and thus loses the ability for developers and users to have clear, concise communication. At that point OO is a disservice.

When #2 is missed you end up with IFilteredCourseAdapterProcessor as people attempt to bolt on components to solve future needs.

The addition of the "Registrar" to the domain immediately demonstrates how the naive interpretation is missing core components and I bet the users and devs fundamentally aren't speaking the same language.

This, imo, leads to conversations like, "Of course so and so approves all the registrations! Otherwise it would be madness!"

I've come to avoid OO and use only freestanding functions where possible mostly because of this problem. So often it ends up in a syntactic distinction that is absolutely meaningless otherwise.

I use some OO to make abstract datatypes in languages that are inherently OO, but I think the explicit virtual table approach in C or the type classes approach in Haskell are much cleaner.

Lately I had to make a REST API which is basically a distributed Object-oriented interface. I think I managed to get it done with compromises, but I'm not happy. Another idea would be to make a procedural interface first and then make a REST API on top if needed. But I have some doubts it can work out practically.

Some would call your procedural interfaces services and map your REST API directly to publicly available service methods.
Ah, REST.

An API is there because someone on the other end wants to do something. If you don't design it but just slap REST on top of your data, then you're not doing that person a favour.

Well you can still "design REST" on top, right? But it could mean some duplicated efforts.

Actually, after having tried a few times, I'm pretty sure I don't want to put REST ideas at the center of my architecture. You just happen to need a network transport, and not even in all cases (debugging for example). And an existing model is never going to be able to represent all the concepts of your application domain. This means that you need to build your own representations.

In theory there is a point of using a standardized object protocol which can represent some CRUD use cases. But from my limited experience I think it breaks pretty quickly, to the point where I can use only GET and POST.

Using protocol layers of increasing specificity may make it easier to use existing tooling with data exchanges. For example, many APIs use HTTP status codes as a more coarse-grained version of the return codes in the body (in an ad-hoc format). Also, caching is often brought up as an example where you want to buy in to a standardized protocol.
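A minimal sketch of that layering, using Python's standard `http.HTTPStatus` enum. The application codes and the choice of which status each one maps to are hypothetical; real APIs make these choices differently.

```python
from http import HTTPStatus

# Hypothetical application-level error codes carried in the response body.
APP_CODE_TO_STATUS = {
    "ok": HTTPStatus.OK,
    "course_full": HTTPStatus.CONFLICT,
    "prerequisite_missing": HTTPStatus.CONFLICT,
    "unknown_student": HTTPStatus.NOT_FOUND,
    "tuition_unpaid": HTTPStatus.FORBIDDEN,
}


def to_response(app_code, detail=""):
    """Return (http_status, body): coarse status outside, fine code inside."""
    status = APP_CODE_TO_STATUS.get(app_code, HTTPStatus.INTERNAL_SERVER_ERROR)
    body = {"code": app_code, "detail": detail}
    return int(status), body


print(to_response("course_full", "MATH201 has no free seats"))
# (409, {'code': 'course_full', 'detail': 'MATH201 has no free seats'})
```

Generic tooling (proxies, monitoring, client libraries) can act on the 409 alone, while a client that knows the API reads the finer-grained `code` in the body.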

But some APIs don't buy in, like Facebook, which reportedly returns 200 OK always. It seems like a lot of best-effort work with little returns to me as well. But I don't know - I'm not a professional business software guy.

This is actually a part of understanding the specific domain. How do the educational practitioners think about the problem for the application you're working on? Is the application student- or course-centric? This is a big part of DDD, and involves working with domain experts and getting inside each other's heads.

You could definitely model it as the Register method on the course. But this example shows how you can avoid people messing around with internals once you've modeled it. There is only one way of registering for a course.

I wonder how to even formulate the question to the practitioners -- it seems like an artificial choice imposed by the computer formalism (single-dispatch OO).

In other words, I don't see how it is a domain question, and it seems likely that the domain practitioners just think "there are courses, and there are students, and students can be registered for courses".

My suspicion is that DDD would basically be better without the focus on single dispatch OO.

See my comment below. The language of the business will almost always guide the design in the right direction. There are rare cases where the business is unaware of a more "essential truth." This isn't about OO it's about faithfully capturing the model of the domain which is all DDD is.
"students can be registered for courses" - This is already implying an order.
If I say "students can be registered for courses in the school" then you might think the OO model should be

    school.register(student, course)
which does actually seem reasonable to me. Here the school object would basically be the system's logic as a whole, and both the course and the student could just be IDs.

I really think single dispatch OO is a huge distraction and I haven't seen any strong arguments for its use as a general paradigm.

That's wishful thinking.
I know there are ways to solve the problem; but I believe these are accidental complexities, not essential complexities. That is, these design solutions complect the problem, sacrificing simplicity for a dubious principle.
Actually I think this is a major advantage. They are not complexities, they are putting down into code how you think about your application in your head.

If your design is some data, which you can mutate however you want to get the job done, you don't really have a model of the application.

Your business rules can become inconsistent because different developers are implementing different but similar methods in your business logic layer, all applying slightly different rules. If the entities themselves enforce these rules, it becomes a lot less likely.

If you are subscribing to the DDD snake oil, you usually weasel out of this decision by claiming that it belongs to the student in one Bounded Context™ and to the course in another.
Thank you for mentioning this. I was hoping that someone would. Usually DDD solves these issues because of the rules in the construction of the Domain Models. I usually distinguish entities from domain models: an entity is the representation of the storage units, and domain models go beyond that by including domain rules.
I've come to the exact same conclusion after spending time with these same types of systems.

Practically speaking, I find that abstracting a system into a set of commands and a set of queries (i.e. CQRS) is often the "right" solution. Each command/query encapsulates an individual use-case, and all of the business rules and database access required.

CQRS and entity-centric solutions normally go together like peas in a pod.

Command side has a domain model, which would be your business objects in this case. A command simply loads the entity, performs a method on it, saves it.

The query side has a query model which is designed for fast reads, normally built from events emitted by the domain model.
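The load / call / save cycle and the event-fed query model described here might look roughly like this in Python. Everything is in memory and every name (`StudentRepository`, `CourseRosterView`, `handle_register`) is a stand-in invented for the sketch, not a reference to any CQRS framework.

```python
class Student:
    """Command-side entity: enforces rules and emits events."""

    def __init__(self, student_id):
        self.id = student_id
        self.course_ids = set()
        self.pending_events = []

    def register_for_course(self, course_id):
        if course_id in self.course_ids:
            raise ValueError("already registered")
        self.course_ids.add(course_id)
        self.pending_events.append(("registered", self.id, course_id))


class StudentRepository:
    """In-memory stand-in for persistence."""

    def __init__(self):
        self._students = {}

    def load(self, student_id):
        return self._students.setdefault(student_id, Student(student_id))

    def save(self, student):
        self._students[student.id] = student
        events, student.pending_events = student.pending_events, []
        return events


class CourseRosterView:
    """Query model: denormalized for the read "who is in this course?"."""

    def __init__(self):
        self.students_by_course = {}

    def apply(self, event):
        kind, student_id, course_id = event
        if kind == "registered":
            self.students_by_course.setdefault(course_id, []).append(student_id)


def handle_register(repo, view, student_id, course_id):
    student = repo.load(student_id)            # load the entity
    student.register_for_course(course_id)     # perform a method on it
    for event in repo.save(student):           # save it
        view.apply(event)                      # events feed the query model


repo, view = StudentRepository(), CourseRosterView()
handle_register(repo, view, student_id=1, course_id=101)
handle_register(repo, view, student_id=2, course_id=101)
print(view.students_by_course)  # {101: [1, 2]}
```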

I agree. After 20 years of experience I realize "Clean Architecture" will break down on larger systems. It's easy to create a clean looking architecture with a limited set of use cases. You can do it using any design methodology.

Where systems break down is when other viewpoints get added. For this system it might be the following:

  - Pricing/Invoicing

  - Prerequisites

  - Academic Status

  - Professor assignment

  - Course Reviews / Social Media
All of these viewpoints are close enough to the registration system that there is a bias towards reusing as much existing code as possible.

The problem is that at a high level two viewpoints look 95% similar, but in the code it equates to creating interdependencies between all of the viewpoints.

Different teams might be successful in keeping the separation in the system, but in most systems I've seen the entanglement starts in the database with the entities. Columns that become nothing more than status flags for different viewpoints. Columns with near identical names that mean almost the same thing, but are handled differently because the viewpoints treat them differently.

When the system gets big enough, a developer cannot mentally map the whole thing. When implementing a feature they will look for what is available vs what the architect had in mind.

This is how you wind up with 10 different getCourse calls, all of which are building off one another with various parameters. The code will have a lot of if/thens checking the parameters to make it work for a particular viewpoint and avoid bugs for the others.

The more separation you have between the viewpoints the better. I now prefer separate entities/databases for every viewpoint. There is a set of entities common to all, but these are fact level entities. A course, A student, A professor, An admin, A TA. The entities should contain no status.

Where the viewpoints need information from each other they should just query the appropriate viewpoint, or have the source viewpoint send out updates (great place for event sourcing).

It might sound like I'm describing micro services. I wouldn't argue that, but I would say that a viewpoint is a higher level concept than a service. A viewpoint could be a collection of micro-services, or a single system (using this Clean Architecture).

I think a big part of that is this mistaken idea that, if you duplicate a single line of code, you've done a terrible, terrible thing.

Sandi Metz did a pretty good talk about this, basically saying we need to think more before we abstract things away because "code duplication". https://www.sandimetz.com/blog/2016/1/20/the-wrong-abstracti...

I think what you call viewpoints are called Bounded Contexts (https://martinfowler.com/bliki/BoundedContext.html) in DDD parlance.
>I've come around to the idea that the database is a better thing to put at the centre

I've seen systems that take this to the extreme of not allowing a single piece of logic into the domain. Domain objects were essentially data containers only. So, if you had a Person object with firstName and lastName properties that represented those DB columns, then even a getFullName() that concatenated the two was verboten.

Instead, all logic had to be in a service. This led to lots of duplication and a super-massive service layer in a system that was decidedly procedural, even if it was implemented in an OO-language.

> So I've come around to the idea that the database is a better thing to put at the centre

In my admittedly limited experience, applications come and go, but data lasts forever. So I find it unusual when a database is not at the heart of business software.

Data lives forever. Schemas live short lives.
If only you'd gone one step further and discussed the (imho) cleanest solution: an event store and a query mechanism (aka event sourcing and CQRS, https://martinfowler.com/bliki/CQRS.html).

When you want to ask questions about the state of the app, you make queries. These queries could be SQL, in which case your app is directly dependent on the type of storage and is almost un-abstractable. Or they could be in a custom/bespoke query language (where a set of hardcoded API calls to the DB/datastore counts as an API).

The event source system is responsible only for storing facts. Therefore, the "problem" of where the validation of students against courses belongs doesn't exist, because that relationship is a "fact" in the event store (there must've been a registration happening at some point for this fact to exist). Therefore, a programmer _cannot_ make the mistake of accidentally adding a course to a student who didn't register, unless they maliciously do it.

Most CQRS/event store systems simply build up OO-style domain objects by applying events to them.

Then apply a command to the domain object, then save the emitted events.

They are still entity centric.
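The three steps in this reply (rebuild from events, apply a command, save the emitted events) can be sketched in Python as follows. The event names and the dictionary standing in for an event store are assumptions made for the example.

```python
class Student:
    def __init__(self):
        self.course_ids = set()

    def apply(self, event):
        # Events are facts; applying them never fails validation.
        kind, course_id = event
        if kind == "course_registered":
            self.course_ids.add(course_id)
        elif kind == "course_dropped":
            self.course_ids.discard(course_id)

    def register(self, course_id):
        # Commands are validated against current state and return new events.
        if course_id in self.course_ids:
            raise ValueError("already registered")
        return [("course_registered", course_id)]


def load(event_store, student_id):
    student = Student()
    for event in event_store.get(student_id, []):
        student.apply(event)
    return student


# Hypothetical event store: student id -> ordered list of events.
event_store = {1: [("course_registered", 101),
                   ("course_registered", 102),
                   ("course_dropped", 101)]}

student = load(event_store, 1)            # rebuild state from events
new_events = student.register(103)        # apply a command
event_store[1].extend(new_events)         # save the emitted events

print(sorted(load(event_store, 1).course_ids))  # [102, 103]
```

The validation still lives in a method on the `Student` object, which is the sense in which such systems remain entity centric.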

    I've come around to the idea that the database
    is a better thing to put at the centre
Same for me.

    I'm not advocating writing all global validations
    in SQL.
What do you mean? Can you give an example?
I have also concluded that a lot of business processes can't easily be modeled by objects. There are almost always a bunch of exceptions that need to change data directly.