Hacker News new | ask | show | jobs
A Pure Types-Driven real-world apps development approach (medium.com)
49 points by jhoxray 1741 days ago
4 comments

We've built a proper data-driven business application - all the logic is effectively SQL over tables.

The challenge has nothing to do with the particular language or type system. Having a strongly-typed language helps a ton with refactor, but its not a prerequisite for data-driven to work correctly.

The most important part of doing this right is stepping away from the computer and listening to how the business refers to the things and their relationships. Getting the schema/normalization correct is foundational to long term success.

For instance, understanding that circular dependencies exist in the real world, and then actually being able to model them accurately is super important. If you have a Customer & Account that need to talk to each other both ways, invent a 3rd (or more) type(s) to relate them together. Model the nature of the relationship itself inside this new type. Very rarely is the relationship between 2 business types just an ID map (e.g. when was the relationship established? By whom?). Also, absolutely don't do any sort of bullshit where you arbitrarily decide which type "wins" in a hierarchy, unless the business has expressly informed you that one of those types is king always.

Closely related to this is nesting of complex types. Just don't. Make arrays (tables) of things and map between them via relations. You can query into JSON blobs with most database engines (we use SQLite), but it's a shitty way to author your BL in my experience.

This, completely.

Data modelling is the skill that is missing the most throughout the process. As you say that's developing a business domain model and mapping that to an appropriate persistence model. Both layers are important and quite often non trivial.

Working with a BA or technical product owner can help a lot in business domain and if you can use as much immutability as possible with a relational model you'll be in a reasonable position for quality development.

Agile killed proper data modeling.

Tackling complexity at the type level is tackling it when it's too late. Same goes for TDD.

It's like appliying a thousand band aids to a shotgun wound.

What? No, careless devs and businesses "killed" data modeling. People care as much about their data structures now as they used to before agile, which means they cared "not a lot".

Agile would push more focus on proper data modeling, as agile promotes changing requirements, devs and stakeholders working together closely and maintaining a constant pace, all things that gets better by proper data modeling and they also help you focus on data modeling.

But then what the Agile Manifesto said and what "Agile/Scrum folks" actually promoted tends to be too different things.

What usually happens now is researching and understanding the business domain is rushed, flawed assumptions are then mapped to a flaky, misaligned database model/prototype and then a mountain of code is furiously piled on top of that.

When you finally understand the domain, it's too late. Because you rushed to deliver "something".

Nobody dares to rework the DB model as it's too risky and expensive.

So you remedy it with transformer layers, refactorings, abstractions, advanced types, tools and tests.

You treat the symptoms.

Teams that aligned their DB model with their business domain early on have it easier in every possible way. Performance, correctness, stability, velocity.

Also, I can't remember the last time I saw data modeling mentioned in a job post. And I've seen thousands of them. From SWE to CTO roles.

My guess: business analysts, data modeling, diagrams, long research cycles sounded too waterfall-y so it had to be killed by agile.

I will say this about agile - we didn't get this right the first time. If we didn't plan for a little bit of failure and iteration, we wouldn't have found this solution.
>Same goes for TDD.

How so?

Please check my answer to "capableweb" user in this thread.
> Getting the schema/normalization correct is foundational to long term success.

Great points; I think things get werider when we try to blur the line between schema and type (conceptually)

I wonder what it would look like to have a nice single abstraction that enjoys the practicality (BL-wise) of schema and the pragmatism (programming-wise) of types

Or maybe we can demonstrate that such isomorphism can't exist?

> I wonder what it would look like to have a nice single abstraction that enjoys the practicality (BL-wise) of schema and the pragmatism (programming-wise) of types

This already exists in my view. With enough discipline, you can accurately encode any type system into the relational model itself. Something to settle up front is the notion of basic types vs domain types. Basic types are required anywhere and you just need to target the simplest subset that is guaranteed to work everywhere - string, int, guid, date, bool, etc.

Domain types are the most important part of solving complex problems. If it would help, this is an example of one of my schemas. Each line item below is a table:

  /* Basic supporting types */
  String (yes, all strings in the domain live in 1 table/column)
  Date (same for date[time]s)
  /* Everything below this line is a pure domain type */
  Customer 
  Customer_CellPhone 
  Customer_Name
  CellPhone_PhoneNumber
  CellPhone_WirelessCarrier
  WirelessCarrier
  WirelessCarrier_Name
  PhoneNumber
  PhoneNumber_String
  PhoneNumber_LastModified
  LastModified
  LastModified_Date
  LastModified_User
  User
  User_Name
  Name
  Name_First
  Name_First_String
  Name_First_LastModified
  Name_Middle_String
  Name_Middle_LastModified
  Name_Last_String
  Name_Last_LastModified
  Name_Business_String
  Name_Business_LastModified
  
The above encodes a fairly detailed view of the domain in my opinion. If you look carefully, you will see that Null is not possible here and that complex data types are strictly enforced.
I do this too in some apps. Very near exactly how you show it.

How do you enforce required fields in the data model? By eliminating the possibility of null values, you eliminate the possibility of not-null constraints in the data model. I end up having to enforce these types of (non-complex) constraints outside the relational model. A missing row is not detectable in the same way as a null column value.

This is really interesting; if we hook this up with hasura we can still get some nice graphql endpoints
I can imagine this might end up looking like a triple store fairly quickly. Which is fine if total flexibility is what you want but just moves the modelling problem upwards.

Did (or Do) you have any heuristics, or rules of thumb, for deciding how much to break relationships down into separate entities? I could see getting that wrong could lead to tricky db migrations.

Good comment, though one minor semantic nitpick:

> Make arrays (tables) of things and map between them via relations.

In relational algebra, the "table" is the relation, defining a set of related attributes (column headers) and fields (corresponding column values in each row, or tuple). The mappings between multiple tables/relations are operations. Operations (select, union, etc) take one or more relations as inputs, and return a new relation as the output.

While it does appear that the foreign-key links between tables/relations are the "relations" there, since they seem to define the way different sets of data relate to each other, but the original relation is the set of attributes within each table/relation.

What is your client-facing representation of the model?

How do you manage model changes, from getting new verbal requirements from the client, to communicating your understanding back to them, to tasking it out (user stories?), to schema updates?

Thank you, very good points!
Another saying is that all data relationships become many-to-many over time. First you think that each customer has a phone number, but then you find that they often have several phone numbers, and some phone numbers will belong to multiple customers, and so on.
This is the most important realization for me.

I am starting to chase a form of normalization even more strict than 6NF. Everything gets a synthetic key, there is no such thing as a domain key anymore. Facts are in tables 2 columns wide, no exceptions. Tuples of id & fact. The only other things I permit are a "root" type and a "relation" type, being 1 and 3 columns respectively.

If someone told me I need to start tracking when a phone number was last modified, or create a new approval loop for that change, I want this to be a concern entirely contained under the phone number tables. Not something that needs to be added to everything that has string column called PhoneNumber.

I strongly believe this degree of normalization to be perfect from a domain modeling perspective. It almost feels stupid to not do it this way looking back at all the unexpected changes we've endured.

SQL is also the perfect tool for putting Humpty Dumpty back together again. You can build views to present your change-proof 6NF+ schema in something the business can consume. I.e. wider tables that make some assumptions. If we get our assumptions wrong on the views, we just iterate so we don't break old SQL. VCustomers2, VCustomers3, etc. The cost of iterating your schema by way of views is negligible compared to fucking with it elsewhere.

> I strongly believe this degree of normalization to be perfect from a domain modeling perspective.

How's performance with several tens if not hundreds of joins in a query?

Performance is just fine for us. It's all about scoping for the right things. You are never looking for the whole schema all at once. No one ever said it had to be running on disk either.
Is that anchor modelling?
Very nearly, yes. The biggest distinction is that all keys are synthetic.
I think the author is basically describing the Naked Objects pattern, i.e. deriving the full stack of your application, from storage layer, to internal logic layer, to presentation layer, from one single domain layer.

The problem with this pattern is that it makes the application brittle and difficult to change. If you need to change something in the UI layer, it will likely ripple down through business logic layer into the storage layer. If you make a change in the storage layer, you could inadvertently end up changing your user-facing API and breaking something for them.

The point of all the boilerplate we have today is to make the systems modular and able to be evolved independently of each other. Sure, if you don't need that much modularity, then by all means simplify. But I hope you're at least using a statically-typed language, because chances are you will need to refactor in future and it would really suck to rely on only unit test coverage while making wide-ranging architectural changes across the codebase.

> If you need to change something in the UI layer, it will likely ripple down through business logic layer into the storage layer. If you make a change in the storage layer, you could inadvertently end up changing your user-facing API and breaking something for them.

This sort of "rippling" should normally result from poorly thought out and excessively casual changes layered over previous bad design.

Changes coming from the UI layer should normally be new and altered ways to search and modify the same entities, with no changes to the core model and its storage.

Changes from the storage layer are unlikely (e.g. new indices or adjusted datatypes in a database) and should be handled by some persistence abstraction, with a very limited impact in the business logic and completely invisible and irrelevant for the UI layer.

> If you make a change in the storage layer, you could inadvertently end up changing your user-facing API and breaking something for them.

Here is the thing though: if you have a proper type system, this breakage will happen during compilation. That's a good opportunity to understand what's up there without impact.

So it actually becomes a benefit to be able to move things at one end and be able to see the effects at the other end, not just a protection or a requirement.

Naked Objects is a fascinating idea that seems to have mostly disappeared from view. From my reading on it the UI basically falls out of the domain model fully formed, and that was what end users work with exclusively. I imagine that might seem too simultaneously worrying and limiting for it to gain traction in the conservative corporate sphere.
That was my approach too for a long time. Thankfully I had at my disposal something untypically (for its era) capable of modelling interestingly rich structures (capable of modelling syntaxes in fact), but still it’s a bit of an antipattern if taken to the extreme.

If you’re not careful, you end up with things like CRUD apps, leaving a lot of work to the user. To break away from this, you need to start thinking in use cases, user journeys, and so on. Not that data isn’t vitally important, but it’s not everything either.

Actual article title is "An Ideal Data-Model-First Development Approach". I'm reading it, just started, but the first thing that comes to mind is that the author should look at Ur/Web. Ur/Web is now I guess defunct, but the site is still in the wayback machine.

https://web.archive.org/web/20210220173825/http://www.impred...

Addition: I'm doing other stuff so still haven't read much of this, but one of the first things I see is "Since the above type depends on the value of the type DatabaseContext ... this construct is, unfortunately, impossible in Haskell."

I think it could have used an explanation of what DatabaseContext is supposed to be. It looks to me like the -XDataKinds extension could handle it. There is also Dependent Haskell on its way, though I don't know the current status of it. And again, look at Ur/Web, which I think had this under control. I was expecting to see something about Pi types, like in the Power of Pi paper which describes using them for database schemas. Maybe that is further in.

https://cs.ru.nl/~wouters/Publications/ThePowerOfPi.pdf

Thank you, and it is about Pi and Sigma types in the end, yes. The "Entity" would likely be Pi, and the "Database" - that takes variable number of Type parameters and generalizes type classes - would be Sigma.

"DatabaseContext" should provide a low-level API to the database we want to use, so that our "Entity" or "Database" pi/sigma types generate the same API for our pure types regardless of the underlying database. Using generics, default function implementations etc.

Actually, for single-table it is quite possible even now - we can design an Entity type family that would take care of storing / retreiving flat record types in SQL or Mongo using the same API. We'd still need to pass table names explicitly though (that's where Pi types would be useful), while specific db connection would be hidden inside a monad.

What we can't do - is model relationships elegantly by using function types ("Person -> [Post]" etc) - that's where the "Database" typeclass comes in.

The mail-list has been quiet recently, but Ur/Web is not defunct: http://impredicative.com/ur/
very nice, thank you!
Could be interesting, but it's on Medium.
Yeah, I really hope more people host their own static boring sites.
Or just host them anywhere that don't impose arbitrary viewing restrictions.
There shouldn't be any, as the article is not behind the paywall?