
On point 3 it should be noted that it's almost always a mistake to optimize for scale at the start of a project's lifetime. There will be exceptions, but in general this is true.

You can always migrate that data to a more useful format if you find it starts hurting you at scale. If you start with the assumption that you need the scale, you're hurting yourself in the here and now for a theoretical future benefit.

> The real reason people use joins is because they want to pack a lot of details onto the user's screen when they are looking at a list view

This is completely, emphatically wrong. I'm somewhat miffed at the air of authority you're using here. People use joins for the normalization of data.
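To make the normalization point concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `authors` and `posts` tables and the helper names are made up for illustration; the point is only that each fact is stored once and a join recombines it at read time.

```python
import sqlite3

def build_db():
    """Create a tiny normalized schema: author names live in exactly one place."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE posts (
            id INTEGER PRIMARY KEY,
            author_id INTEGER NOT NULL REFERENCES authors(id),
            title TEXT NOT NULL
        );
        INSERT INTO authors (id, name) VALUES (1, 'alice'), (2, 'bob');
        INSERT INTO posts (id, author_id, title) VALUES
            (1, 1, 'first'), (2, 1, 'second'), (3, 2, 'third');
    """)
    return conn

def posts_with_authors(conn):
    """Recombine the normalized tables with a join at read time."""
    return conn.execute(
        "SELECT posts.title, authors.name FROM posts "
        "JOIN authors ON authors.id = posts.author_id "
        "ORDER BY posts.id"
    ).fetchall()

def rename_author(conn, author_id, new_name):
    """A single-row update; no post rows need to be touched."""
    conn.execute("UPDATE authors SET name = ? WHERE id = ?", (new_name, author_id))
```

Renaming an author is one `UPDATE` on one row, and every joined result reflects it. With the name copied into each post row, the same change would have to touch every post by that author.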

This perspective only makes sense if you assume that designing a scalable system requires MORE work. My experience is that designing a scalable system requires LESS work if you and your team have the right skillset.

In most cases, I can build a scalable system faster than I can build a non-scalable one with the same feature set.

It would make no sense for me to implement the lesser alternative if it requires the same or more work.

I'm always leery of people who claim to be senior and have never spent 3-5 years on the same system, and this attitude is why.

It takes at least that long to really start surfacing the design errors that kill productivity long-term in a system. As a result I very often will claim the difference between a skilled and unskilled developer is whether a system they built is still reasonable after 5+ years, without everyone involved wanting to rebuild the entire thing from scratch.

IOW, this is a fundamental difference in perspective. I was speaking to creating systems that are maintainable over the long haul by actively trying to control complexity. You're speaking of speed of initial development.

Rich Hickey went on a small rant in one of his videos (I think the one describing Datomic, but I could be wrong) in which he pointed out that many things that are fast initially will hurt long-term. I agree with that sentiment wholeheartedly.

The fact that you called the less complex alternative the "lesser" alternative speaks volumes. It honestly feels like the whole "mongodb is webscale" devbro culture rearing its ugly head.

I tend to prefer combining data at the last moment on the client side rather than having it pre-combined on the server side (I prefer REST philosophy over GraphQL). It's probably because I'm web-application focused and so scalability and concurrency are far more important to me than raw execution time. Maybe if I was a data scientist or embedded systems developer, I would care more about execution time. I've met people like that. But IMO performant scripts tend to be the result of more optimizations, which make them harder to maintain as the underlying engines or hardware change.
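A rough sketch of what combining on the client can look like, for illustration only. The two `fetch_*` functions are hypothetical in-memory stand-ins for granular REST endpoints, not a real API, and the data is made up.

```python
# Hypothetical in-memory data standing in for two separate REST resources.
USERS = {1: {"id": 1, "name": "alice"}, 2: {"id": 2, "name": "bob"}}
ORDERS = [
    {"id": 10, "user_id": 1, "total": 25},
    {"id": 11, "user_id": 2, "total": 40},
    {"id": 12, "user_id": 1, "total": 5},
]

def fetch_orders():
    """Stand-in for GET /orders (hypothetical endpoint)."""
    return [dict(order) for order in ORDERS]

def fetch_user(user_id):
    """Stand-in for GET /users/<id> (hypothetical endpoint)."""
    return dict(USERS[user_id])

def assemble_order_list():
    """Combine the two resources on the client instead of joining on the server."""
    orders = fetch_orders()
    # Fetch each distinct user once, then attach the name to every order.
    user_ids = {order["user_id"] for order in orders}
    users = {uid: fetch_user(uid) for uid in user_ids}
    return [{**order, "user_name": users[order["user_id"]]["name"]} for order in orders]
```

Each endpoint stays small and independently cacheable, but the number of requests grows with the number of distinct users in the list, which is the cost the replies below are getting at.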

This has nothing to do with raw execution time.

pif 1387 days ago

> I prefer to assemble data on the front end as much as possible because it allows my REST API calls to be granular

It's clear you have never worked with a lot of data.

> The real reason people use joins is because they want to pack a lot of details onto the user's screen

I hate this illusion that web programming is the whole of software development.

> It's clear you have never worked with a lot of data.

Sure, I only wrote an open source distributed pub/sub system with channel-based sharding which has been used by thousands of companies to support hundreds of thousands of concurrent users, but I guess 'lots of data' is a relative term.

That has nothing to do with data or data modeling.

Well, that was just my hobby and side-gig... As part of my day jobs, I also worked on many projects with different databases including MySQL, Postgres, SQLite, and MongoDB. I also implemented a side project (a distributed financial transaction processing system) using RethinkDB with per-table sharding and replication, which runs on Kubernetes with StatefulSets for persistence, with automatic deployment, autoscaling, automatic database shard re-balancing, high availability, and eventual consistency. I used a 2-phase commit algorithm for certain operations to achieve reliability in the event of write failure, so as not to rely on atomic database transactions. I also did a course on relational database modeling at university (focused on ER diagrams and database normalization). I worked in the blockchain sector. I wrote a stateful, quantum-resistant blockchain from scratch, including the cryptographic signature algorithm, which uses an improved Lamport OTS variant suggested by Ralph Merkle and a Merkle Signature Tree for key reuse, and I contributed to the front end too. I also wrote a deterministic, fork-resistant, idempotent, heterogeneous multi-chain, chain-to-chain decentralized exchange. I also led a team which wrote a P2P networking library with decentralized routing and efficient propagation of messages to peers belonging to the same subnets. Nodes in the network organized themselves into an unstructured, partial mesh topology with peer shuffling to avoid eclipse attacks, but still retained the ability to form subnets based on the features they supported. But still, "a lot of data" is a relative term.

P5fRxh5kUvp2th 1386 days ago

None of that is about data, it's about distributed computing.

No one is saying you're not a smart guy with skills, just that you're obviously not familiar with working with lots of data.