by jackmaney 4061 days ago
>> Perhaps 70% of developer time is spent dealing with parsing, serialization, and persistence. Values are encoded to and from JSON, to and from various binary formats, and to and from various persistent data stores… over and over again.

> This is not my experience at all.

Yeah, unless you're tinkering around in a side project just to learn something, don't build your own JSON parser or writer.

I've spent WAY more time thinking about how to structure my data for serialization than how to serialize it. For the vast majority of use cases, serialization is such a solved problem that if there isn't an obvious way to proceed, you're probably doing it wrong.

>> Another 25% is spent on explicit networking. We don’t merely specify that a value must be sent from one node to another, we also specify how in exhaustive detail.

> Honestly I'm not sure what the author means by this.

Doing low-level socket programming? Maybe? But in the grand scheme of things, that's probably a fairly specialized use case.


I suspect that "how to structure my data for serialization" is what the author means by the 70% of time spent on parsing, serialization, and persistence. I hadn't heard of Unison before, but I recognize the author's name from Lambda: The Ultimate, and I suspect that what he has in mind is that any value within the Unison language can appear on any Unison node, transparently. Instead of picking out exactly which fields you need and then creating new JSONObjects or protobufs for them, just send the whole variable over.

I also suspect (being a language design geek, and also having worked with some very large distributed systems) that the reason why this is seductive is also why it's unworkable. I think I probably do spend close to 70% of my time dealing with networking and data formats (and yes, I use off-the-shelf serialization formats and networking protocols), but that's because a watch is very different from a phone which is very different from a persistent messaging server which is different from a webpage, and Bluetooth is very different from cell networks which are very different from 10G-Ethernet in a DC. Try to dump your server data structures directly to your customer's cell phone and you're about to have a lot of performance and security problems.
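To make the "don't dump your server data structures to the phone" point concrete, here's a rough Python sketch. `UserRecord` and its fields are made up for illustration, and neither function is anything from Unison. It only contrasts serializing the whole value with hand-picking the fields a given client should see.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class UserRecord:
    # Hypothetical server-side record; the field names are illustrative only.
    user_id: int
    display_name: str
    password_hash: str
    internal_notes: str


def to_wire_full(user):
    """'Just send the whole variable over': every field goes on the wire."""
    return json.dumps(asdict(user))


def to_wire_for_phone(user):
    """Explicitly pick the fields a phone client needs and nothing else."""
    return json.dumps({
        "user_id": user.user_id,
        "display_name": user.display_name,
    })
```

The second function is the tedious part people complain about, but it is also where you decide what leaves the server, so it doubles as a size and security boundary.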

Maybe the author started this project some years ago, before parsing and network libraries were common? Because if you don't have libraries, he's right.

Starting from scratch can yield radically better solutions than how tech/market happened to evolve.

It would have to be extremely old for that to be true. All the problems he mentioned have had some form of solution for decades. Some use cases needed significant changes in those solutions (e.g., the need for NoSQL DBs), but most development has stayed in a zone where the available patterns existed for the things he mentioned.

Perhaps, but the "solutions", when put together in a single project, probably resemble the "30 quicksorts in a single binary" syndrome. Lots of code doing conceptually the same things.

Hmm, JSON is only a decade and a half old, so not "decades" for that particular one. And it's only really taken off within the last few years.

You're right about it being conceptually solved for many decades, but we developers like to reinvent everything every decade or so.

For example, back when XML was gaining in popularity, around 2000, there were new parsers/serializing libraries launched all over the place - even standards like SAX and DOM. Many people rolled their own; many had to.

It's not like JSON is the first serialization format ever. (I still have XDR & CDR nightmares, and that was the early 90's)

So, yes, decades. Not necessarily solved well, but yes, people shipped data over the network before JS ;)

If you're doing low-level anything, it's usually because you're interested in the "exhaustive detail". This is quite confusing.

> Also as a result, Unison has a simple story for serialization and sharing of arbitrary terms, including functions. Two Unison nodes may freely exchange data and functions—when sending a value, each states the set of hashes that value depends on, and the receiving node requests transmission of any hashes it doesn’t already know about. Using nameless, content-based hashes for references sidesteps complexities that arise due to the possibility that sender and receiver may each have different notions of what a particular symbol means (because they have different versions of some libraries, say).

Yeah, I can see how this is going to solve the problem of persistence and networking once and for all. Or not. As for sidestepping the problem of having different versions of the same library on different nodes and serializing data, that's going to work fine until the first rename, or when node A with version 1 of the library sends a data structure with half the fields understood by version 2 of the library on node B...

He addresses this issue - see the part about using hashes for everything so neither names nor lib versions matter, only the contents do, identified by hashes.
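
For anyone who wants to see the shape of the exchange described in the quoted passage, here's a minimal Python sketch. It is a guess at the idea, not Unison's actual implementation: `Node`, `content_hash`, and `send` are hypothetical names, definitions are plain strings, and SHA-256 over a JSON payload stands in for whatever hashing Unison really uses.

```python
import hashlib
import json


def content_hash(definition, deps):
    # Assumption: a term's identity is a hash of its body plus its dependency hashes.
    payload = json.dumps({"def": definition, "deps": list(deps)}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class Node:
    def __init__(self):
        self.store = {}  # hash -> (definition, tuple of dependency hashes)

    def add(self, definition, deps=()):
        h = content_hash(definition, deps)
        self.store[h] = (definition, tuple(deps))
        return h

    def closure(self, h):
        """All hashes the value depends on, including itself."""
        seen, stack = set(), [h]
        while stack:
            cur = stack.pop()
            if cur not in seen:
                seen.add(cur)
                stack.extend(self.store[cur][1])
        return seen

    def missing(self, hashes):
        """Hashes this node does not know about yet."""
        return {h for h in hashes if h not in self.store}


def send(sender, receiver, h):
    announced = sender.closure(h)          # sender states what the value depends on
    wanted = receiver.missing(announced)   # receiver asks only for unknown hashes
    for w in wanted:
        receiver.store[w] = sender.store[w]
    return wanted
```

If both nodes already hold the same definition of a dependency, it hashes to the same key on each side and is never retransmitted, whatever either side happens to call it. The sketch does not address what happens when two versions of a data structure differ in their fields; those would simply be two different hashes.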
I didn't read it as getting your data serialized for storage, I read it as serializing your data between different libraries of a program. You know, putting the arguments to a function the right way round, changing the structure so that library A can talk to library B. I do spend a fair amount of time doing that, and it would be solved by a more semantic approach.

That's not serialization.