by danielvaughn 900 days ago
Reminds me of a situation I ran into years ago. I worked at a fintech startup where we were reverse engineering the mobile APIs of retail stock brokerages. Eventually we ran out of brokers in the US and began looking overseas. The first one we looked at was a large broker in Singapore.

Their API responses were in some absolutely insane markup language that I'd never seen before. I actually had to spend a good deal of time reading up on the history of markup languages, carefully going through each one to see if the syntax matched.

Eventually I gave up and just had to write a parser myself. The worst bit was that the attributes didn't use quotation marks around the values. So you'd literally have markup like:

  <something name=Hello world />
It was...fun times.
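For anyone facing similar markup, here is a minimal sketch of one way to cope with unquoted attribute values that contain spaces. It is not the parser described above. It assumes a hypothetical tag shape (`<tag key=value with spaces other=value />`) and treats every `key=` as the start of a new attribute, so a value that itself contains something like ` word=` would be split wrongly. The names `parse_loose_tag`, `TAG_RE` and `KEY_RE` are made up for illustration.

```python
import re

# Assumed shape: <tagname key=value with spaces other=thing />
TAG_RE = re.compile(r"<\s*(?P<name>[\w:-]+)(?P<attrs>[^<>]*?)/?\s*>")
# An attribute key is a word at the start or after whitespace, followed by "=".
KEY_RE = re.compile(r"(?:^|\s)([\w:-]+)=")


def parse_loose_tag(text):
    """Parse one tag whose attribute values are unquoted and may contain spaces."""
    match = TAG_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a recognisable tag: {text!r}")
    attrs_src = match.group("attrs")
    keys = list(KEY_RE.finditer(attrs_src))
    attrs = {}
    for i, key in enumerate(keys):
        # A value runs from just after "key=" up to the next "key=" (or the end).
        end = keys[i + 1].start() if i + 1 < len(keys) else len(attrs_src)
        attrs[key.group(1)] = attrs_src[key.end():end].strip()
    return match.group("name"), attrs


if __name__ == "__main__":
    print(parse_loose_tag("<something name=Hello world />"))
```

The "value ends where the next key begins" rule is a guess at the producer's intent, which is about the best you can do when there is no spec to check against.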

I'm reminded of the comment in XeePhotoshopLoader.m:

  // At this point, I'd like to take a moment to speak to you about the Adobe PSD format.
  // PSD is not a good format. PSD is not even a bad format. Calling it such would be an
  // insult to other bad formats, such as PCX or JPEG. No, PSD is an abysmal format. Having
  // worked on this code for several weeks now, my hate for PSD has grown to a raging fire
  // that burns with the fierce passion of a million suns.
  // If there are two different ways of doing something, PSD will do both, in different
  // places. It will then make up three more ways no sane human would think of, and do those
  // too. PSD makes inconsistency an art form. Why, for instance, did it suddenly decide
  // that *these* particular chunks should be aligned to four bytes, and that this alignement
  // should *not* be included in the size? Other chunks in other places are either unaligned,
  // or aligned with the alignment included in the size. Here, though, it is not included.
  // Either one of these three behaviours would be fine. A sane format would pick one. PSD,
  // of course, uses all three, and more.
  // Trying to get data out of a PSD file is like trying to find something in the attic of
  // your eccentric old uncle who died in a freak freshwater shark attack on his 58th
  // birthday. That last detail may not be important for the purposes of the simile, but
  // at this point I am spending a lot of time imagining amusing fates for the people
  // responsible for this Rube Goldberg of a file format.
  // Earlier, I tried to get a hold of the latest specs for the PSD file format. To do this,
  // I had to apply to them for permission to apply to them to have them consider sending
  // me this sacred tome. This would have involved faxing them a copy of some document or
  // other, probably signed in blood. I can only imagine that they make this process so
  // difficult because they are intensely ashamed of having created this abomination. I
  // was naturally not gullible enough to go through with this procedure, but if I had done
  // so, I would have printed out every single page of the spec, and set them all on fire.
  // Were it within my power, I would gather every single copy of those specs, and launch
  // them on a spaceship directly into the sun.
  //
  // PSD is not my favourite file format.
> Why, for instance, did it suddenly decide that these particular chunks should be aligned to four bytes, and that this alignement should not be included in the size?

Probably because, like many other ancient document formats (e.g. MS Office), it was a straight dump of memory structures into a file [1]. Obviously a very bad idea in hindsight (especially given the truckload of deserialization vulns resulting from it), but computers from that age were so memory-constrained that anything else wouldn't cut it, and by the time computers got more powerful the old formats were hopelessly entrenched.

[1] https://www.joelonsoftware.com/2008/02/19/why-are-the-micros...
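To make the alignment complaint concrete, here is a rough sketch of a reader that has to cope with the three conventions the rant describes: unaligned chunks, aligned chunks whose padding is counted in the size, and aligned chunks whose padding is not. The chunk layout (a 4-byte big-endian size followed by the payload) and the name `read_chunk` are assumptions for illustration, not the actual PSD structures.

```python
import struct
from io import BytesIO


def read_chunk(stream, align=1, padding_in_size=False):
    """Read one length-prefixed chunk (assumed layout: 4-byte big-endian size, then payload)."""
    header = stream.read(4)
    if len(header) < 4:
        raise EOFError("truncated chunk header")
    (size,) = struct.unpack(">I", header)
    data = stream.read(size)
    if len(data) < size:
        raise EOFError("truncated chunk payload")
    if align > 1 and not padding_in_size:
        # The stored size covers only the payload, so the reader must skip
        # the pad bytes itself to land on the next chunk header.
        stream.seek(-size % align, 1)
    # When padding_in_size is True the returned bytes include any padding;
    # the caller needs some other way to know the real payload length.
    return data


if __name__ == "__main__":
    raw = struct.pack(">I", 5) + b"hello" + b"\0\0\0" + struct.pack(">I", 2) + b"ok" + b"\0\0"
    stream = BytesIO(raw)
    print(read_chunk(stream, align=4), read_chunk(stream, align=4))
```

The point is that the caller has to know, per chunk type, which combination of `align` and `padding_in_size` applies. Nothing in the bytes themselves says so.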

Flash and PDF have the same illnesses. Suspiciously many Adobe file formats are both overly complicated and contain lots of features that are way too powerful in combination and impossible to support properly.
As a side note, I really miss Xee. It was clearly a labour of love and it showed - MacPaw has been a terrible steward ever since Dag handed it over to them.

No updates other than straight up SDK bumps and recompiles, broken loading of random images in recent macOS/Apple Silicon, they somehow managed to break cropping in one of the two or three updates they did, still an Intel binary. Clearly they haven't tested it more than just checking if the app opens.

Really wish Dag had just open sourced Xee3 instead, my opinion of MacPaw plummeted after seeing how they massacred my boy Xee.

The Archive Browser was equally neglected. At least The Unarchiver still works, which in retrospect was clearly the only app MacPaw wanted to take off Dag's hands.

The Unarchiver still fairly often fails to handle multipart RAR files correctly, too.
This is hilarious, and not at all surprising. The format has been around longer than most programmers today have been alive, with all the legacy cruft you could expect as it's changed over the years.
> I can only imagine that they make this process so difficult because they are intensely ashamed of having created this abomination.

They also don't want anybody building a dependency on that sh*t, which would prevent them from ever cleaning up the mess.

I think there's been a dependency since day 1. For example, I have lots of photos scanned in the 90's with my old (old old) scanner, and they are in .psd format, which, for instance, macOS can still preview.
That ship has sailed: https://www.hyrumslaw.com/ and, of course, the obligatory xkcd https://xkcd.com/1172/

When I read that rant comment, I thought of "oh, so it's microservices writing to a shared byte array" in a nod to https://en.wikipedia.org/wiki/Conway%27s_law

This "like standard protocol/format X, but strangely invalid" is a thing I've seen many times.

I speculate that one of the ways this happens is that someone decides or is told to use format Foo. Then they and possible collaborators implement both the writer and the reader for their idea of Foo from scratch, never testing with an off-the-shelf standard parser.

You'd think that doing XML like this is unlikely, given how easily available correct and validating parsers have been. But I've nevertheless seen this with XML too. I speculate that sometimes the programmer is on a platform that doesn't have an easily available off-the-shelf parser/writer, or they simply don't know about it.

I've also seen a variation of this, in half-butted "integrations", like to have a sales check-off feature of "we can generate X". These are sometimes tested only lightly, and sometimes not at all (such as when they don't have access to the tool that uses that format, and they were just working from poor documentation or an example). It's a thing.

> I speculate that sometimes the programmer is on a platform that doesn't have an easily available off-the-shelf parser/writer, or they simply don't know about it.

I bet this sounds surreal to people visiting this site, but there are really corporations out there running on software written by people who never heard of XML. Another example is a "database" implementation I have seen in a multi-billion dollar company which relied on a hierarchy of directories containing JSON files mimicking tables and rows inside a relational DB.

The particular product in question had tens of millions of dollars yearly revenue.

> You'd think that doing XML like this is unlikely, given how easily available correct and validating parsers have been. But I've nevertheless seen this with XML too.

Guilty.

Although in my defence it was during the early days of XML and the platform options had their own problems.

You are not alone. I did a project to upgrade "sort of XML" to standard XML. Your example content gave me flashback shivers.
I have seen code that produces output like this first hand. Instead of doing proper serialization, they were using string templating to construct the response and never bothered to validate the output. Laziness and stupidity basically.
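A small sketch of the difference, using Python's standard `xml.etree.ElementTree`. The element and attribute names are borrowed from the example upthread, and the three function names are made up for illustration. This is a guess at how that kind of output gets produced, not the broker's actual code.

```python
import xml.etree.ElementTree as ET


def render_with_template(name):
    # String templating: nothing quotes or escapes the value.
    return "<something name=%s />" % name


def render_with_serializer(name):
    # A real serializer quotes the attribute and escapes special characters.
    element = ET.Element("something", {"name": name})
    return ET.tostring(element, encoding="unicode")


def is_well_formed(markup):
    """Return True if a standard XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    for render in (render_with_template, render_with_serializer):
        output = render("Hello world")
        print(render.__name__, output, is_well_formed(output))
```

Running the output through any off-the-shelf parser, as `is_well_formed` does here, is the validation step that was skipped.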
You're very likely correct, which is funny because they turned out to have incredible security. We hacked the APIs of all the US brokers without an issue, but I didn't even make it past the auth stage with this Singapore broker.

One morning I was working on their login flow - not doing anything crazy, mind you. Just a bit weird; logging in and out, watching the req/res cycles with Charles Proxy. All of a sudden my boss comes over and tells me to stop immediately. Apparently I set off so many alarm bells at the broker that the CTO was woken up (it was 2am where they were). That was a fun gig lol.

In my former PHP life I've seen people looping through objects and constructing a JSON string by hand instead of using a simpler single json_encode() call.