by llimllib 2018 days ago
~13 years ago (!) I worked with the Amazon Seller API and left this comment in my own source code:

    // the XML returned from this request is *mind-bogglingly* bad. Terrifyingly bad.
    // a completed batch looks like this:
    // <Batch>batchid=363777811 status=Done dateandtime=09/18/2007 09:53:10 PDT activateditems=335 numberofwarnings=0 itemsnotacivated=17 </Batch>
    // and an incomplete batch like:
    // <Batch>batchid=363778361 status=In Progress </Batch>
    // so we'll just parse each item as a regex. Thanks Amazon.
The documentation at the time was just a forum post that was later removed, so it no longer exists at all. And this was just one of many horrors.
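For anyone curious what "parse each item as a regex" might look like, here is a minimal sketch in Python. It is not the original code (that language isn't stated above), and the names `parse_batch`, `BATCH_RE` and `PAIR_RE` are made up for illustration. It assumes keys are plain word characters and that a value runs until the next `key=` or the end of the body, which holds for the two samples in the comment but is only a guess about the real format.

```python
import re

# Hypothetical helper names; the format rules are inferred from the two samples only.
BATCH_RE = re.compile(r"<Batch>(.*?)</Batch>", re.DOTALL)
# A value ends right before the next "key=" token or at the end of the body,
# so values containing spaces ("In Progress", the timestamp) stay in one piece.
PAIR_RE = re.compile(r"(\w+)=(.*?)(?=\s+\w+=|\s*$)", re.DOTALL)


def parse_batch(xml_text):
    """Return the key/value pairs found inside a single <Batch> element."""
    match = BATCH_RE.search(xml_text)
    if match is None:
        raise ValueError("no <Batch> element found")
    return dict(PAIR_RE.findall(match.group(1)))


done = parse_batch(
    "<Batch>batchid=363777811 status=Done dateandtime=09/18/2007 09:53:10 PDT "
    "activateditems=335 numberofwarnings=0 itemsnotacivated=17 </Batch>"
)
pending = parse_batch("<Batch>batchid=363778361 status=In Progress </Batch>")
```

Everything comes back as a string, and a value that itself contains something shaped like `word=` would be split in the wrong place, so treat this as a sketch rather than a robust parser.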
9 comments

I kid you not, I know exactly what you're talking about there. I also wrote my own regex parsers with fast-forward and rewind tokenizers, because it was so dreadful. I wouldn't normally do that.

There are also 4 (4!!!) different ways of returning errors in the MWS Feed APIs when used from Java. Given the number of entities, and the fact that they're combinatorial, I had to write some of the craziest Scala code I've ever written, because duplication was just too bad to handle for how critical that code was. Done in a bulletproof way, error handling for MWS feeds inherently has 2^4 control flows * N entities = 16N code paths, at like 100-200 lines apiece, if you don't use some higher-order abstractions.
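To make the arithmetic concrete, here is a rough Python sketch (not the Scala code, which isn't shown). The four channel names are placeholders, since the actual error mechanisms aren't listed above, and `count_paths` and `collect_errors` are hypothetical helpers. The idea is that each of the 4 channels either fires or doesn't, giving 2^4 = 16 combinations per entity, and that a higher-order function can fold all channels into one list so callers branch once.

```python
from itertools import product

# Placeholder names: the four real MWS error channels are not named in the thread.
CHANNELS = ("channel_a", "channel_b", "channel_c", "channel_d")


def count_paths(n_entities):
    """Each channel either reports an error or not: 2^4 combinations per entity."""
    combos = list(product((False, True), repeat=len(CHANNELS)))
    return len(combos) * n_entities


def collect_errors(response, checkers):
    """Run every checker on the response and merge the results into one list."""
    errors = []
    for check in checkers:
        errors.extend(check(response))
    return errors


# Hypothetical usage: one small checker per channel, shared across all entities.
checkers = [lambda r, name=name: r.get(name, []) for name in CHANNELS]
merged = collect_errors({"channel_b": ["bad SKU"], "channel_d": ["timeout"]}, checkers)
```

With the checkers passed in as values, adding an entity means reusing the same list instead of writing another 16 branches by hand.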

That looks like putting "<Batch>" and "</Batch>" around a legacy text format and calling it a day, to report that your service/API "supports XML" lol. Actually, SGML has mechanisms (shortrefs and "data tags") to parse strings like that as markup and could even infer "<Batch>" tags (but it still won't work satisfactorily with your example data).
I mean, Bezos did say people who didn't expose services as APIs would be fired...
I was kind of hoping for:

<Batch batchstate="batchid=363777811 status=Done dateandtime=09/18/2007 09:53:10 PDT activateditems=335 numberofwarnings=0 itemsnotacivated=17" />

:)

Reminds me of a description of MSFT's first XML-based office file formats (not sure if it's true or was a joke), but it went something like

<xml> <office-proprietary-binary-blob> wky4b5tlwybkjbb2... </office-proprietary-binary-blob> </xml>

I kind of remember a massive CDATA blob... but my memory must be playing tricks on me: Wikipedia shows some sample markup of the pre-2007 Microsoft Office (https://en.wikipedia.org/wiki/Microsoft_Office_XML_formats) and it doesn't look bad at all.

And it seems after 2007 they switched to an ECMA standard.

In all fairness, during the first XML migration, I'm sure internal MS folks were just as baffled at the previous format.
XML is an open standard, man, just pretend you're Neo from the Matrix reading the screen saver
I note with horror that "activated" is misspelt in "itemsnotacivated" but spelt correctly in "activateditems".
the text based format in that implies lovely horrors. now all it needs is sometimes quoted items after "key=" (using and requiring different quotes, depending on which system you're speaking to) and at least 3 sources of "universal event ID" counter... just one "batchid" is too simple.
<Batch>batchid=363777811 status=Done dateandtime=09/18/2007 09:53:10 PDT activateditems=335 numberofwarnings=0 itemsnotacivated=17 </Batch>

Philosophically speaking, is this any different than 'syntactic salt' for a 'Batch' JSON object? ;)

Yes. It isn't JSON. Those are non-delimited key-value pairs.
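A quick Python illustration of why "non-delimited" matters: values like `In Progress` and the timestamp contain spaces, so splitting on whitespace leaves tokens that belong to no key. `naive_split` is just a throwaway name for this sketch.

```python
def naive_split(body):
    """Split on whitespace and report which tokens are not key=value pairs."""
    tokens = body.split()
    pairs = dict(t.split("=", 1) for t in tokens if "=" in t)
    orphans = [t for t in tokens if "=" not in t]
    return pairs, orphans


pairs, orphans = naive_split("batchid=363778361 status=In Progress")
# pairs["status"] is "In", and "Progress" is left over with no key.

_, done_orphans = naive_split(
    "batchid=363777811 status=Done dateandtime=09/18/2007 09:53:10 PDT "
    "activateditems=335 numberofwarnings=0 itemsnotacivated=17"
)
# The time and the time zone fall off the end of dateandtime.
```

A JSON object would have quoted the strings and separated members with commas, so none of this guessing would be needed.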
Were they not using a standard XML serialization library?
Jesus christ, just kick me in the balls rather than making me deal with that XD