Hacker News
by keredson 3396 days ago
{ "timestamp": 1458000000, "value": 63.422235 },

dear lord, why? this reminds me of the old "xml binary format" joke:

<byte> <bit>0</bit> <bit>0</bit> <bit>1</bit> <bit>0</bit> <bit>0</bit> <bit>1</bit> <bit>0</bit> <bit>0</bit> </byte>


I don't get the snark - what's the glaring problem with the format? I work with sensor data at my job and very rarely is it uniformly distributed so we use a similar format.

Because it's horribly inefficient. It's using 49 bytes to encode 8 bytes' worth of data. If your data set is a few hundred observations, this likely doesn't matter. But most users of time series data have millions or billions. (I come from a computational finance background.)

Even if they were wedded to JSON for some reason, they could have just used a list of observations, like:

[1458000000,63.422235],

That would have cut their data costs in half.
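
For anyone who wants to check the arithmetic, here is a rough sketch that measures the two JSON lines quoted above against a packed binary record. The binary layout (32-bit unsigned timestamp plus 32-bit float) is an assumption; the thread only says "8 bytes", and a 32-bit float would not keep every digit of the value.

```python
import struct

# One observation exactly as quoted in the thread, including the trailing
# comma and the newline that separates it from the next observation.
object_line = '{ "timestamp": 1458000000, "value": 63.422235 },\n'
array_line = '[1458000000,63.422235],\n'

# Assumed binary layout: little-endian uint32 timestamp + float32 value.
binary_record = struct.pack("<If", 1458000000, 63.422235)

sizes = {
    "object": len(object_line.encode("utf-8")),
    "array": len(array_line.encode("utf-8")),
    "binary": len(binary_record),
}
print(sizes)  # {'object': 49, 'array': 24, 'binary': 8}
```

The array form comes out at roughly half the object form, and both are several times larger than the binary record.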

Or just use one of the many existing formats for transmitting time series data. It's not a new topic. https://github.com/mobileink/data.frame/wiki/What-is-a-Data-...

This is an API for very small datasets (daily time series data). The goal should be accessibility and readability over saving a few bytes.

I'm not saying it's ideal, I just think the snark is unwarranted considering how common it is. I just checked InfluxDB and they follow a similar model (even more verbose). https://docs.influxdata.com/influxdb/v1.2/guides/querying_da...

Checked a few more and I believe they're the same - Microsoft IoT, Predix (GE), etc.

that example you give does not follow a similar model. it defines the columns once (not repeated w/ every observation):

                    "columns": [
                        "time",
                        "value"
                    ],
and then the observations as a list of lists:

                    "values": [
                        [
                            "2015-01-29T21:55:43.702900257Z",
                            2
                        ],
                        [
                            "2015-01-29T21:55:43.702900257Z",
                            0.55
                        ],
exactly as i suggested in the "even if they were wedded to JSON for some reason" section of my original explanation.
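
A minimal sketch of that columns-once layout, for comparison with the per-observation object form. The helper names `to_columnar` and `from_columnar` and the sample series are hypothetical; the field names are the ones from the example at the top of the thread.

```python
import json

COLUMNS = ["timestamp", "value"]

def to_columnar(observations):
    """Turn a list of {"timestamp": ..., "value": ...} dicts into a
    payload that names the columns once and stores rows as lists."""
    return {
        "columns": COLUMNS,
        "values": [[obs[name] for name in COLUMNS] for obs in observations],
    }

def from_columnar(payload):
    """Rebuild the list of dicts from a columns/values payload."""
    columns = payload["columns"]
    return [dict(zip(columns, row)) for row in payload["values"]]

# Hypothetical hourly series, only here to have something to measure.
observations = [
    {"timestamp": 1458000000 + 3600 * i, "value": 63.422235 + i}
    for i in range(1000)
]

verbose = json.dumps(observations)
columnar = json.dumps(to_columnar(observations), separators=(",", ":"))
print(len(verbose), len(columnar))
```

The conversion is lossless in both directions, so a client that prefers dicts can rebuild them after parsing.
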

Yeah, it's kind of verbose right now :p We'll get around to it once we're sure the forecast is the best it can be.

EDIT: But XML payloads are actually a really useful idea, it's going ASAP to the Trello board. :p

hah. well hopefully you support http gzip compression. :)
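
To get a feel for how much gzip helps with repeated field names, here is a small sketch that compresses a payload in the verbose object form. The series is synthetic and very regular, so it will compress better than real measurements would; treat the printed sizes as illustrative only.

```python
import gzip
import json

# Hypothetical hourly series in the verbose object form from the thread.
observations = [
    {"timestamp": 1458000000 + 3600 * i, "value": round(60 + (i % 24) * 0.5, 3)}
    for i in range(10_000)
]

raw = json.dumps(observations).encode("utf-8")
compressed = gzip.compress(raw)

print(len(raw), len(compressed))
```

Over HTTP this happens when the client sends `Accept-Encoding: gzip` and the server answers with `Content-Encoding: gzip`, so the payload format itself does not need to change.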

but in all seriousness, why the timestamp at all? your examples are all spaced at 3600ms. asking for it implies certain behavior. can you handle heterogeneous interval data? missing data?
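
As a sketch of what those two questions mean in practice, the function below finds the most common step between consecutive timestamps and lists the places where the spacing differs. It assumes integer timestamps like the one in the example; `describe_intervals` and the sample data are hypothetical.

```python
from collections import Counter

def describe_intervals(timestamps):
    """Return (most common step, list of (start, end) pairs whose
    spacing differs from that step)."""
    ordered = sorted(timestamps)
    pairs = list(zip(ordered, ordered[1:]))
    if not pairs:
        return None, []
    steps = [b - a for a, b in pairs]
    expected = Counter(steps).most_common(1)[0][0]
    irregular = [(a, b) for a, b in pairs if b - a != expected]
    return expected, irregular

# Hypothetical series with a step of 3600 and one observation missing.
timestamps = [1458000000 + 3600 * i for i in range(6) if i != 3]
step, gaps = describe_intervals(timestamps)
print(step, gaps)  # 3600 [(1458007200, 1458014400)]
```

If the spacing really is constant and nothing is ever missing, a start time plus a step would be enough and the per-observation timestamp is redundant.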

Remnants of working with energy consumption and smart meter data - our examples are still a bit focused on that. On the other hand, the current version is able to handle distinct/heterogeneous intervals, although it clearly does not perform as well. We try (maybe too) hard to mangle the incoming data into something as _standard_ as possible for our models, and sometimes those transforms are not kind to sparse data.
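
One common way to force irregular readings onto a standard grid is to average them into fixed-width buckets. The sketch below is only an illustration of that general idea, not the service's actual transform; `to_regular_grid` and the sample readings are hypothetical.

```python
def to_regular_grid(observations, step):
    """Average (timestamp, value) pairs into buckets of width `step`.
    Buckets with no observations are reported as None."""
    if not observations:
        return []
    buckets = {}
    for ts, value in observations:
        # Align each timestamp to the start of its bucket.
        buckets.setdefault(ts - ts % step, []).append(value)
    start, end = min(buckets), max(buckets)
    return [
        (t, sum(buckets[t]) / len(buckets[t]) if t in buckets else None)
        for t in range(start, end + step, step)
    ]

# Hypothetical irregular readings as (epoch seconds, value) pairs.
readings = [(1458000000, 1.0), (1458001800, 3.0), (1458007200, 5.0)]
grid = to_regular_grid(readings, step=3600)
print(grid)
# [(1458000000, 2.0), (1458003600, None), (1458007200, 5.0)]
```

With sparse input most buckets end up as None, which is one way a transform like this can be unkind to sparse data.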