Show HN: Unplugg: An automated Forecasting API for timeseries data

	Show HN: Unplugg: An automated Forecasting API for timeseries data (unplu.gg)
	100 points by mgontav 3397 days ago

8 comments

syntaxing 3396 days ago

I see from the earlier comments that this runs some sort of "optimized" ARIMA model. Is there any way to output the statistical information of the fitted model through the API?

mgontav 3396 days ago

Not really. We're directing this more towards a completely automated use-case of forecasting with no human interaction, so it's not in our plans to release internal parameterization of the forecast.

syntaxing 3396 days ago

Hmm, I think at the minimum, the variance of the forecasted results should be obtainable. I'm not sure how many people would use a black box model without knowing some sort of performance/statistical confidence metric.

mgontav 3396 days ago

We could return a confidence interval for the forecasted values, you're right, it can help in cases where it would be used for simple anomaly detection (and it would give a greater sense of security/control over using the forecast).

I do believe we have some similar feature in the pipeline for development, I'll make sure to push it forward. Thanks for the feedback.
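
The simple anomaly detection mentioned above could look roughly like the sketch below: an observation is flagged when it falls outside the interval forecast for that step. The function name, the interval values and the observed values are all hypothetical; nothing here reflects what the Unplugg API actually returns.

```python
from typing import List, Sequence, Tuple


def flag_anomalies(observed: Sequence[float],
                   intervals: Sequence[Tuple[float, float]]) -> List[int]:
    """Return the indices of observations that fall outside their forecast interval."""
    return [
        i
        for i, (value, (lower, upper)) in enumerate(zip(observed, intervals))
        if not lower <= value <= upper
    ]


# Made-up forecast intervals and the values observed afterwards.
intervals = [(58.0, 68.0), (57.5, 68.5), (57.0, 69.0)]
observed = [63.4, 71.2, 60.1]
anomalies = flag_anomalies(observed, intervals)
```

Only the second observation lies outside its interval here, so it is the only index flagged.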

blparker 3396 days ago

I think a confidence interval would be crucial for the forecasted values.

ramblenode 3396 days ago

IMO this would be much more appealing if there were an option to get the model details. I can't see myself presenting results to a group and answering the question "How did you get this?" with "Well, I put my data into this box and what came out seemed pretty good."

mgontav 3396 days ago

Well, if we provide confidence intervals it should provide some kind of "safety"... but even so, with forecasting the only assurance you can get is by testing it against known historical values and checking its accuracy (and even that way, historical results are no guarantee of future performance).

What kind of details would you say can be inspected to see if the model is reliable? AR or MA orders, inferred seasonalities? They can give me some notion of what kinds of assumptions were created about my data, but do not guarantee that it will perform :/

For instances where that kind of insight is needed, I don't think our way is the way to go, but rather the use of some forecasting package (R's Forecast or FB's Prophet) and a more exploratory work. But we're looking more at instances where what matters are the forecasted values and not so much the information underneath - automated anomaly detection systems, consumer-facing apps, and along those lines.
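
Testing against known historical values, as described above, is usually done by holding out the end of the series and scoring the forecast on it. A minimal sketch, assuming the forecaster can be wrapped as a function of (history, steps); `last_value_forecast` is a stand-in model used only so the example runs, not anything from Unplugg.

```python
from typing import Callable, List, Sequence


def backtest_mae(series: Sequence[float],
                 forecast: Callable[[Sequence[float], int], List[float]],
                 holdout: int) -> float:
    """Mean absolute error of `forecast` on the last `holdout` points of `series`."""
    if not 0 < holdout < len(series):
        raise ValueError("holdout must be between 1 and len(series) - 1")
    train, actual = series[:-holdout], series[-holdout:]
    predicted = forecast(train, holdout)
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / holdout


def last_value_forecast(history: Sequence[float], steps: int) -> List[float]:
    """Stand-in model: repeat the last observed value for every future step."""
    return [history[-1]] * steps


# Made-up series; the last two points are hidden from the model and used for scoring.
history = [10.0, 12.0, 11.0, 13.0, 12.0, 14.0]
mae = backtest_mae(history, last_value_forecast, holdout=2)
```

A low error on the held-out points is evidence, not proof: as noted above, past accuracy does not guarantee future performance.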

ramblenode 3396 days ago

That's a fair answer. Your target audience is interested in prediction rather than inferring parameters.

I do think the confidence intervals/prediction intervals should be accessible and should probably be adjustable (e.g. 99%, 95%, 80%).
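
One way adjustable intervals can be derived, sketched under the assumption that forecast errors are roughly Gaussian and that a standard error is available for each step. The forecasts and standard errors below are invented for illustration, and the thread does not say how Unplugg would compute its intervals.

```python
from statistics import NormalDist
from typing import List, Sequence, Tuple


def prediction_intervals(point_forecasts: Sequence[float],
                         std_errors: Sequence[float],
                         level: float = 0.95) -> List[Tuple[float, float]]:
    """Symmetric intervals: forecast +/- z * standard error, with z set by `level`."""
    if not 0.0 < level < 1.0:
        raise ValueError("level must be strictly between 0 and 1")
    # Two-sided interval: put (1 - level) / 2 of the probability in each tail.
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return [(f - z * s, f + z * s) for f, s in zip(point_forecasts, std_errors)]


# Hypothetical forecasts and standard errors, for illustration only.
forecasts = [63.4, 63.1, 62.9]
std_errors = [1.0, 1.4, 1.7]
bands = {level: prediction_intervals(forecasts, std_errors, level)
         for level in (0.80, 0.95, 0.99)}
```

Higher levels give wider bands, so the caller trades coverage against how tight the interval is.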

mr_pink 3395 days ago

> I do think the confidence intervals/prediction intervals should be accessible and should probably be adjustable (e.g. 99%, 95%, 80%).

Definitely, this is arguably the most important feature.

y7 3396 days ago

Why is there no mention of the model you use?

mgontav 3396 days ago

Our goal is to keep it as simple as we can, keeping the worries about model selection and tuning on our side as much as possible, therefore we don't go much into those details.

I can share that our platform is built on top of ARIMA models, but with a lot of pre-processing work done previously to try and figure out automatically the best parameters to use, as well as a lot of previous hand-tweaking done by ourselves in-house using different datasets (we started out tuning it for forecasting energy consumption, but figured that the resulting models were performing well enough to warrant testing in other domains).

Right now we're opening it up for testing to get more feedback on its performance, so feel free to shoot any more questions or feedback.

ashnyc 3396 days ago

@mgontav, we are building an internal ERP for our manufacturing business. We use our sales data to try to predict what our future sales will look like, and we try to produce what we think we are going to be selling in the next few months. Right now, if we sell 3 items a day, we just do straight math and assume we will sell 3x10 in 10 days. I would like to talk to you and see how your service can help us.
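
The straight-math baseline described here amounts to multiplying the average daily sales by the number of days ahead. A minimal sketch with a hypothetical function name and made-up daily figures:

```python
from typing import Sequence


def straight_line_forecast(daily_sales: Sequence[float], horizon_days: int) -> float:
    """Expected units over `horizon_days`, assuming the average daily rate stays constant."""
    if not daily_sales:
        raise ValueError("need at least one day of sales")
    average_per_day = sum(daily_sales) / len(daily_sales)
    return average_per_day * horizon_days


# Selling 3 items a day, projected 10 days ahead.
expected_units = straight_line_forecast([3, 3, 3, 3, 3], horizon_days=10)
```

This baseline ignores trend and seasonality, which is the gap a forecasting model is meant to fill.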

mgontav 3396 days ago

Sure thing, shoot me an email at mgontav@unplu.gg and we'll see how we can help you out.

ashnyc 3396 days ago

Sent you an email. Thanks

keredson 3396 days ago

{ "timestamp": 1458000000, "value": 63.422235 },

dear lord, why? this reminds me of the old "xml binary format" joke:

watty 3396 days ago

I don't get the snark - what's the glaring problem with the format? I work with sensor data at my job and very rarely is it uniformly distributed so we use a similar format.

keredson 3395 days ago

Because it's horribly inefficient. It's using 49 bytes to encode 8 bytes worth of data. If your data set is a few hundred observations this likely doesn't matter. But most users of timeseries data have millions or billions. (I come from a computational finance background.)

Even if they were wedded to JSON for some reason, they could have just used a list of observations, like:

[1458000000,63.422235],

That would have cut their data costs in half.

Or just use one of the many existing formats for transmitting time series data. It's not a new topic. https://github.com/mobileink/data.frame/wiki/What-is-a-Data-...
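
The size difference between the two JSON shapes is easy to check. The sketch below serializes the same synthetic observations both ways; the series is generated (the one value from the example repeated at an assumed 3600-second spacing), so the ratio is indicative only.

```python
import json

# Synthetic series: the sample value repeated at an assumed 3600-second spacing.
observations = [(1458000000 + 3600 * i, 63.422235) for i in range(1000)]

# One object per observation, keys repeated every time.
as_objects = json.dumps([{"timestamp": t, "value": v} for t, v in observations])

# A list of [timestamp, value] pairs with no padding whitespace.
as_pairs = json.dumps([[t, v] for t, v in observations], separators=(",", ":"))

ratio = len(as_pairs) / len(as_objects)
```

On this data the pair encoding comes out at roughly half the size of the object encoding, in line with the estimate above.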

watty 3395 days ago

This is an API for very small datasets (daily time series data). The goal should be accessibility and readability over saving a few bytes.

I'm not saying it's ideal, I just think the snark is unwarranted considering how common it is. I just checked InfluxDB and they follow a similar model (even more verbose). https://docs.influxdata.com/influxdb/v1.2/guides/querying_da...

Checked a few more and I believe they're the same - Microsoft IoT, Predix (GE), etc.

keredson 3394 days ago

that example you give does not follow a similar model. it defines the columns once (not repeated w/ every observation):

                    "columns": [
                        "time",
                        "value"
                    ],

and then the observations as a list of lists:

                    "values": [
                        [
                            "2015-01-29T21:55:43.702900257Z",
                            2
                        ],
                        [
                            "2015-01-29T21:55:43.702900257Z",
                            0.55
                        ],

exactly as i suggested in the "even if they were wedded to JSON for some reason" section of my original explanation.
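
Converting the object-per-observation shape into this columns-once layout takes a few lines. A sketch, assuming every record has the same keys; the second record's values are made up.

```python
from typing import Any, Dict, List


def to_columnar(records: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Name the columns once, then store each record as a plain list of values."""
    if not records:
        return {"columns": [], "values": []}
    columns = list(records[0].keys())
    values = [[record[column] for column in columns] for record in records]
    return {"columns": columns, "values": values}


records = [
    {"timestamp": 1458000000, "value": 63.422235},
    {"timestamp": 1458003600, "value": 61.0},  # made-up second observation
]
columnar = to_columnar(records)
```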

mgontav 3396 days ago

Yeah, it's kind of verbose right now :p We'll get around to it when we make sure the forecast is the best it can be.

EDIT: But XML payloads are actually a really useful idea, it's going ASAP to the Trello board. :p

keredson 3396 days ago

hah. well hopefully you support http gzip compression. :)

but in all seriousness, why the timestamp at all? your examples are all spaced at 3600s. asking for it implies certain behavior. can you handle heterogeneous interval data? missing data?
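
On the gzip point: repeated keys are exactly what gzip compresses well, so the verbose format costs less on the wire than its raw size suggests. A rough sketch on a synthetic payload; the constant value makes it compress better than real data would, so treat the sizes as illustrative.

```python
import gzip
import json

# Synthetic payload in the object-per-observation shape (constant value, assumed hourly spacing).
payload = json.dumps(
    [{"timestamp": 1458000000 + 3600 * i, "value": 63.422235} for i in range(1000)]
).encode("utf-8")

compressed = gzip.compress(payload)
raw_size, gzip_size = len(payload), len(compressed)
```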

mgontav 3396 days ago

Remnants of working with energy consumption and smart meter data - our examples are still a bit focused on that. On the other hand, the current version is able to handle distinct/heterogeneous intervals, although it clearly does not perform as well. We try (maybe too) hard to mangle the incoming data into something as _standard_ as possible for our models, and sometimes those transforms are not kind to sparse data.
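
One common way to turn irregularly spaced observations into a regular series is linear interpolation onto a fixed grid. The sketch below is a generic illustration with made-up points; it is not Unplugg's actual preprocessing, which the thread does not describe.

```python
from typing import List, Sequence, Tuple


def resample_linear(points: Sequence[Tuple[int, float]],
                    step: int) -> List[Tuple[int, float]]:
    """Interpolate (timestamp, value) points onto a grid with spacing `step`.

    `points` must be sorted by timestamp with strictly increasing timestamps.
    """
    if len(points) < 2 or step <= 0:
        raise ValueError("need at least two points and a positive step")
    result: List[Tuple[int, float]] = []
    index = 0
    t = points[0][0]
    last_t = points[-1][0]
    while t <= last_t:
        # Move forward until t lies inside the segment points[index]..points[index + 1].
        while points[index + 1][0] < t:
            index += 1
        (t0, v0), (t1, v1) = points[index], points[index + 1]
        weight = (t - t0) / (t1 - t0)
        result.append((t, v0 + weight * (v1 - v0)))
        t += step
    return result


# Irregular readings: gaps of 3600 s and then 5400 s, resampled every 1800 s.
irregular = [(0, 10.0), (3600, 20.0), (9000, 50.0)]
regular = resample_linear(irregular, step=1800)
```

Interpolating across long gaps invents values that were never observed, which is one reason transforms like this can be unkind to sparse data.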

dardien 3396 days ago

This is interesting, but if I give you, for example, 12 months of data (evenly distributed), how far into the future will it be able to forecast?

mgontav 3396 days ago

We allow for you to specify the limit of the forecasting period, so you can experiment with that.

However, due to how we model the forecast, it isn't realistic to expect ultra-long term predictions, as eventually the forecast will revert to the mean of the series.
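
The reversion to the mean can be seen in the simplest stationary case, an AR(1) model, where the h-step-ahead forecast is mean + phi^h * (last value - mean). This is an illustration only: the thread says the service is built on ARIMA models but gives no orders or parameters, so the numbers below are made up.

```python
from typing import List


def ar1_forecast(last_value: float, mean: float, phi: float, steps: int) -> List[float]:
    """Point forecasts of a stationary AR(1) process for horizons 1..steps."""
    if not -1.0 < phi < 1.0:
        raise ValueError("phi must satisfy |phi| < 1 for a stationary AR(1)")
    return [mean + phi ** h * (last_value - mean) for h in range(1, steps + 1)]


# Made-up parameters: the last observation sits 20 units above the long-run mean.
forecasts = ar1_forecast(last_value=80.0, mean=60.0, phi=0.8, steps=50)
```

The gap to the mean shrinks by a factor of phi at every step, so distant forecasts carry almost no information beyond the mean itself.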

On a more practical note, we have seen good results with forecast windows between 1/8 and 1/4 the size of the historical data given. So, in your case you could expect roughly 1.5 to 3 months of forecast.
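
As arithmetic, the rule of thumb gives a horizon range of one eighth to one quarter of the history length, in whatever unit the history is measured. A small sketch with a hypothetical helper name:

```python
from typing import Tuple


def suggested_horizon(history_length: float) -> Tuple[float, float]:
    """Forecast window range from the 1/8 to 1/4 rule of thumb."""
    if history_length <= 0:
        raise ValueError("history_length must be positive")
    return history_length / 8, history_length / 4


# 12 months of history gives a window of 1.5 to 3 months.
low, high = suggested_horizon(12)
```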

tommynicholas 3396 days ago

Very cool, something I would definitely use in a project I'm working on if it were a package I could install, and something I would probably use "as is" in many other instances.

mgontav 3396 days ago

Hey, you can still try it out as is, just request an API key and start using it. Unless you're worried about sending data out into unknown servers... in that case I can assure you we keep absolutely no information on the data we receive to be processed. Our initial use-case of energy consumption forecasting demanded this kind of data policy, and we're sticking to it.

tommynicholas 3396 days ago

I will definitely try it out, thanks!

mgontav 3396 days ago

Cool! Get back to us with feedback on how it went, we're looking for as much input as possible at this point.

GistNoesis 3396 days ago

Is this somehow connected to facebook Prophet? How do you compare?

mgontav 3396 days ago

We've been expecting this question all day long.

As fate has it, we have no connection to FB's Prophet - we at Whitesmith have been working on unplugg for some time now and decided a few weeks ago that this week we'd share it on some communities to have more people testing it and more feedback. It seems that the folks over at Facebook decided something similar. You know what they say, great minds :p

Joking aside, as intimidating as it might have been to see FB releasing a related tool, we feel that we still fill a different segment. From what I've been reading today, Prophet is a tool tailored for timeseries forecasting with human interaction and input in mind - it can work like a black forecasting box but it seems that it is the most useful when paired with an analyst that can keep looking at the output and tweak the model accordingly. It is _really friendly_ as far as forecasting packages go and trust me, we looked at a fair amount of them. That and the use of ProbProgramming to infer their params is just awesome (I'm a fervent Bayesian at heart).

Unplugg, on the other hand, fills the need for a "generic" forecasting tool for uses where you don't want/need much specific tailoring and want a really Plug&Play solution - it's an API that you can call from pretty much everywhere, with no dependencies or specific environments needed (so no need to deploy your own R/Python/Matlab - yikes - environment where your models live and run). One possible use case would be an energy monitoring portal that lives completely client-side and requests forecasts from our API on-the-fly directly from the client.

We are still actively developing and testing different forecasting models - the one running is just the one we feel most confident about - and will be looking at Prophet as a possible alternative (although I haven't looked at their license carefully, so can't be sure).

NumberCruncher 3396 days ago

The last time a startup tried to sell us a plug-and-play generic forecasting SaaS, they made the mistake of wanting to impress us by showing us their backend code. It was the first time in my life I had seen Spark code, but it took only 10 minutes to find the spot responsible for uncontrolled overfitting, which made their product useless. The same thing happens every time I open a black-box analytics tool.

tommynicholas 3396 days ago

Such a weird coincidence, but I definitely think you're in a different space. For example, per my previous comment where I needed it to be installed and more customizable, I'm going to use Prophet, but for other projects where it's not as important if Unplugg works I may use that.

nodesocket 3396 days ago

I'm getting an "unexpected error occurred" message. Here is the sample financial data I am using: http://pastebin.com/W6PJfG3f

mgontav 3395 days ago

Thanks for reporting it. We believe we have fixed it in the meantime, so feel free to keep testing.
