Show HN: Unplugg: An automated Forecasting API for timeseries data (unplu.gg)
100 points by mgontav 3397 days ago
8 comments

I see from the earlier comments that this runs some sort of "optimized" ARIMA model. Is there any way to output the statistical information of the fitted model through the API?
Not really. We're directing this more towards a completely automated use-case of forecasting with no human interaction, so it's not on our plans to release internal parameterization of the forecast.
Hmm, I think at a minimum, the variance of the forecasted results should be obtainable. I'm not sure how many people would use a black box model without knowing some sort of performance/statistical confidence metric.
We could return a confidence interval for the forecasted values, you're right, it can help in cases where it would be used for simple anomaly detection (and it would give a greater sense of security/control over using the forecast).

I do believe we have some similar feature in the pipeline for development, I'll make sure to push it forward. Thanks for the feedback.

I think a confidence interval would be crucial for the forecasted values.
IMO this would be much more appealing if there were an option to get the model details. I can't see myself presenting results to a group and answering the question "How did you get this?" with "Well, I put my data into this box and what came out seemed pretty good."
Well, if we provide confidence intervals it should provide some kind of "safety"... but even so, with forecasting the only assurance you can get is by testing it against known historical values and checking its accuracy (and even then, historical results are no guarantee of future performance).
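A minimal sketch of that kind of backtest: hold out the most recent points, forecast them from the rest, and measure the error. The `naive_forecast` function is a hypothetical placeholder (it just repeats the last value) and is not Unplugg's model; any forecaster with the same signature could be swapped in.

```python
from typing import Callable, Sequence


def naive_forecast(history: Sequence[float], horizon: int) -> list[float]:
    """Placeholder forecaster (an assumption): repeat the last observed value."""
    return [history[-1]] * horizon


def backtest_mae(
    series: Sequence[float],
    horizon: int,
    forecast: Callable[[Sequence[float], int], list[float]] = naive_forecast,
) -> float:
    """Hold out the last `horizon` points, forecast them, return mean absolute error."""
    if not 0 < horizon < len(series):
        raise ValueError("horizon must be positive and shorter than the series")
    train, actual = series[:-horizon], series[-horizon:]
    predicted = forecast(train, horizon)
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / horizon


# Illustrative data only.
series = [10.0, 12.0, 11.0, 13.0, 12.0, 14.0, 13.0, 15.0]
mae = backtest_mae(series, horizon=2)
print(f"MAE over the held-out window: {mae}")
```

A low error on the held-out window is evidence, not a guarantee, which is the caveat made above.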

What kind of details would you say can be inspected to see if the model is reliable? AR or MA orders, inferred seasonalities? They can give me some notion of what kinds of assumptions were made about my data, but do not guarantee that it will perform :/

For instances where that kind of insight is needed, I don't think our way is the way to go, but rather the use of some forecasting package (R's Forecast or FB's Prophet) and more exploratory work. But we're looking more at instances where what matters are the forecasted values and not so much the information underneath - automated anomaly detection systems, consumer-facing apps, and along those lines.

That's a fair answer. Your target audience is interested in prediction rather than inferring parameters.

I do think the confidence intervals/prediction intervals should be accessible and should probably be adjustable (e.g. 99%, 95%, 80%).
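A minimal sketch of what an adjustable interval could look like, assuming the forecast errors are normally distributed and that a standard error is available for each point forecast. The function name and the numbers are illustrative, not part of the Unplugg API.

```python
from statistics import NormalDist


def prediction_interval(
    point: float, std_error: float, level: float = 0.95
) -> tuple[float, float]:
    """Symmetric interval around a point forecast, assuming normal forecast errors."""
    if not 0.0 < level < 1.0:
        raise ValueError("level must be strictly between 0 and 1")
    # Two-sided interval: put (1 - level) / 2 of the probability in each tail.
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return point - z * std_error, point + z * std_error


# Hypothetical point forecast and standard error.
for level in (0.80, 0.95, 0.99):
    low, high = prediction_interval(63.4, 2.0, level)
    print(f"{level:.0%}: [{low:.2f}, {high:.2f}]")
```

Higher levels give wider intervals, so an anomaly detector built on the 99% band flags fewer points than one built on the 80% band.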

> I do think the confidence intervals/prediction intervals should be accessible and should probably be adjustable (e.g. 99%, 95%, 80%).

Definitely, this is arguably the most important feature.

Why is there no mention of the model you use?
Our goal is to keep it as simple as we can, keeping the worries about model selection and tuning on our side as much as possible, therefore we don't go much into those details.

I can share that our platform is built on top of ARIMA models, but with a lot of pre-processing work done previously to try and figure out automatically the best parameters to use, as well as a lot of previous hand-tweaking done by ourselves in-house using different datasets (we started out tuning it for forecasting energy consumption, but figured that the resulting models were performing well enough to warrant testing in other domains).

Right now we're opening it up for testing to get more feedback on its performance, so feel free to shoot any more questions or feedback.

@mgontav, we are building an internal ERP for our manufacturing business. We use our sales data to try to predict what our future sales will look like, and we try to produce what we think we are going to be selling in the next few months. Right now, if we sell 3 items a day, we just do straight math and assume we will sell 3x10 in 10 days. I would like to talk to you and see how your service can help us.
Sure thing, shoot me an email at mgontav@unplu.gg and we'll see how we can help you out.
Sent you an email. Thanks
{ "timestamp": 1458000000, "value": 63.422235 },

dear lord, why? this reminds me of the old "xml binary format" joke:

<byte> <bit>0</bit> <bit>0</bit> <bit>1</bit> <bit>0</bit> <bit>0</bit> <bit>1</bit> <bit>0</bit> <bit>0</bit> </byte>

I don't get the snark - what's the glaring problem with the format? I work with sensor data at my job and very rarely is it uniformly distributed so we use a similar format.
Because it's horribly inefficient. It's using 49 bytes to encode 8 bytes worth of data. If your data set is a few hundred observations this likely doesn't matter. But most users of timeseries data have millions or billions. (I come from a computational finance background.)

Even if they were wedded to JSON for some reason, they could have just used a list of observations, like:

[1458000000,63.422235],

That would have cut their data costs in half.
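A quick way to check the size claim, using the single observation quoted above. This sketch only measures JSON text length with Python's standard `json` module; exact counts depend on whitespace and separators.

```python
import json

observation = {"timestamp": 1458000000, "value": 63.422235}

# Object form: the keys are repeated for every observation.
as_object = json.dumps(observation)

# Pair form: position carries the meaning, so no keys are needed.
as_pair = json.dumps(
    [observation["timestamp"], observation["value"]], separators=(",", ":")
)

print(len(as_object), as_object)
print(len(as_pair), as_pair)
```

The pair form comes out at a bit under half the length of the object form, before any transport-level compression.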

Or just use one of the many existing formats for transmitting time series data. It's not a new topic. https://github.com/mobileink/data.frame/wiki/What-is-a-Data-...

This is an API for very small datasets (daily time series data). The goal should be accessibility and readability over saving a few bytes.

I'm not saying it's ideal, I just think the snark is unwarranted considering how common it is. I just checked InfluxDB and they follow a similar model (even more verbose). https://docs.influxdata.com/influxdb/v1.2/guides/querying_da...

Checked a few more and I believe they're the same - Microsoft IoT, Predix (GE), etc.

that example you give does not follow a similar model. it defines the columns once (not repeated w/ every observation):

                    "columns": [
                        "time",
                        "value"
                    ],
and then the observations as a list of lists:

                    "values": [
                        [
                            "2015-01-29T21:55:43.702900257Z",
                            2
                        ],
                        [
                            "2015-01-29T21:55:43.702900257Z",
                            0.55
                        ],
exactly as i suggested in the "even if they were wedded to JSON for some reason" section of my original explanation.
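A minimal sketch of converting the per-observation object format into that columns-plus-values layout. The `to_columnar` helper and the second observation are made up for illustration, and the sketch assumes every observation has the same keys.

```python
import json


def to_columnar(observations: list[dict]) -> dict:
    """Convert a list of same-keyed objects into a columns/values layout."""
    if not observations:
        return {"columns": [], "values": []}
    # Column names are stated once, taken from the first observation.
    columns = list(observations[0])
    values = [[obs[name] for name in columns] for obs in observations]
    return {"columns": columns, "values": values}


observations = [
    {"timestamp": 1458000000, "value": 63.422235},
    {"timestamp": 1458003600, "value": 61.5},  # illustrative second point
]
payload = to_columnar(observations)
print(json.dumps(payload))
```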
Yeah, it's kind of verbose right now :p We'll get around to it when we make sure the forecast is the best it can be.

EDIT: But XML payloads are actually a really useful idea, it's going ASAP to the Trello board. :p

hah. well hopefully you support http gzip compression. :)

but in all seriousness, why the timestamp at all? your examples are all spaced at 3600 s. asking for it implies certain behavior. can you handle heterogeneous interval data? missing data?

Remnants of working with energy consumption and smart meter data - our examples are still a bit focused on that. On the other hand, the current version is able to handle distinct/heterogeneous intervals, although it clearly does not perform as well. We try (maybe too) hard to mangle the incoming data into something as _standard_ as possible for our models, and sometimes those transforms are not kind to sparse data.
This is interesting, but if I give you, for example, 12 months of data (evenly distributed), how far into the future will it be able to forecast?
We allow for you to specify the limit of the forecasting period, so you can experiment with that.

However, due to how we model the forecast, it isn't realistic to expect ultra-long term predictions, as eventually the forecast will revert to the mean of the series.
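The mean reversion can be illustrated with the simplest stationary case, an AR(1) process. This is a sketch under that assumption and not Unplugg's actual model; the parameter values are made up.

```python
def ar1_forecast(
    last_value: float, mean: float, phi: float, horizon: int
) -> list[float]:
    """Point forecasts for a stationary AR(1): mean + phi**h * (last_value - mean)."""
    if not -1.0 < phi < 1.0:
        raise ValueError("stationarity requires |phi| < 1")
    return [mean + phi ** h * (last_value - mean) for h in range(1, horizon + 1)]


# Hypothetical series mean of 60, last observation of 80, AR coefficient 0.8.
path = ar1_forecast(last_value=80.0, mean=60.0, phi=0.8, horizon=50)
print([round(x, 2) for x in path[:5]], "...", round(path[-1], 2))
```

The gap to the mean shrinks by a factor of `phi` at every step, so far-ahead forecasts carry almost no information beyond the series mean.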

On a more practical note, we have seen good results with forecast windows between 1/8 and 1/4 the size of the historic data given. So, in your case you could expect roughly 1.5 to 3 months of forecast.
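The rule of thumb works out as follows. The helper name is hypothetical; the fractions are the ones quoted above.

```python
def suggested_horizon(history_length: float) -> tuple[float, float]:
    """Forecast window between 1/8 and 1/4 of the history, in the same units."""
    return history_length / 8, history_length / 4


low, high = suggested_horizon(12)  # 12 months of history
print(f"Suggested forecast window: {low} to {high} months")
```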

Very cool, something I would definitely use in a project I'm working on if it were a package I could install, and something I would probably use "as is" in many other instances.
Hey, you can still try it out as is, just request an API key and start using it. Unless you're worried about sending data out to unknown servers... in that case I can assure you we keep absolutely no information on the data we receive to be processed. Our initial use-case of energy consumption forecasting demanded this kind of data policy, and we're sticking to it.
I will definitely try it out, thanks!
Cool! Get back to us with feedback on how it went, we're looking for as much input as possible at this point.
Is this somehow connected to Facebook's Prophet? How do you compare?
We've been expecting this question all day long.

As fate would have it, we have no connection to FB's Prophet - we at Whitesmith have been working on unplugg for some time now and decided a few weeks ago that this week we'd share it on some communities to have more people testing it and more feedback. It seems that the folks over at Facebook decided something similar. You know what they say, great minds :p

Joking aside, as intimidating as it might have been to see FB releasing a related tool, we feel that we still fill a different segment. From what I've been reading today, Prophet is a tool tailored for timeseries forecasting with human interaction and input in mind - it can work like a black forecasting box but it seems that it is the most useful when paired with an analyst that can keep looking at the output and tweak the model accordingly. It is _really friendly_ as far as forecasting packages go and trust me, we looked at a fair amount of them. That and the use of ProbProgramming to infer their params is just awesome (I'm a fervent Bayesian at heart).

Unplugg, on the other hand, fills the need for a "generic" forecasting tool for uses where you don't want/need much specific tailoring and want a really Plug&Play solution - it's an API that you can call from pretty much everywhere, with no dependencies or specific environments needed (so no need to deploy your own R/Python/Matlab - yikes - environment where your models live and run). One possible use case would be an energy monitoring portal that lives completely client-side and requests forecasts from our API on-the-fly directly from the client.

We are still actively developing and testing different forecasting models - the one running is just the one we feel most confident about - and will be looking at Prophet as a possible alternative (although I haven't read their license carefully, so can't be sure).

The last time a startup tried to sell us a plug-and-play generic forecasting SaaS, they made the mistake of wanting to impress us by showing us their backend code. It was the first time in my life seeing Spark code, but it took only 10 minutes to find the spot responsible for uncontrolled overfitting, which made their product useless. The same thing happens every time I open a black-box analytics tool.
Such a weird coincidence, but I definitely think you're in a different space. For example, per my previous comment where I needed it to be installed and more customizable, I'm going to use Prophet, but for other projects where that's not as important, if Unplugg works I may use that.
I'm getting "an unexpected error occurred". Here is the sample financial data I am using: http://pastebin.com/W6PJfG3f
Thanks for reporting it, we believe we have fixed it in the meantime, feel free to keep testing.