|
MySQL is perfectly capable of calculating a linear regression for you, btw. In my case, I needed to be able to estimate trends from sparse time series data. Here's how you do that: SELECT
@a_count := avg(count) as mean_count,
@a_weeks := avg(`week`) as mean_weeks,
@covariance := (sum(`week` * `count`) - sum(`week`) * sum(`count`) / count(`week`)) / count(`week`) as covariance,
@stddev_count := stddev(`count`) as stddev_count,
@stddev_week := stddev(`week`) as stddev_week,
@r := @covariance / (@stddev_count * @stddev_week) as r,
@slope := @r * @stddev_count / @stddev_week as slope,
@y_int := @a_count - (@slope * @a_weeks) as y_int,
@this_week_no := timestampdiff(WEEK, (select min(`date`) from dataset), curdate()) as this_week_no,
@predicted := round(greatest(1, @y_int + (@slope * @this_week_no))) as predicted
FROM (SELECT timestampdiff(WEEK, (select min(`date`) from dataset), `date`) as week, count(date) as count FROM dataset group by WEEK(date)) series;
I had to figure out how to translate the math into SQL, now you don't have to.This performs well enough to be able to crunch tens of millions of rows of data in "reasonable time" on a wimpy VPS. |