by humanarity
4085 days ago
"The cluster centers found by STS clustering on any particular run of k-means on stock market dataset are not significantly more similar to each other than they are to cluster centers taken from random walk data! In other words, if we were asked to perform clustering on a particular stock market dataset, we could reuse an old clustering obtained from random walk data, and no one could tell the difference."

"As the sliding window passes by, the datapoint first appears as the rightmost value in the window, then it goes on to appear exactly once in every possible location within the sliding window. So the t_i datapoint contribution to the overall shape is the same everywhere..."

"Another way to look at it is that every value v_i in the mean vector, 1 ≤ i ≤ w, is computed by averaging essentially every value in the original time series; more precisely, from t_i to t_{m-w+i}. So for a time series of m = 1024 and w = 32, the first value in the mean vector is the average of t[1..993]; the second value is the average of t[2..994], and so forth. Again, the only datapoints not being included in every computation are the ones at the very beginning and at the very end, and their effects are negligible asymptotically."
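The averaging claim in the last quote is easy to check numerically. The following is a minimal sketch, not the paper's code: the random-walk input, the seed, and the helper name `sliding_window_mean` are my own assumptions, and it only verifies the mean-vector arithmetic for m = 1024 and w = 32, not the k-means experiment itself.

```python
import numpy as np

def sliding_window_mean(t, w):
    """Average all length-w sliding windows of t, position by position."""
    # Shape (m - w + 1, w): one row per window position.
    windows = np.lib.stride_tricks.sliding_window_view(t, w)
    return windows.mean(axis=0)

# Assumed test input: a random walk with the sizes used in the quote.
rng = np.random.default_rng(0)
m, w = 1024, 32
t = np.cumsum(rng.standard_normal(m))

v = sliding_window_mean(t, w)

# With 0-based indexing, v[i] is the plain average of t[i .. m-w+i] inclusive,
# i.e. 993 of the 1024 points for every i.
direct = np.array([t[i : m - w + i + 1].mean() for i in range(w)])

spread_v = v.max() - v.min()
spread_t = t.max() - t.min()
print("max |v - direct|:", np.abs(v - direct).max())
print("spread of mean vector:", spread_v)
print("spread of series:     ", spread_t)
```

Consecutive entries of the mean vector differ only by (t[m-w+i+1] - t[i]) / (m-w+1), so its total spread is at most (w-1)/(m-w+1) of the spread of the series, about 3% here. That is the sense in which the mean of all windows is nearly flat regardless of the input.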