| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by kirillkh 3150 days ago

Actually, I have an idea, albeit not without some doubts.

Let x1 be the number of vectors matching A, x2 the number of vectors matching B, etc, till xn. Let c1..cn be a particular selection of vectors. Now my main assumption here is that in order to determine which of these vectors are most often encountered together in the same context [1], our goal is to find j that maximizes sum_{i from 1 to n, i!=j}[d_i], where d_i=(c_j dot c_i) if the dot product is nonnegative, otherwise d_i=0. I'm not sure it's true primarily because I don't know if by summing up these dot products we add apples to apples or apples to oranges.

Then in order to find the best selection of vectors c1..cn we can iterate on every vector v_k matching A and dot v_k with every vector matching B, then pick the maximum m2 (or 0 if it's negative); dot v_k with every vector matching C, then pick the maximum m3; etc. Thus, for k'th iteration we obtain the selection of vectors that maximizes M_1k=sum_i[m_i]. After we're done with all c1 iterations, we pick the best such selection M1=max_k[M_1k]. This is all done in O(x1(x2+x3+...xn)) time.

Next, we repeat the above process for all x2 vectors matching B and obtain M2, etc, etc. Ultimately, we pick the selection of vectors that produced the highest M_t across all choices of t. Overall, we get O((sum_i[xi])^2), which seems fast enough. What do you think?

[1] One obvious problem is this limits the number of contexts we match against to just one.