


	by h0l0cube 1238 days ago
	Can you be more specific? Dot product is about as performant as it gets with linear memory access and SIMD multiply accumulate. Throw random memory access and flow control in there and it’s a struggle to do it faster. Unless the factors are sparse, in which case just elide the zero values.
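
	As a rough illustration of the two cases above (not anyone's actual ranking code), here is a minimal sketch of a dense dot product next to a sparse one that skips zero factors. The names `dense_dot`, `sparse_dot`, `weights`, and `features` are made up for the example. Pure Python will not use SIMD itself; the point is only the access pattern.

```python
def dense_dot(weights, features):
    """Single linear pass over both lists: one multiply-accumulate per element."""
    total = 0.0
    for w, x in zip(weights, features):
        total += w * x
    return total


def sparse_dot(weights, sparse_features):
    """sparse_features maps index -> nonzero value, so zero factors are never visited."""
    total = 0.0
    for i, x in sparse_features.items():
        total += weights[i] * x
    return total


# Hypothetical ranking weights and per-document factors.
weights = [0.5, 2.0, -1.0, 0.25]
features = [4.0, 0.0, 0.0, 8.0]

# Elide the zero values once, up front.
sparse = {i: x for i, x in enumerate(features) if x != 0.0}

dense_score = dense_dot(weights, features)
sparse_score = sparse_dot(weights, sparse)
```

	Both calls give the same score. The sparse version does less arithmetic but trades the linear walk for indexed lookups, so it only pays off when most factors are zero.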


ethbr0 1238 days ago

> scalability is a constraint not a goal. The goal is to minimize some loss function


h0l0cube 1238 days ago

My bad. I was under the impression that most search engines are compute bound, but if anything there’s probably a glut of compute for such applications and a market appetite for better results.


ethbr0 1238 days ago

Also, I'd assume it's highly time-agnostic (i.e. content changes over a much longer timespan than the one over which spare compute becomes available).

So you can run your bulk-recomputing whenever you have spare capacity.

Stale rankings aren't great, but they don't hurt that much, as long as your liveness checks are updated more frequently so you don't send people to dead sites.
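
Something like this is what I have in mind, as a minimal sketch: liveness checks run on a short fixed interval, while the bulk re-rank only runs when there is spare capacity and the rankings are old enough. The function name, the job labels, and both intervals are assumptions made up for the example.

```python
def jobs_to_run(now, last_liveness, last_rerank, spare_capacity,
                liveness_interval=3600, rerank_interval=7 * 86400):
    """Return the jobs due at time `now` (all times in seconds).

    Intervals are hypothetical: liveness hourly, bulk re-rank weekly at most.
    """
    jobs = []
    # Liveness runs regardless of load, so dead sites get dropped quickly.
    if now - last_liveness >= liveness_interval:
        jobs.append("liveness")
    # Bulk recomputing waits for spare capacity; stale rankings are tolerable.
    if spare_capacity and now - last_rerank >= rerank_interval:
        jobs.append("rerank")
    return jobs


busy = jobs_to_run(now=10_000, last_liveness=0, last_rerank=0, spare_capacity=False)
idle = jobs_to_run(now=700_000, last_liveness=699_000, last_rerank=0, spare_capacity=True)
```

Here `busy` contains only the liveness job, and `idle` contains only the re-rank, since liveness ran recently.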


h0l0cube 1237 days ago

Certainly caching is important, especially for Word2Vec or other NLP which you'd want to happen in a separate stage after crawl, but as someone mentioned in a sibling comment, there are some factors that are calculated per-query, which can have a lot of cache misses for novel queries.


ethbr0 1237 days ago

If so, I'd highly suspect Google varies the compute/cache permitted for novel queries.

By this point, I can't imagine they haven't automatically balanced {revenueFromQuery} to {costOfQuery}.

No sense delivering hyperoptimized results if you lose money on generating them.
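
Purely as a guess at the shape of it (I have no knowledge of what Google actually does), a minimal sketch would cap the compute spent on a cache miss at the break-even point between the two. `compute_budget_ms`, its parameters, and the floor and cap values are all hypothetical.

```python
def compute_budget_ms(revenue_from_query, cost_per_ms, cached,
                      floor_ms=5.0, cap_ms=200.0):
    """Milliseconds of ranking compute to allow for one query.

    revenue_from_query and cost_per_ms are in the same (arbitrary) currency unit.
    floor_ms and cap_ms are made-up bounds for the example.
    """
    if cached:
        # Cache hit: no fresh ranking compute needed.
        return 0.0
    # Spend no more than the query is expected to earn back.
    break_even_ms = revenue_from_query / cost_per_ms
    # Always allow a minimal answer, never an unbounded one.
    return max(floor_ms, min(cap_ms, break_even_ms))


low_value = compute_budget_ms(1.0, 0.5, cached=False)
mid_value = compute_budget_ms(50.0, 0.5, cached=False)
high_value = compute_budget_ms(1000.0, 0.5, cached=False)
```

A low-revenue novel query falls back to the floor, a mid-range one gets its break-even budget, and a high-revenue one is clamped at the cap.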


h0l0cube 1237 days ago

I’d suspect you’re right
