Hacker News new | ask | show | jobs
by ajkjk 63 days ago
I've been intrigued about the thing they call the "data product" for a long time: the fact that there's a sort of equivalence between precomputing position analysis and doing no runtime analysis vs knowing nothing and doing everything at runtime. It is a general property of a lot of algorithms that you can precompute various amounts of computation in order to reduce runtime complexity. It's also the difference between "telling an LLM to do a task" and "telling an LLM to write a bunch of code to do a task", which are also the two ends of a spectrum.

Especially interesting is the fact that the optimal strategy, where on the spectrum you go, is affected by the effectiveness of your algorithms on each side. The more efficiently you can cognitively compress precomputed decisions, the more it makes sense to precompute; the more efficiently you can apply techniques for move selection at runtime. And the two interact a lot: for instance in chess, there are simple heuristics for a lot of endgames, meaning that the state space you explore at runtime can terminate as soon as it gets to an endgame that you have memorized a solution for.

I wonder if anyone knows anyone examining this phenomenon in generality?

1 comments

isn't this very similar or a case of trading space for time or vice versa (e.g in algorithm analysis)
That is certainly an example of it, but I feel like what I'm talking about is more general. In algorithms you tend to be able to say concretely exactly how much space vs time you're trading off. But in a game like Connect 4 or chess it's a lot harder to say exactly how much you gain from, say, adding one more heuristic to your board analysis, or memorizing one more opening subtree. I don't know how to think about that mathematically, and I would like to. It's related, I guess, to what the OP says about finding a graph which is information-theoretically small rather than graph-theoretically small.

Generally it's interesting to contemplate that there may exist heuristics which no one has thought of yet that dramatically improve your overall performance. Actually I thought of an example of this. When AlphaZero and Leela (the ML-based chess engines) started beating Stockfish (the preeminent graph-search chess engine), one of the early things people noticed was that they loved to push their a- and h- pawns down the board, which was, I guess, a sorta passive move that people had systematically undervalued before (cf [1]; I remember seeing a good youtube video about it also but I can't recall what it was). Which means that just having this heuristic was worth some amount of ELO the whole time, just, no one had realized it! Presumably the optimal play for a human in chess, whose mental capacity is limited, is a certain mix of memorizing fixed positions vs heuristics, and the interactions between those are what's interesting.

[1]: https://www.reddit.com/r/chess/comments/ljp151/why_is_h4_pla...

I’ll throw another example.

Do the work of indexing, or summarization or compression vs just do it retrieval on the raw records.

It could be embeddings, or presorts or indices.

But I think the parent poster is correct. There’s an inherent write vs read tradeoff as well as memory vs compute. Maybe with some generalization vs specialization.

Idk if im explaining properly. But you can always front load more effort on a narrower problem space and then utilize it on that space downstream.

I think with your llm example there is no code/data split. The llm is caching a solution to a narrow problem by writing the code.

There are many duality theories in mathematics where both of these perspectives are valid. Each instance is "Duh! What's going on here is just..." and the pragmatic move nimbly through each instance without concern. For others, the potential theory IS the object of interest. Is this apparent pattern a deep principle or just a mirage? When something is actually going on, one learns powerful new techniques by recognizing a precise pattern.

I've stumbled onto deeply disguised dualities in my career. (One could argue with a straight face there are only two, but this is not formalized.) It's more a psychological process than a technical one.

It's always possible to ignore that one is standing in a tree farm. Trees have such interesting bark.