
Nice. Thank you for the addition of slower memory layers.

So MoE models are a bit like thinking tools running concurrently, right? They sieve through the training data along paths that share the same context but differ in specificity and sensitivity.
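To make that picture concrete for myself, here is a minimal sketch of how a router might pick experts for one token. It assumes a common top-k gating scheme; the names `route_token` and `moe_layer`, the toy experts, and the scores are all made up for illustration and are not taken from any particular model.

```python
import math

def softmax(scores):
    """Turn raw router scores into probabilities."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_token(router_scores, k=2):
    """Pick the k highest-scoring experts and renormalize their gate weights."""
    probs = softmax(router_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return {i: probs[i] / norm for i in top}

def moe_layer(token, experts, router_scores, k=2):
    """Run only the selected experts and mix their outputs by gate weight."""
    gates = route_token(router_scores, k)
    return sum(w * experts[i](token) for i, w in gates.items())

# Hypothetical toy experts: each one just scales its input.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
scores = [0.1, 2.0, 0.3, 1.5]  # assumed router scores for a single token

gates = route_token(scores, k=2)
output = moe_layer(1.0, experts, scores, k=2)
print(gates, output)
```

If this is roughly right, only the two selected experts do any work for this token; the other two are skipped entirely.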

If the agents/experts/architectures (the code) don't have the minimum required amount of memory and processing power, they might even miss entire bunches of tokens that are, or might be, relevant within the given context (the prompt) and the predicted/requested context. So more processing power and/or time is relevant only in proportion to the size of the training data (tokens and weights) that gets queried at inference time.

Now here's where I find myself back in exactly the realm I was in when I phrased my question: analysing the result of a request and evaluating different sets of tokens. I now understand that this makes much more sense for code generation than for the recitation of facts or bits of narrative.

Generated code has functions (things to do with other things). Functions can be written more or less efficiently, while even the least efficient code works "more than good and fast enough". There is no value in looping through versions of fact and fiction when the answer fits the expectation. And if it doesn't fit, users can have an actual conversation. That gives me another part of my answer: more processing power only becomes relevant in relation to the number of concurrent requests, and to the parts of the training data that those requests query at inference time.
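A rough way to see the concurrency point, as a toy simulation only: count how many distinct experts get touched as more tokens are in flight. It assumes uniformly random top-k routing, which real routers do not do, and the function name `experts_touched` and all the numbers are hypothetical.

```python
import random

def experts_touched(num_requests, tokens_per_request, num_experts=64, k=2, seed=0):
    """Count distinct experts activated across all tokens currently in flight.

    Assumption: each token is routed to k experts chosen uniformly at random.
    """
    rng = random.Random(seed)
    touched = set()
    for _ in range(num_requests * tokens_per_request):
        touched.update(rng.sample(range(num_experts), k))
    return len(touched)

single = experts_touched(num_requests=1, tokens_per_request=1)
batch = experts_touched(num_requests=32, tokens_per_request=16)
print(single, batch)
```

Under these assumptions a single token touches only k experts, while a batch of concurrent requests touches many more of them, so the weights that have to be available at the same time grow with the load.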

No single request will ever query so much data at once that memory and compute become a bottleneck.

It definitely can become a bottleneck when a long/large/broad (but specific) request gets processed by MoEs simultaneously, or when versions of results of engineering tasks are being evaluated. But that is simply not within the task or design of current LLMs; it is instead added on top (as a wrapper, for example). I still fail to find a non-replaceable use case for such a wrapper, while also being certain that I will find one once I get to LLMs and AIs.

Again, thanks!