| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by glowcoil 266 days ago
	The original article discusses techniques for constraining the weights of a neural network to a submanifold of weight space during training. Your comment discusses interleaving the tokens of an LLM prompt with Unicode PUA code points. These are two almost completely unrelated things, so it is very confusing to me that you are confidently asserting that they are the same thing. Can you please elaborate on why you think there is any connection at all between your comment and the original article?

1 comments

aghilmort 266 days ago

Our ECC construction induces an emergent modular manifold during KVQ computation.

Suppose we use 3 codeword lanes every codeword which is our default. Each lane of tokens is based on some prime, p, so collectively forms CRT-driven codeword (Chinese Remainder Theorem). This is discretely equivalent to labeling every k tokens with 1x globally unique indexing grammar.

That interleaving also corresponds to a triple of adjacent orthogonal embeddings since those tokens still retain a random gaussian embedding. The net effect is we similarly slice the latent space into spaced chain of modular manifolds within the latent space every k content tokens.

We also refer to that interleaving as Steifel frames for similar reasons as the post reads etc. We began work this spring or so to inject that net construction inside the model with early results in similar direction as post described. That's another way of saying this sort of approach lets us make that chained atlas (wc?) of modular manifolds as tight as possible within dimensional limits of the embedding, floating point precision, etc.

We somewhat tongue-in-cheek refer to this as the retokenization group at the prompt level re: renormalization group / tensor nets / etc. Relayering group is the same net intuition or perhaps reconnection group at architecture level.

link

glowcoil 266 days ago

I'm sorry, but even if I am maximally charitable and assume that everything you are saying is meaningful and makes sense, it still has essentially nothing to do with the original article. The original article is about imposing constraints on the weights of a neural network, during training, so that they lie on a particular manifold inside the overall weight space. The "modular" part is about being able to specify these constraints separately for individual layers or modules of a network and then compose them together into a meaningful constraint for the global network.

You are talking about latent space during inference, not weight space during training, and you are talking about interleaving tokens with random Gaussian tokens, not constraining values to lie on a manifold within a larger space. Whether or not the thing you are describing is meaningful or useful, it is basically unrelated to the original article, and you are not using the term "modular manifold" to refer to the same thing.

link

aghilmort 266 days ago

hmm / hear you. my point wasn't that we are applying modular manifolds in the same way it was that we are working on model reliability from two extremal ends using the same principle. there are various ways to induce modular manifolds in model at various levels of resolution / power. we started at outside / working in level and so it works with any black-box model out of the box and zero knowledge needed, dont even need to know token dictionary to show effect.

We're already working on pushing construction deeper into model both architecture and training. currently that's for fine-tuning and ultimately full architecture shrinkage / pruning and raw training vs. just fine-tuning etc.

& it was just great to see someone else using modular manifolds even if they are using them at the training stage vs. inference stage. they're exploiting modular form at training, we're doing it at inference. cool to see.

link