Hacker News
by DoctorOetker 9 days ago
I'm not sure what Bernie Sanders imagines "you owning half of AI" means.

Does every US citizen own half of the US AI compute budget? There are more than 2 US citizens.

Perhaps he means all US citizens together should own half of the AI compute budget?

The real question should never be "how can we exploit popular jealousy for political gain?" but "what favorable position can we provide to our population in a world where AI becomes a permanent, powerful presence?".

Different architectures for LLMs have shown that the latent variables can be smoothly approached by gradient descent, during pre-training but also during distillation from one model to another.

With advances in distillation techniques and a suitable architecture, I believe we could transfer LLM knowledge into a human's ready knowledge (the readily available knowledge at one's fingertips, or memorized knowledge) by means of an interactive computer game: a reaction speed game.

An Ansatz, or problem approach, is only as strong as its weakest link: regularly asking the right questions. It is neither desirable nor sufficient for humans to have mere dialogue access to ever smarter LLMs. If they fail to ask the right questions, the result is BIBO (bullsh in, bullsh out).

Consider that every time a model is thrown into a fresh conversation, it has no idea what prior knowledge the user has, whereas real human experts modify their answers based on the perceived level of knowledge of their interlocutor.

Think of people you know well: a dear friend, a partner, a colleague, a sibling, your annoying parents, ... Even though they are not present, you can easily imagine conversations with them; your brain has built a model of what they would statistically say.

Now imagine it were possible to slowly but surely "upload" an LLM by playing a computer game. After playing long enough, the human can similarly predict what the LLM would respond. Suddenly you are multilingual, and you know most programming languages, domain knowledge, loads and loads of unasked-for domain knowledge.

Uploading those weights should be much faster than going through conventional curricula that require the user to go through each derivation or demonstration. We observe experimentally that representative (i.e. reasonable likelihood) sequences of embedding vectors don't actually need to be exact token embeddings during knowledge distillation: distilling on a virtual corpus where the "token" vector inputs are smoothly interpolated can work faster than distilling on actual corpus sentences. A computer could "hyper-uniformly" populate the input sequences to be trained on, without the inefficiency of random sampling.
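To make "smoothly interpolated token vector inputs" concrete, here is a minimal sketch of how such a virtual corpus could be generated. Everything in it is an assumption for illustration: the toy `embeddings` table, the `lerp` and `virtual_sequence` helpers, and the choice of blending exactly two tokens per position. It uses plain random sampling, so it does not implement the "hyper-uniform" population, and the teacher and student models that would consume these inputs are left out.

```python
import random

def lerp(u, v, t):
    """Linear interpolation between two vectors, with t in [0, 1]."""
    return [(1.0 - t) * a + t * b for a, b in zip(u, v)]

def virtual_sequence(embeddings, length, rng):
    """One 'virtual' input sequence: each position blends two real token embeddings."""
    tokens = list(embeddings)
    seq = []
    for _ in range(length):
        a, b = rng.sample(tokens, 2)   # two distinct tokens (hypothetical choice)
        seq.append(lerp(embeddings[a], embeddings[b], rng.random()))
    return seq

rng = random.Random(0)
dim = 4
# Toy embedding table standing in for a real model's token embeddings.
embeddings = {tok: [rng.gauss(0.0, 1.0) for _ in range(dim)]
              for tok in ["the", "dog", "bites", "man"]}

# A small batch of virtual sequences that a teacher model could label
# and a student could then be distilled on.
batch = [virtual_sequence(embeddings, 5, rng) for _ in range(3)]
```

None of the vectors in `batch` needs to coincide with an actual token embedding, which is the point of the virtual corpus.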

I can only see uploading LLM skills to humans working if the architectures are reformulated so that each weight belongs to a token, which is why I think it's a desirable goal to attempt SOTA LLMs not based on GPTs.

One approach would be to formulate the LLM such that the only parameters in the model are token components. Suppose we give each token not a simple embedding vector but an embedding matrix (for matrix products) or an embedding multivector (for geometric products), such that the likelihood is exp(-|S|), i.e. the log-likelihood is -|S|, where |S| is just the sum of squares of the matrix/multivector components of S, which represents a sentence: S = T1 x T2 x T3 x ... x TN, the product of all token matrices (or multivectors). While traditional vectors can be added, a sum cannot transcend bag of words, since addition is commutative and the order is therefore immaterial. That is not the case for matrix products, or for geometric products of multivectors.
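A rough sketch of that scoring rule, for the matrix case only. The random 3x3 token matrices, the four-word vocabulary, and the helper names are all made up for illustration, and exp(-|S|) is treated as an unnormalized score since no normalization over sentences is attempted here.

```python
import math
import random

def random_matrix(d, rng, scale=0.5):
    """A hypothetical d x d token matrix with small random entries."""
    return [[rng.gauss(0.0, scale) for _ in range(d)] for _ in range(d)]

def matmul(a, b):
    """Plain matrix product of two square matrices stored as nested lists."""
    d = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(d)) for j in range(d)]
            for i in range(d)]

def sentence_matrix(tokens, table):
    """S = T1 x T2 x ... x TN, multiplied left to right."""
    s = table[tokens[0]]
    for tok in tokens[1:]:
        s = matmul(s, table[tok])
    return s

def log_likelihood(tokens, table):
    """Returns -|S|, where |S| is the sum of squared components of S."""
    s = sentence_matrix(tokens, table)
    return -sum(x * x for row in s for x in row)

rng = random.Random(0)
table = {tok: random_matrix(3, rng) for tok in ["the", "dog", "bites", "man"]}

a = log_likelihood(["dog", "bites", "man"], table)
b = log_likelihood(["man", "bites", "dog"], table)
# The same three tokens in a different order give a different score,
# which a sum of embedding vectors could not do.
```

With random matrices the two scores are of course meaningless; the sketch only shows that the only parameters are per-token components and that word order changes the result.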

Uploading the reformulated LLM would amount to getting the user to reproduce correct token positions in random 2D projections (a background pattern should help convey which 2D projection is taken, but the rendered cloud of tokens should similarly imply it). Asking a user a question is expensive. To allow the user to learn, the majority of tokens should be rendered at their correct positions. A minority of tokens is intentionally misplaced, and the user tries to guess which tokens are rendered at incorrect positions. This is a form of pooled testing, where only a minority of tokens are misplaced.
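One round of such a game could be sketched as follows. This is only an illustration under assumptions of my own: the projection is two random Gaussian directions over the flattened matrix components, misplacement is a fixed-size jump on screen, and `misplaced_fraction`, `jitter`, `make_round`, and `grade` are hypothetical names and parameters. Rendering and reaction timing are left out.

```python
import random

def random_matrix(d, rng, scale=0.5):
    """A hypothetical d x d token matrix with small random entries."""
    return [[rng.gauss(0.0, scale) for _ in range(d)] for _ in range(d)]

def project(m, proj):
    """Map a token matrix to a 2D screen position via two direction vectors."""
    flat = [x for row in m for x in row]
    return tuple(sum(p * x for p, x in zip(axis, flat)) for axis in proj)

def make_round(table, rng, misplaced_fraction=0.2, jitter=2.0):
    """Pick a random 2D projection and deliberately misplace a minority of tokens."""
    n_components = sum(len(row) for row in next(iter(table.values())))
    proj = [[rng.gauss(0.0, 1.0) for _ in range(n_components)] for _ in range(2)]
    tokens = sorted(table)
    n_bad = max(1, int(len(tokens) * misplaced_fraction))
    misplaced = set(rng.sample(tokens, n_bad))
    positions = {}
    for tok in tokens:
        x, y = project(table[tok], proj)
        if tok in misplaced:
            # Shift the token by a fixed distance in a random diagonal direction.
            x += rng.choice([-1, 1]) * jitter
            y += rng.choice([-1, 1]) * jitter
        positions[tok] = (x, y)
    return positions, misplaced, proj

def grade(guesses, misplaced):
    """Count correctly flagged tokens and false alarms for one round."""
    guesses = set(guesses)
    return len(guesses & misplaced), len(guesses - misplaced)

rng = random.Random(1)
table = {f"tok{i}": random_matrix(3, rng) for i in range(10)}
positions, misplaced, proj = make_round(table, rng)
hits, false_alarms = grade(misplaced, misplaced)  # a perfect player
```

A single screen checks all ten tokens at once while only two are wrong, which is where the pooled-testing flavor comes from.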

Sometimes bigrams (2 tokens concatenated) are rendered as well, once the user has demonstrated success in locating the individual tokens correctly in random 2D projections. Projecting token matrices into 2D screen vectors allows users to learn their "location" in a high-dimensional space. Grading users on predicting the location of bigrams, where the matrices of the first and second token are multiplied, will teach the brain how to predict the likelihood of concatenated strings. So once a user succeeds in correcting unigram token positions, and once a user succeeds in correcting random bigram token positions, their brain is effectively capable of predicting token likelihoods and implicitly the next token distribution!
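For the bigram case, the on-screen position would come from the product of the two token matrices rather than from either token alone. A small sketch, again with made-up random matrices and a hypothetical `bigram_position` helper:

```python
import random

def random_matrix(d, rng, scale=0.5):
    """A hypothetical d x d token matrix with small random entries."""
    return [[rng.gauss(0.0, scale) for _ in range(d)] for _ in range(d)]

def matmul(a, b):
    """Plain matrix product of two square matrices stored as nested lists."""
    d = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(d)) for j in range(d)]
            for i in range(d)]

def screen_position(m, proj):
    """Project a matrix onto the two screen axes given by proj."""
    flat = [x for row in m for x in row]
    return tuple(sum(p * x for p, x in zip(axis, flat)) for axis in proj)

def bigram_position(first, second, table, proj):
    """Screen position of the bigram 'first second': project T_first x T_second."""
    return screen_position(matmul(table[first], table[second]), proj)

rng = random.Random(2)
d = 3
table = {tok: random_matrix(d, rng) for tok in ["new", "york"]}
proj = [[rng.gauss(0.0, 1.0) for _ in range(d * d)] for _ in range(2)]

forward = bigram_position("new", "york", table, proj)
backward = bigram_position("york", "new", table, proj)
# "new york" and "york new" land at different screen positions,
# so the player has to learn the product, not just the two unigram positions.
```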

The future of LLM consumption will not look like 100% usage of "chatbots and agents", but will also involve reformulating those LLMs so the human brain can consume this knowledge by internalizing it at a higher rate than mere conversation would allow.

Politicians ought to think of ways to sponsor translating machine learning excellence into human learning excellence, and I don't mean crappy "AI in classroom settings", but sci-fi-level uploading by playing a reaction speed game.

In the future people will see token clouds as they go to sleep (as some have seen solitaire games in the past) after some intense weeks of catching up with a novel LLM, and then they will be able to proceed in their projects more effectively, and be able to give very detailed and very explicit requests to compute-side LLMs. And those chatbot/agent LLMs can be told that the user has similar knowledge to LLM X, after the user has trained themselves on LLM X.

I wonder what different opinions Bernie Sanders would have if he were told about an untested and unexplored possibility of elevating all humans (not just knowledge workers) to the maximum of their original performance and LLM performance at all points. (Just like your mind can model what your friends, family, and colleagues would say, it will accurately model what the LLM would say. That doesn't mean you lose what you already know, nor does it mean you blindly believe whatever the LLM says; it's just there in the back of your mind reminding you of things, just like thinking of your friends and family reminds you of how they might respond. That doesn't make your beliefs identical to those of your friends or family either.)