
by jpasmore 1144 days ago

Building an architecture where the LLM can independently and quickly test variations and combinations of its approach seems doable, since I’m guessing it can be programmed to implement its own suggestions:

3.5: As an AI language model, I cannot guess, but I can provide some general guidelines based on current research and best practices.

If we want to improve the results of Large Language Models (LLMs), one aspect of the architecture that we could focus on is increasing the model's capacity to learn and retain more information. This could be achieved by increasing the number of parameters in the model or using more sophisticated architectures such as transformer-based models that use self-attention mechanisms to capture long-range dependencies in the input sequence.
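For anyone who wants to see what "self-attention capturing long-range dependencies" means mechanically, here is a minimal pure-Python sketch of scaled dot-product self-attention. It is a simplification, not how any production model is written: it assumes identity projections (queries, keys, and values are all the raw input vectors), a single head, and no positional encoding. The function names and the toy input are made up for illustration.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x):
    """Scaled dot-product self-attention with Q = K = V = x (an assumption;
    real transformers use learned projection matrices for each)."""
    d = len(x[0])
    outputs, weights = [], []
    for q in x:
        # Similarity of this token to every token in the sequence.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in x]
        w = softmax(scores)
        weights.append(w)
        # Output is the attention-weighted average of all token vectors.
        outputs.append([sum(wj * v[i] for wj, v in zip(w, x)) for i in range(d)])
    return outputs, weights

# Toy sequence: the first and last tokens are identical, the middle one differs.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
outputs, weights = self_attention(tokens)
print(weights[0])
```

In this sketch the first token attends to the last token exactly as strongly as to itself, even though they sit at opposite ends of the sequence. The weights depend on content similarity rather than distance, which is the sense in which attention handles long-range dependencies.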

Another important aspect to focus on is improving the model's ability to handle rare and out-of-vocabulary (OOV) words. This can be achieved by using subword-level tokenization, which breaks down words into smaller units and enables the model to generalize better to new or unseen words.
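To make the subword idea concrete, here is a rough sketch of greedy longest-match-first tokenization. The `##` continuation marker follows the WordPiece convention, but the vocabulary below is a hypothetical toy set invented for this example, and real tokenizers learn their vocabularies from data rather than hard-coding them.

```python
def subword_tokenize(word, vocab, unk="[UNK]"):
    """Split one word into subword pieces, always taking the longest match."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while end > start:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece  # marks a piece that continues a word
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            # No piece fits at this position: the whole word is unknown.
            return [unk]
        pieces.append(match)
        start = end
    return pieces

# Hypothetical toy vocabulary, for illustration only.
TOY_VOCAB = {"token", "##ization", "##ize", "##r", "##s", "un", "##seen"}

print(subword_tokenize("tokenization", TOY_VOCAB))  # ['token', '##ization']
print(subword_tokenize("tokenizers", TOY_VOCAB))    # ['token', '##ize', '##r', '##s']
print(subword_tokenize("xyz", TOY_VOCAB))           # ['[UNK]']
```

The word "tokenizers" is not in the vocabulary as a whole, yet it still gets a usable representation from pieces the model has seen, which is the generalization benefit described above.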

We could also focus on improving the training process by using larger and more diverse training datasets, regularization techniques to prevent overfitting, and optimizing hyperparameters such as learning rate, batch size, and number of training epochs.
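As a sketch of what "optimizing hyperparameters" can look like when automated, here is a small random search over learning rate, batch size, and epoch count. Everything in it is an assumption for illustration: `toy_validation_loss` is a made-up formula standing in for an actual train-and-evaluate run, and the search ranges are arbitrary.

```python
import math
import random

def toy_validation_loss(learning_rate, batch_size, epochs):
    """Hypothetical stand-in for training a model and measuring validation loss."""
    return (
        (math.log10(learning_rate) + 3) ** 2      # lowest near lr = 1e-3
        + (math.log2(batch_size) - 6) ** 2 / 10   # lowest near batch size 64
        + 1.0 / epochs                            # more epochs help, with diminishing returns
    )

def random_search(n_trials, seed=0):
    """Sample random configurations and return the best one found."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    trials = []
    for _ in range(n_trials):
        config = {
            "learning_rate": 10 ** rng.uniform(-5, -1),  # log-uniform sample
            "batch_size": rng.choice([16, 32, 64, 128, 256]),
            "epochs": rng.randint(1, 20),
        }
        trials.append((toy_validation_loss(**config), config))
    best_loss, best_config = min(trials, key=lambda t: t[0])
    return best_loss, best_config, trials

best_loss, best_config, _ = random_search(n_trials=50)
print(best_loss, best_config)
```

With a real model, each trial costs a full training run, so the number of combinations that can be tried is bounded by compute rather than by the search code itself.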

Finally, we could also focus on incorporating external knowledge sources such as structured data, knowledge graphs, or ontologies into the model architecture to enhance its ability to reason and make more accurate predictions.

Overall, there are many aspects of the LLM architecture that can be improved to enhance its performance, and the choice of which to focus on will depend on the specific task and the available resources.

1 comments

m3kw9 1144 days ago

It’s not so easy. Hardware and money: they cannot overcome the realities of life and physics.


jpasmore 1144 days ago

The hurdle seems more software- or process-related than hardware, no? Though on the hardware side, it seems a company like Cerebras is making (or making available) interesting products that enable experimentation outside of the biggest players (OpenAI, Google, Meta, Microsoft...).

Like the advent of Transformers, some smart dev could change how LLMs think. Self-improvement could be built in as an optimization process. And if we don't "know" what might work, a platform could "guess" and try billions of combinations of possible improvements.


m3kw9 1144 days ago

On the self-improvement part, they are not likely to reach superintelligence by software alone. How will they improve hardware without humans and capital in the loop?
