by chrgy 1123 days ago
Here is a summary of all the comments, generated by a Transformer. I wonder how an RNN would do:

RWKV is a new language model architecture that is comparable to transformers in terms of performance. RWKV is more efficient than transformers, which makes it possible to train larger models on smaller datasets. The RWKV community is open source and welcomes contributions from anyone. There are plans to create larger versions of RWKV, but this will require more computational resources. Here are some additional details about the chinchilla law and the dataset problem:

The chinchilla law states that the amount of data required to train a language model grows exponentially with the model size. This means that it is very expensive to train large language models, even with the latest hardware. The RWKV community is working on developing new methods for training large language models more efficiently. There are a number of datasets available to the RWKV community, including:

The Pile: A massive dataset of text and code.

The Chinchilla: A smaller dataset of text and code that is designed for training RWKV models.

The Red Pajamas: A dataset of text and code that is being used to train a 65B RWKV model.

These datasets are stored in a variety of locations, including:

The RWKV GitHub repository

The Chinchilla website

The Red Pajamas website

The RWKV community is constantly updating the datasets and adding new ones. If you are interested in contributing, please visit the RWKV GitHub repository.

1 comments

> The chinchilla law states that the amount of data required to train a language model grows exponentially with the model size. This means that it is very expensive to train large language models, even with the latest hardware. The RWKV community is working on developing new methods for training large language models more efficiently. There are a number of datasets available to the RWKV community, including:

What? I thought the chinchilla-optimal regime was something like "20 tokens per weight". That's not remotely exponential, even by the word's colloquial use.
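To make the arithmetic concrete, here is a rough sketch that takes the "20 tokens per weight" figure at face value. The constant is only the rule of thumb quoted above, not an exact value, and the function name and model sizes are made up for illustration.

```python
# Assumption: roughly 20 training tokens per parameter, the rule of thumb
# quoted above. Treat it as an approximation, not a precise constant.
TOKENS_PER_PARAM = 20


def chinchilla_optimal_tokens(n_params: int,
                              tokens_per_param: int = TOKENS_PER_PARAM) -> int:
    """Return the approximate token budget for a model with n_params weights."""
    return n_params * tokens_per_param


if __name__ == "__main__":
    # Illustrative model sizes only.
    for n_params in (1_000_000_000, 7_000_000_000, 65_000_000_000):
        tokens = chinchilla_optimal_tokens(n_params)
        print(f"{n_params / 1e9:.0f}B params -> {tokens / 1e12:.2f}T tokens")
```

Under this rule, doubling the parameter count doubles the token budget, so the relationship is linear: a 65B model would want about 1.3T tokens.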

It seems that the user is posting ChatGPT-generated text and not a real summary. It's complete nonsense, with about half of the sentences containing a factual error.