by pico_creator
1022 days ago
Ideas and trends move much more slowly than we think, especially outside Silicon Valley or whatever bubble each of us is in. As humans, we tend to resist change and to reject new ideas that challenge our existing ones, and we shape our memories accordingly.

Transformer example: The world didn't switch over to transformers immediately in 2017. In fact, the original model had issues converging past 100M params that needed to be sorted out, and it arguably picked up steam mostly after BERT a year later.

Day-to-day example: The average person outside the tech bubble still has not tried ChatGPT, and unfortunately it has not taken the world by storm yet.

So while it is true that traditional RNNs with LSTM do not converge as well, the changes presented here are substantial (we removed LSTM, for example).

And it's not a question of belief: the RWKV code is fully open source, public, and available. All claims can be tested, and the results can be replicated by anyone willing to put in the time to do so.