Hacker News
by cztomsik 1248 days ago
Give it at least a few examples. ~1B-parameter networks are not good at zero-shot. Also, don't expect answers about things it was not trained on. the_pile is not a programming dataset.
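
To make "a few examples" concrete, here is a rough sketch of a few-shot prompt. The Q:/A: layout, the helper name build_few_shot_prompt and the sample questions are all made up for illustration; nothing here is specific to RWKV, and the resulting string would be passed to whatever generation call you already use.

```python
# Hypothetical helper: the Q:/A: format is an assumption, not an RWKV requirement.
def build_few_shot_prompt(examples, query):
    """Join solved examples and the new query into one prompt string."""
    # Each solved example shows the model the pattern it should continue.
    blocks = [f"Q: {q}\nA: {a}" for q, a in examples]
    # The final block leaves the answer empty for the model to complete.
    blocks.append(f"Q: {query}\nA:")
    return "\n\n".join(blocks)


# Illustrative examples only; use ones that match your actual task.
examples = [
    ("What is the capital of France?", "Paris"),
    ("What is the capital of Japan?", "Tokyo"),
    ("What is the capital of Italy?", "Rome"),
]

prompt = build_few_shot_prompt(examples, "What is the capital of Spain?")
print(prompt)
```

The point is only that the model sees the pattern a few times before it has to continue it, which tends to matter a lot more for small networks than for big ones.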

RWKV is important because it's fast, it can be trained in parallel and it gives very good results (compared to other networks trained on the same dataset).