|
|
|
|
|
by wokwokwok
679 days ago
|
|
I again, can’t comprehend how this can possibly be ambiguous from my comment, but the second one. We don’t (by all accounts, no one does) have a way to create this kind of dataset at scale, in this kind of complex user contributed content environment (specifically npm and other places like it). |
|
The models are a bit janky in my testing (especially prone to leaking test materials, and highly specialized on a narrow domain), but fantastic for their size.
Intentional "under-generalization" seems like a fairly self-evident approach to making optimal (and economical, on the training side) use of smaller models.
As for whether it works for a general purpose model, my intuition says that it does (i.e. cutting off the "long tail of knowledge" in favour of a better handling of the mainstream, by the limited neurons available).
As for whether that tech exists, I reckon a simple tf-idf would get you 80% of those wins, but that might be ignorance/arrogance on my part.