

by mdritch 107 days ago

I like the idea. On question 2, I think you need some way to publicly benchmark your stripped-down models' performance. Your models probably won't score well on the standard benchmarks, but the big models should be able to run on your custom eval sets, such as those Hindi math problems.

I would publish: 1) your domain-specific eval set, 2) your model's results on that eval set, and 3) a big lab's model's results on the same eval set.

That would give users a way to judge whether your model is actually capable in that reduced domain.
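
For what it's worth, the comparison could be as simple as the rough sketch below. Everything in it is a made-up placeholder: the toy arithmetic eval set, the two stand-in "models" (plain lookup functions), and the exact-match scoring rule are all assumptions for illustration, not real results or anyone's actual harness. A real version would call the actual models and probably need a more forgiving answer check.

```python
from typing import Callable, Dict, List, Tuple

# An eval set is a list of (question, expected_answer) pairs.
EvalSet = List[Tuple[str, str]]
Model = Callable[[str], str]


def accuracy(model: Model, eval_set: EvalSet) -> float:
    """Fraction of questions answered with an exact match (whitespace ignored)."""
    if not eval_set:
        return 0.0
    correct = sum(
        1 for question, expected in eval_set
        if model(question).strip() == expected.strip()
    )
    return correct / len(eval_set)


def compare(models: Dict[str, Model], eval_set: EvalSet) -> Dict[str, float]:
    """Score every model on the same eval set so the numbers are comparable."""
    return {name: accuracy(fn, eval_set) for name, fn in models.items()}


# Placeholder eval set (hypothetical; stands in for the domain-specific one).
toy_eval_set: EvalSet = [
    ("2 + 3", "5"),
    ("7 * 6", "42"),
    ("10 - 4", "6"),
    ("9 / 3", "3"),
]

# Placeholder "models": canned lookups, not real model calls.
SMALL_ANSWERS = {"2 + 3": "5", "7 * 6": "42", "10 - 4": "6"}
BIG_ANSWERS = {"2 + 3": "5", "7 * 6": "42", "10 - 4": "6", "9 / 3": "3"}


def small_model(question: str) -> str:
    return SMALL_ANSWERS.get(question, "?")


def big_lab_model(question: str) -> str:
    return BIG_ANSWERS.get(question, "?")


results = compare(
    {"small_model": small_model, "big_lab_model": big_lab_model},
    toy_eval_set,
)

for name, score in results.items():
    print(f"{name}: {score:.0%} on {len(toy_eval_set)} questions")
```

Publishing the eval set next to a table like this lets anyone rerun the same questions and check the numbers themselves.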