Hacker News new | ask | show | jobs
by merb 33 days ago
Wouldn’t it be good to start investigating into a micro model architecture? Like first model checks the context and routes to the Java optimized model, etc. would make it also simpler to load/unload models in memory.

So extremely small models that are only good for a certain task like programming languages. A little bit of a model at the front that is extremely good in classification of tasks and than a more complex model that can bring each of these micro models back together

2 comments

My guess is that we underestimate how much non-Java data and context in general is needed to create a good Java coding model. It could be true that a good Java model would be of 80-90% the size of a comparable overall coding model.

Obviously, I have no idea but I guess it’s not as simple as “just train only on Java code and reduce size to 1/10th”.

I think you're describing Mixture-of-Experts.