| HN Mirror



	by merb 33 days ago
	Wouldn’t it be good to start investigating a micro-model architecture? For example, a first model checks the context and routes to a Java-optimized model, and so on. That would also make it simpler to load and unload models in memory. The idea is extremely small models that are each good at only one task, such as a single programming language, with a small model at the front that is extremely good at classifying tasks, and then a more complex model that can bring the outputs of these micro models back together.
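A rough sketch of the routing idea, in Python. Everything here is hypothetical: the keyword-based `classify` stands in for the small front classifier, the stub models stand in for the per-language micro models, and the `max_loaded` limit is just one way to illustrate loading and unloading models in memory.

```python
from collections import OrderedDict
from typing import Callable, Dict

# Hypothetical stand-in for a micro model: a function from prompt to text.
Model = Callable[[str], str]

# Toy routing table (assumption): keyword hints per task label.
KEYWORDS = {"java": ("java",), "python": ("python",)}


def make_stub_model(name: str) -> Model:
    """Return a fake model that just tags the prompt with its own name."""
    return lambda prompt: f"[{name}] {prompt}"


class ModelRouter:
    """Classify a prompt, then run it on the matching micro model.

    At most `max_loaded` models are kept in memory; the least recently
    used one is unloaded when room is needed.
    """

    def __init__(self, loaders: Dict[str, Callable[[], Model]], max_loaded: int = 1):
        self.loaders = loaders
        self.max_loaded = max_loaded
        self.loaded: "OrderedDict[str, Model]" = OrderedDict()

    def classify(self, prompt: str) -> str:
        # Stand-in for the small front classifier model.
        text = prompt.lower()
        for task, hints in KEYWORDS.items():
            if any(hint in text for hint in hints):
                return task
        return "general"

    def _get_model(self, task: str) -> Model:
        if task in self.loaded:
            self.loaded.move_to_end(task)  # mark as most recently used
        else:
            if len(self.loaded) >= self.max_loaded:
                self.loaded.popitem(last=False)  # unload least recently used
            self.loaded[task] = self.loaders[task]()  # "load" the model
        return self.loaded[task]

    def run(self, prompt: str) -> str:
        return self._get_model(self.classify(prompt))(prompt)
```

With `max_loaded=1`, a Java prompt followed by a Python prompt would load the Java model, then evict it to load the Python one. The step that merges the outputs of several micro models is left out of this sketch.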

2 comments

lukeundtrug 33 days ago

My guess is that we underestimate how much non-Java data, and context in general, is needed to create a good Java coding model. It could turn out that a good Java model would be 80-90% of the size of a comparable general coding model.

Obviously, I have no idea, but I guess it’s not as simple as “just train only on Java code and reduce the size to 1/10th”.


puilp0502 33 days ago

I think you're describing Mixture-of-Experts.
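For comparison, a minimal sketch of top-k gating as it is commonly described for Mixture-of-Experts layers. In this formulation the routing happens per token inside a single model, not per request across separate models. The function names, the plain-list vectors, and the toy experts are all assumptions for illustration, not any particular library's API.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical expert: maps a token vector to a vector of the same length.
Expert = Callable[[Sequence[float]], List[float]]


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a short list."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_layer(
    token: Sequence[float],
    gate_weights: Sequence[Sequence[float]],
    experts: Sequence[Expert],
    top_k: int = 2,
) -> List[float]:
    """Route one token through its top_k experts and mix their outputs."""
    # One gate logit per expert: dot product of that expert's gate row and the token.
    logits = [sum(w * x for w, x in zip(row, token)) for row in gate_weights]
    # Indices of the top_k highest-scoring experts.
    top = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Renormalise the gate scores over the selected experts only.
    weights = softmax([logits[i] for i in top])
    out = [0.0] * len(token)
    for weight, idx in zip(weights, top):
        expert_out = experts[idx](token)
        out = [o + weight * v for o, v in zip(out, expert_out)]
    return out
```

Only the selected experts run for a given token, which is where the compute savings come from. All experts still have to be available to the model, so this does not by itself give the load/unload benefit the parent comment is after.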
