Hacker News
by phi-go 301 days ago
Does this have a compute benefit or could one use different specialized LLM architectures / models for the subnetworks?