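There's an enormous difference in model complexity between key phrase detection and (general) automatic speech recognition + natural language processing. A key phrase detector only has to recognize one fixed phrase, so it can be small enough to run continuously on the device itself. There's a reason nearly everything Google Assistant does except "OK Google" is done on Google's servers rather than on the device.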