How did you get to 64 bytes of state? Last I looked, Go's ChaCha8 implementation had 300 bytes of state. Most of that was spent on a buffer which was necessary for optimal amortized performance.
Fair enough. I was just thinking of base ChaCha state without the buffering. 300B is still significantly better than mt19937 (~2.5kB) or the old Go generator (4.9kB!).