by vessenes 779 days ago
This is interesting. Groq (the chip company, not Twitter’s ‘Grok’ LLM) works at a similar silicon scale, though I’m not sure how similar the architecture is. One very interesting thing about Groq that I failed to appreciate when they were originally raising is that the architecture is deterministic.

Why is determinism good for inference? If you are clever, you can run distributed computations without waiting for sync, because every unit already knows when its inputs will arrive. I can’t tell from their marketing materials, but it’s also possible they went for the gold ring and built something latch-free on the silicon side.
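
To make the "no waiting for sync" point concrete, here is a toy sketch of static scheduling. It is not Groq's compiler or anything taken from their materials: the `Op` class, the `chip0`/`chip1` unit names, the latencies, and the `link_latency` value are all made up for illustration. The only idea it shows is that if every latency is known exactly ahead of time, start cycles can be fixed at compile time and nothing has to handshake at runtime.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    name: str
    unit: str          # hypothetical execution unit (e.g. one chip)
    latency: int       # fixed cycle count, assumed known at compile time
    deps: tuple = ()   # names of ops whose results this op consumes


def static_schedule(ops, link_latency=2):
    """Assign each op a start cycle ahead of time.

    Assumes ops are listed in dependency order and that every latency,
    including the (hypothetical) chip-to-chip link latency, is exact.
    """
    by_name = {op.name: op for op in ops}
    start, finish, unit_free = {}, {}, {}
    for op in ops:
        ready = 0
        for dep in op.deps:
            # Crossing between units costs a fixed, known number of cycles.
            hop = link_latency if by_name[dep].unit != op.unit else 0
            ready = max(ready, finish[dep] + hop)
        # Start when inputs have arrived and the unit is free.
        start[op.name] = max(ready, unit_free.get(op.unit, 0))
        finish[op.name] = start[op.name] + op.latency
        unit_free[op.unit] = finish[op.name]
    return start, finish


ops = [
    Op("load", "chip0", 3),
    Op("matmul_a", "chip0", 5, ("load",)),
    Op("matmul_b", "chip1", 5, ("load",)),
    Op("add", "chip1", 1, ("matmul_a", "matmul_b")),
]
start, finish = static_schedule(ops)
for op in ops:
    print(f"{op.name:9s} on {op.unit}: cycles {start[op.name]}..{finish[op.name]}")
```

In this sketch `add` is simply told to start at a fixed cycle; it never checks whether `matmul_a` has finished on the other chip. On non-deterministic hardware that start time would be a guess, so you would need a barrier or handshake instead.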

Groq seems to have been able to use their architecture to deliver some insanely high token/s numbers; groqchat is by far the fastest inference API I’ve seen.

All this to say that I’m curious what a Dojo architecture designed around training could do, presuming training was a key use case in the arch design. Knowing the long game thinking at Tesla, I imagine it was.

1 comment

It's a great era to be a hardware nerd.

TSMC got so far out in front of everyone that their competitors had to get creative and solve other issues.

Why is this on 7nm? Because I don't think you could do this on 3nm. It is my understanding that everything down at that scale is double shot/imaged to get the right sized components, and with that comes a higher defect rate.

Look at what Intel is doing: holding out for single-shot processes. Their push for double-sided chips (power on one side and data on the other) would be impossible with the 3nm double shot (I can't see flipping the die as a good way to get reliable alignment across 4 imagings).

I suspect that we're getting to the end of size (shrink) scaling and we're going to get into process and design scaling. It's going to be interesting to see what happens to cost and capacity if we're at that point. Process flexibility would be the new king!