I've always interpreted the Turing machine's definition of storage (the tape) as arbitrarily large, not specifically infinite. The universe, after all, is finite. The "well, acshually" arguments aren't interesting, because they're 100% abstract.
It is defined as arbitrarily large but not infinite. That's not because of physical concerns, but because some of the theorems don't work if the memory is actually infinite.
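A minimal Python sketch of what "arbitrarily large but not infinite" can mean in practice: the simulator below stores only the cells the head has actually visited, so the tape is finite at every step but has no fixed upper bound. The `INCREMENT` rule table is a toy example, and the `max_cells` and `max_steps` parameters are hypothetical additions for illustration: `max_cells` models a physical machine with a fixed amount of memory, and `max_steps` only exists so the sketch always terminates.

```python
# Rule table format: (state, symbol) -> (symbol_to_write, move "L"/"R", next_state).
# INCREMENT is a toy unary incrementer: skip right over the 1s, turn the first blank into a 1.
INCREMENT = {
    ("start", "1"): ("1", "R", "start"),
    ("start", "_"): ("1", "R", "halt"),
}


def run_tm(rules, tape_input, max_cells=None, max_steps=10_000):
    """Run a single-tape Turing machine from state "start" until it reaches "halt".

    The tape only holds cells the head has visited, so it is finite at every
    step but has no fixed upper bound. max_cells (hypothetical) caps it, which
    models a physical machine with a fixed amount of memory.
    """
    tape = dict(enumerate(tape_input))
    head, state = 0, "start"
    for _ in range(max_steps):
        if state == "halt":
            return "".join(tape[i] for i in sorted(tape))
        if head not in tape:
            # Allocate one blank cell on demand: the tape grows but is never infinite.
            if max_cells is not None and len(tape) >= max_cells:
                raise MemoryError(f"program needs more than {max_cells} cells")
            tape[head] = "_"
        write, move, state = rules[(state, tape[head])]
        tape[head] = write
        head += 1 if move == "R" else -1
    raise RuntimeError("step limit reached without halting")


print(run_tm(INCREMENT, "111"))  # prints 1111: the tape grew to 4 cells
try:
    run_tm(INCREMENT, "111", max_cells=3)
except MemoryError as err:
    print("bounded machine failed:", err)
```

The same program succeeds on the growable tape and fails on the capped one, which is the gap between an abstract Turing machine and any machine with a fixed amount of memory.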
You're comparing an a priori concept with an a posteriori one. It's like claiming the number five doesn't "acshually" exist. Like, yeah, it's a concept; concepts don't exist.
A universe isn't a Turing machine because it can't run all the programs that can run on a Turing machine. This isn't exactly controversial.
What's the difference between arbitrarily large and infinite? Would you say the number of possible Turing-computable functions is merely arbitrarily large and not actually infinite?
When you're talking about something like neural networks on a 4004, the "well, ackshually" argument does become very much relevant. The limitations of that kind of platform are severe enough that it does not approximate a Turing machine with respect to modern software.
LLaMA takes a lot more MIPS and a lot more RAM than Linux. Linux is more complicated, but computers were running Linux 30 years ago. In this case, quantity has a quality all of its own.
Ignoring things like AVX-512, you're looking at about 100x more compute to do something serious with LLaMA.
However! If you just want to demo it working, you could generate 4 tokens using TinyLLaMA 1.1B, which takes 25,164,386,466 cycles. That's about the same cost as booting Linux. So you could do TinyLLaMA if you can do Linux.
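The arithmetic behind that comparison, as a small Python sketch. The cycle count and token count are the figures quoted above; the clock rate is deliberately left as a parameter, because the comment doesn't say which machine's cycles are being counted. `HYPOTHETICAL_CYCLES_PER_SECOND` is a placeholder, not a real 4004 figure.

```python
# Figures quoted in the comment above.
TOTAL_CYCLES = 25_164_386_466  # cycles to generate 4 tokens with TinyLLaMA 1.1B
TOKENS = 4

# Placeholder rate, NOT a measured 4004 figure: the comment doesn't say which
# machine's cycles these are, so swap in whatever rate you want to model.
HYPOTHETICAL_CYCLES_PER_SECOND = 1_000_000


def cycles_per_token(total_cycles, tokens):
    """Average cost of one generated token."""
    return total_cycles / tokens


def wall_clock_days(total_cycles, cycles_per_second):
    """Convert a cycle count into days at a given cycle rate."""
    return total_cycles / cycles_per_second / 86_400  # 86,400 seconds per day


print(f"{cycles_per_token(TOTAL_CYCLES, TOKENS):,.1f} cycles per token")  # 6,291,096,616.5
print(f"{wall_clock_days(TOTAL_CYCLES, HYPOTHETICAL_CYCLES_PER_SECOND):.2f} days at the placeholder rate")
```

Plugging in a real cycle rate for whatever hardware (or emulator) actually runs the code turns the per-token figure into an estimate of how long the demo would take.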
Note also that the 4004 lacks a floating-point unit of any kind, not just a vector unit. I think people make 8-bit integer quantizations of LLMs, though, and those would be the fastest versions to run on a 4004.
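A minimal sketch of why integer quantization matters on a chip without an FPU, assuming a simple symmetric per-tensor int8 scheme (real LLM quantization formats typically use per-block scales and differ in detail). The helper names `quantize_int8`, `int8_dot`, and `quantized_dot` are hypothetical. The inner loop is pure integer multiply-accumulate; a float appears only once per dot product, for the final rescale, which a 4004 port would have to do in fixed point instead.

```python
def quantize_int8(values):
    """Symmetric per-tensor int8 quantization: q = round(x / scale), scale = max|x| / 127."""
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / 127 if max_abs else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def int8_dot(qa, qb):
    """Dot product using only integer multiply-accumulate (no floating point)."""
    acc = 0
    for a, b in zip(qa, qb):
        acc += a * b
    return acc


def quantized_dot(a, b):
    """Approximate dot(a, b): integer inner loop, one rescale at the end."""
    qa, scale_a = quantize_int8(a)
    qb, scale_b = quantize_int8(b)
    # The only non-integer step. On an FPU-less chip this rescale would have
    # to be done in fixed point instead of with Python floats.
    return int8_dot(qa, qb) * scale_a * scale_b


weights = [0.5, -1.0, 0.25]
activations = [1.0, 0.5, -2.0]
print(quantized_dot(weights, activations))  # approximately the exact value, -0.5
```

The result is close to, not equal to, the exact dot product; that rounding error is the price paid for keeping the expensive part of the computation in integers.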
A lot of quants just upcast the weights to floats before doing the math. Some of them do integer multiplication using pmaddubsw. But oof, it looks like the i4004 doesn't even have that.
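To make the pmaddubsw point concrete, here is a hedged Python emulation of that SSSE3 instruction's semantics as I understand them (unsigned bytes times signed bytes, adjacent products summed and saturated to signed 16-bit), operating on a plain list rather than a real register. The 4004 has no multiply instruction at all, so the sketch also includes a hypothetical `shift_add_mul` helper showing the add-and-shift loop software has to use instead; it works on Python integers, not 4-bit 4004 registers.

```python
def saturate_int16(x):
    """Clamp to the signed 16-bit range, as the instruction's saturation does."""
    return max(-32768, min(32767, x))


def pmaddubsw(a_u8, b_s8):
    """Emulate x86 PMADDUBSW on a list of bytes.

    a_u8 holds unsigned bytes (0..255), b_s8 holds signed bytes (-128..127).
    Each pair of adjacent products is summed and saturated to a signed 16-bit word.
    """
    if len(a_u8) != len(b_s8) or len(a_u8) % 2:
        raise ValueError("need two equal-length inputs with an even number of bytes")
    return [
        saturate_int16(a_u8[i] * b_s8[i] + a_u8[i + 1] * b_s8[i + 1])
        for i in range(0, len(a_u8), 2)
    ]


def shift_add_mul(x, y):
    """Multiply two non-negative integers using only addition and shifts,
    the way software has to on a CPU with no multiply instruction."""
    result = 0
    while y:
        if y & 1:  # low bit of y set: add the current shifted x
            result += x
        x <<= 1  # the next bit of y is worth twice as much
        y >>= 1
    return result


print(pmaddubsw([1, 2, 3, 4], [5, -6, 7, 8]))  # [-7, 53]
print(pmaddubsw([255, 255], [127, 127]))  # [32767], saturated
print(shift_add_mul(13, 11))  # 143
```

On x86, one pmaddubsw does 16 of these byte multiplies at once; on a 4004, every single one becomes a loop like `shift_add_mul`, built from 4-bit additions.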