I've always interpreted the definition of storage as arbitrarily large, not specifically infinite. The universe, after all, is finite. The "well, acshually" arguments aren't interesting, because they're 100% abstract.
It is defined as arbitrarily large but not infinite. That's not because of physical concerns, but because some of the theorems don't work if the memory is actually infinite.
You're comparing an a priori concept with a posteriori one. It's like claiming the number five doesn't "acshually" exist. Like yea, it's a concept, concepts don't exist.
A universe isn't a turing machine because it can't run all the programs that can run on a turing machine. This isn't exactly controversial.
What's the difference between arbitrary large and infinite? Would you say the number of possible Turing computable functions is merely arbitrary large and not actually infinite?
When you're talking about something like neural networks on a 4004, the "well ackshually" argument does become very much relevant. The limitations of that kind of platform are hard enough that they do not approximate a Turing machine with respect to modern software.
Llama takes a lot more MIPS and a lot more RAM than linux. Linux is more complicated, but computers were running linux 30 years ago. In this case, quantity has a quality all of its own.
Ignoring things like AVX512 you're looking at about 100x more compute to do something serious with LLaMA.
However! If you just want to demo it working, then you could generate 4 tokens using TinyLLaMA 1.1B which takes 25,164,386,466 cycles. That's about the same cost as booting Linux. So you could do TinyLLaMA if you can do Linux.
Note also that the 4004 lacks a floating-point unit of any kind - not just a vector unit. I think people make 8-bit integer quantizations of LLMs, though, which would be the fastest versions to run on a 4004.
What? A Turing machine can literally be built. The only problem is supplying the infinite tape, or splicing on more when it runs to the end of a finite tape.
Machines that address storage using fixed width pointers (and have no other kind of storage) cannot be Turing machines.
You don't want your telephone exchange to halt. That's why the engineering marvel Erlang. "Engineering" being the key word. The Halting problem isn't a real problem. Just pull the power cord.
You want every call handled in finite time. It's not that there is an unending computation in a telephone exchange, but a series of finite tasks. You can call it corecursion, coinduction or something else co-.
A computer with external connectivity to a tape machine capable of reading and writing a symbols and moving along the tape would qualify as a Turing machine, provided someone feeds it enough tape for any problem that's thrown at it.
It was funny when Sun proudly and unilaterally proclaimed that Sun put the "dot" into "dot com", leaving it wide open for Microsoft to slyly counter that oh yeah, well Microsoft put the "COM" into "dot com" -- i.e. ActiveX, IE, MSJVM, IIS, OLE, Visual Basic, Excel, Word, etc!
And then IBM mocked "When they put the dot into dot-com, they forgot how they were going to connect the dots," after sassily rolling out Eclipse just to cast a dark shadow on Java. Badoom psssh!