It's just a way to keep the code size in check and make sure it can be read and understood relatively easily. Don't overthink it. I doubt much, if any, research went into picking the limit. The line width is over 120 in many places, and the code inevitably ends up looking like:
cache_key = (device, st, dtype, op, arg, tuple(ref(x) for x in srcs)) if base is None else (st, ref(base))
This is truly depressing because the aspirations of tinygrad are so appealing in terms of being concise, effective and maintainable. Then, instead, they throw comprehensibility entirely out of the window.
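For comparison, here is a rough sketch of how the one-liner quoted above could be unpacked into a few more lines. This is not tinygrad's code: the function name `make_cache_key` and the `Buf` class are hypothetical, and I'm assuming `ref` is `weakref.ref`, which the quoted line doesn't confirm.

```python
from weakref import ref  # assumption: the quoted `ref` is weakref.ref


class Buf:
    """Hypothetical stand-in for an object that can be weakly referenced."""


def make_cache_key(device, st, dtype, op, arg, srcs, base=None):
    """Build the same tuple as the quoted one-liner, one branch per line."""
    if base is None:
        # No base object: the key covers every field plus weak refs to the sources.
        return (device, st, dtype, op, arg, tuple(ref(x) for x in srcs))
    # With a base object: the key is just the st value and a weak ref to the base.
    return (st, ref(base))


a, b = Buf(), Buf()
key_without_base = make_cache_key("CPU", "st", "float32", "ADD", None, [a, b])
key_with_base = make_cache_key("CPU", "st", "float32", "ADD", None, [a, b], base=a)
```

It costs a handful of extra lines, which is exactly the trade-off a hard line budget pushes against.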
If PyTorch does the one or two things you need and tinygrad doesn't, then which are you going to use?
The Python source distribution has long maintained the philosophy of “batteries included” – having a rich and versatile standard library which is immediately available, without making the user download separate packages.
geohot explained on one of his streams, and per my terrible memory: “tiny” is a way of expressing the architecture constraint that the system should not attempt to target [(many hardware architectures and their optimizations) * (many model, training, etc. variants)] like PyTorch, which requires maintaining a shit ton of code and the staff/community behind Meta. Instead, tinygrad should provide core abstractions that can be composed to accomplish a similar set of targets, but for only one hardware architecture (for now, I guess). He is releasing a companion hardware item which would fund the development, I believe.
I think you massively underestimate the complexity of PyTorch. Even if we exclude all GPUs except for AMD, and exclude clang (required for the AOT engine), PyTorch depends on almost every ROCm library. Inside, it depends on the original Triton library, on a forked Triton, and on aotriton, which depends on a forked MLIR (because AMD doesn't contribute these changes upstream), which depends on yet another forked LLVM/Clang (because the LLVM API is not stable enough for them, I guess). And then there is MIOpen/rocBLAS/hipBLASLt/hipSOLVER/rocFFT/etc.: libraries with gigabytes (!) of autogenerated code. Additionally, there are dozens of smaller linked libraries like oneDNN, LIBXSMM, MAGMA, NumPy, and OpenBLAS, all needed for running "things". So even without autogenerated code, consider multiplying 1.5 million LOC by 100.
Right now there doesn't seem to be much point. IIRC they had a 1000 LOC limit on the core part of the code when the project was early.
The README no longer mentions the limit and it looks like they just raise it whenever needed. Three months ago it was bumped to 6500 LOC. One month ago it was bumped to 8000 lines.
A tech debt ceiling, so to speak, then. There might be some use to it. It still inevitably gets raised, but only after debate, discussion, and a lot of time in between really considering the form and impact of the code being added to fit within the constraint.
The only one I can think of is the dwm window manager (https://dwm.suckless.org/), which used to prominently mention a SLOC limit of 2000. It doesn't seem to be mentioned on the landing page anymore; not sure if it's still in effect.
There are benefits to having a low number of lines of code, e.g. if you want to print it out on paper (and reduce the number of pages), or store it on a disk with limited storage (although the number of bytes is a more useful measure then), or if you want to read and understand it in less time than a longer program would take, etc. Of course, a limit on the number of characters per line is also necessary then.
However, that doesn't solve everything. There are many things a line count does not accurately measure, e.g. complexity, how much is packed into one line, program speed, memory usage, etc. Those are separate things to measure, and it can be helpful to reduce memory usage and so on, but none of that is the number of lines of code.
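To make the "lines and width" point concrete, here is a minimal sketch of measuring both at once. The function name `measure_source` and the default of 120 columns are just assumptions for illustration; as far as I know this is not how tinygrad or dwm count their lines.

```python
def measure_source(text, max_width=120):
    """Count lines and flag over-wide ones in a blob of Python source text."""
    lines = text.splitlines()
    # Crude "code line" definition: not blank and not a pure '#' comment line.
    code_lines = [ln for ln in lines if ln.strip() and not ln.strip().startswith("#")]
    # 1-based numbers of the lines that exceed the width limit.
    too_wide = [i for i, ln in enumerate(lines, start=1) if len(ln) > max_width]
    return {
        "total_lines": len(lines),
        "code_lines": len(code_lines),
        "max_width": max((len(ln) for ln in lines), default=0),
        "too_wide": too_wide,
    }


sample = "import os\n\n# a comment\nx = 1\ny = " + "1" * 130 + "\n"
report = measure_source(sample)
```

A low line count with a long `too_wide` list is the situation described above: the budget is met, but only by packing more into each line. Neither number says anything about complexity, speed, or memory usage.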