by dhruvdh 749 days ago
What's the point of the 8000 LOC limit? Has anyone worked in a project with a LOC limit? Why was the limit in place?
11 comments

It's just a way to keep the code size in check, make sure it can be read and understood relatively easily. Don't overthink it. I doubt much, if any, research went into picking the limit. The line width is over 120 in many places, and the code inevitably ends up looking like

  cache_key = (device, st, dtype, op, arg, tuple(ref(x) for x in srcs)) if base is None else (st, ref(base))
Seems like the code sample contradicts your first statement.
I think this might be an example of https://en.m.wikipedia.org/wiki/Goodhart's_law

The line count probably does still act as a limit on complexity overall but perhaps less than hoped for.

Indeed, I was making a point.
This is truly depressing because the aspirations of tinygrad are so appealing in terms of being concise, effective and maintainable. Then, instead, they throw comprehensibility entirely out of the window.
To compare, the PyTorch repo has ~400k lines of C, ~850k lines of C++ and more than 1.5 million lines of Python code.

PyTorch does more than tinygrad, but does it really do 343x more things?
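For reference, the 343x figure appears to come from dividing the quoted PyTorch line counts by the 8000 LOC limit discussed in this thread. A quick sketch of that arithmetic, using only the approximate numbers from the comment above (the variable names are just for illustration):

```python
# Approximate line counts quoted in the comment above.
# The Python figure is "more than 1.5 million", so the ratio is a lower bound.
pytorch_loc = {"C": 400_000, "C++": 850_000, "Python": 1_500_000}
tinygrad_limit = 8_000  # the LOC limit discussed in this thread

total = sum(pytorch_loc.values())
ratio = total / tinygrad_limit

print(f"PyTorch total: ~{total:,} lines")
print(f"Ratio to the limit: ~{ratio:.0f}x")
```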

If PyTorch does the 1-2 things you need and tinygrad doesn't, then what are you going to use?

The Python source distribution has long maintained the philosophy of “batteries included” – having a rich and versatile standard library which is immediately available, without making the user download separate packages.

https://peps.python.org/pep-0206/

OTOH:

  Simple is better than complex.
  Complex is better than complicated.
https://peps.python.org/pep-0020/
PyTorch of course. Or alternatively a lib or custom code on top of TinyGrad. Is that a problem?
geohot explained on one of his streams, and per my terrible memory: “tiny” is a way of expressing the architecture constraint that the system should not attempt to target [(many hardware architectures and their optimizations) * (many model, training, etc etc variants)] like PyTorch, which requires maintenance of a shit ton of code and a staff/community behind Meta. Instead, tinygrad should provide core abstractions that can be composed to accomplish a similar set of targets but for only one hardware architecture (for now, I guess). He is releasing a companion hardware item which would fund the development, I believe.
I think you massively underestimate the complexity of PyTorch. Even if we exclude all GPUs except for AMD, and exclude clang (required for the AOT engine), PyTorch depends on almost every ROCm library. Inside, it depends on the original Triton library, and on a forked Triton, and on aotriton, which depends on a forked MLIR (because AMD doesn't contribute these MLIR changes upstream), which depends on yet another forked LLVM/Clang (because the LLVM API is not stable enough for them, I guess). And then there are MIOpen/rocBLAS/hipBLASLt/hipSOLVER/rocFFT/etc., libraries with gigabytes (!) of autogenerated code. Additionally, there are dozens of smaller linked libraries like oneDNN, LIBXSMM, MAGMA, NumPy, OpenBLAS, all needed for running "things". So even without autogenerated code, consider multiplying 1.5 million LOC by 100.
Probably.
Easily
uh, ya? lol
Right now there doesn't seem to be much point. IIRC they had a 1000 LOC limit on the core part of the code when the project was early.

The README no longer mentions the limit and it looks like they just raise it whenever needed. Three months ago it was bumped to 6500 LOC. One month ago it was bumped to 8000 lines.

A tech debt ceiling, so to speak, then. There might be some use to it. It still inevitably gets raised, but only after debate, discussion, and a lot of time in between spent really considering the form and impact of the code being added to fit within the constraint.
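A ceiling like this could be enforced with a small check in CI. The following is only a sketch under assumptions, not tinygrad's actual counting script: `count_loc` and `within_budget` are hypothetical helpers, and "a line of code" is assumed to mean any line that is neither blank nor a pure `#` comment.

```python
def count_loc(source):
    """Count lines that are neither blank nor pure '#' comments (an assumed definition)."""
    return sum(
        1
        for line in source.splitlines()
        if line.strip() and not line.strip().startswith("#")
    )

def within_budget(sources, limit):
    """Return True if the combined line count of all sources is at most `limit`."""
    total = sum(count_loc(text) for text in sources.values())
    return total <= limit

# Hypothetical in-memory "files" so the sketch runs without touching disk.
example = {
    "a.py": "import os\n\n# a comment\nx = 1\n",
    "b.py": "def f():\n    return 2\n",
}

# The long line quoted earlier in the thread still counts as a single line.
one_liner = (
    "cache_key = (device, st, dtype, op, arg, tuple(ref(x) for x in srcs)) "
    "if base is None else (st, ref(base))"
)

print(within_budget(example, limit=4))
print(count_loc(one_liner))
```

Note that a counter like this only sees line breaks, so packing more into each line passes the check just as well as writing less code.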
To keep it "tiny". (IIRC geohot started it because he thought pytorch and others were bloated and a simple ml framework would be inherently better)
It used to be 1,000. I guess it’s just a reminder to be succinct.
Looking at the code base right now, apparently to produce some of the most unreadable code possible (https://github.com/tinygrad/tinygrad/blob/master/tinygrad/re...)

LOC limits have to be one of the worst incentives you can give programmers.

The only one I can think of is the dwm window manager (https://dwm.suckless.org/), which used to prominently mention a SLOC limit of 2000. It doesn't seem to be mentioned on the landing page anymore; not sure if it's still in effect.
There are benefits to having a low number of lines of code, e.g. if you want to print it out on paper (and reduce the number of pages), or store it on a disk with limited storage (although the number of bytes is a more useful measure then), or if you want to read and understand it in less time than a longer program, etc. Of course, a limit on the number of characters per line is also necessary then.

However, that doesn't solve everything. There are many things line count does not accurately measure, e.g. complexity, how much is packed into one line, program speed, memory usage, etc. Those are other things to measure, and it can be helpful to reduce memory usage and so on, but that is not the same as the number of lines of code.

Cyclomatic complexity would be a better measurement.
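To make that concrete, here is a rough sketch of how such a measurement could be approximated for Python source with the standard `ast` module. It counts 1 plus the number of decision points, which is a simplification of McCabe's definition; the function name and the exact set of node types counted are my own assumptions, not an established tool.

```python
import ast

def cyclomatic_complexity(source):
    """Approximate McCabe complexity: 1 + number of decision points in `source`."""
    complexity = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.If, ast.IfExp, ast.For, ast.While, ast.ExceptHandler)):
            complexity += 1  # each branch or loop adds one path
        elif isinstance(node, ast.BoolOp):
            complexity += len(node.values) - 1  # each extra and/or operand adds one path
        elif isinstance(node, ast.comprehension):
            complexity += 1 + len(node.ifs)  # the implicit loop plus any filters
    return complexity

# The line quoted earlier in the thread: one line of code, but more than one path.
one_liner = (
    "cache_key = (device, st, dtype, op, arg, tuple(ref(x) for x in srcs)) "
    "if base is None else (st, ref(base))"
)

print(cyclomatic_complexity("x = 1"))    # 1
print(cyclomatic_complexity(one_liner))  # 3: conditional expression + generator loop
```

Unlike a line count, this number does not drop when several branches are squeezed onto one line.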
To stay Tiny
No new features.