Hacker News
Show HN: Willitrun – check if any ML model runs on any device (benchmark-backed) (github.com)
1 point by smoothyy 76 days ago
I kept running into the same problem with local/edge ML: I would read through model cards or start downloading a model, and only later realize it just barely didn't fit on my device or would run too slowly to be useful.

So I built willitrun, a small CLI that tries to answer that upfront.

It checks whether a model is likely to fit and run on a given device. When benchmark data exists, it uses that first; otherwise it falls back to a lightweight estimate. Currently covers 482 benchmarks across 88 devices (desktop GPUs, server hardware, Apple Silicon, and NVIDIA Jetson) with HuggingFace model name resolution built in.
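
A minimal sketch of that lookup order, for illustration only. The names here (`BENCHMARKS`, `lookup`, `estimate_fn`) and the sample entry are hypothetical and not taken from willitrun's code; a plain dict stands in for the benchmark store.

```python
from typing import Callable, Dict, Tuple

# Hypothetical stand-in for the benchmark store:
# (model, device) -> measured throughput (made-up sample value).
BENCHMARKS: Dict[Tuple[str, str], float] = {
    ("example-7b", "example-gpu"): 42.0,
}


def lookup(model: str, device: str,
           estimate_fn: Callable[[str, str], float]) -> dict:
    """Prefer measured benchmark data; fall back to an estimate."""
    measured = BENCHMARKS.get((model, device))
    if measured is not None:
        return {"source": "benchmark", "throughput": measured}
    # No benchmark for this pair: use the lightweight estimate instead.
    return {"source": "estimate", "throughput": estimate_fn(model, device)}
```

Returning the source alongside the number lets the caller show whether a result is measured or estimated.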

Right now the goal is not to be perfect, but to be useful enough to avoid obviously bad choices before spending time downloading or testing models manually. It's also useful for edge devices like a Jetson Orin because you can check performance without physically accessing the hardware.

Most public benchmarks focus on LLMs, but out of personal interest I tried to include other categories as well.

I would be very interested in feedback, especially around cases where the estimates are off or where benchmark coverage is missing.

1 comments

Smart call on the tiered lookup, hitting SQLite first and falling back to FLOPs/TFLOPS estimation. One thing I'm wondering about the 20% overhead in Tier 2: does that factor in framework overhead, or just raw model weights? That margin can vary a lot depending on whether you're running PyTorch or ONNX.
The 20% is a safety margin on the memory fit check only. It sits on top of the raw weights-only figure (params × bytes-per-precision) to account for KV cache and activation tensors, not framework differences specifically. Your point is valid, but I think it applies to a different layer. PyTorch vs ONNX overhead is real, but it's implicitly captured in the throughput path: Tier 2 scales from real-world benchmarks that already reflect whatever framework ran them. The 20% is intentionally conservative: it'll occasionally say a model won't fit when it technically could, but it won't tell you something fits and then OOM you.
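
A rough sketch of the fit check as described, assuming it multiplies the weights-only size by 1.2 and compares against device memory. The function names and the precision table are illustrative assumptions, not willitrun's actual code.

```python
# Assumed bytes per parameter for common precisions (illustrative table).
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}

# The 20% safety margin for KV cache and activation tensors.
SAFETY_MARGIN = 0.20


def required_bytes(num_params: int, precision: str) -> float:
    """Weights-only size (params x bytes-per-precision) plus the margin."""
    weights = num_params * BYTES_PER_PARAM[precision]
    return weights * (1 + SAFETY_MARGIN)


def fits(num_params: int, precision: str, device_mem_bytes: float) -> bool:
    """Conservative check: True only if weights plus margin fit in memory."""
    return required_bytes(num_params, precision) <= device_mem_bytes
```

For example, a 7B-parameter model in fp16 is 14 GB of weights, which becomes about 16.8 GB with the margin, so `fits(7_000_000_000, "fp16", 16e9)` is false even though the raw weights alone are under 16 GB. That is the conservative failure mode mentioned above.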