| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by drillsteps5 481 days ago

By not building tooling in Python.

llamacpp is a high performative open source solution capable of inference of large number of published LL models both in CPU and in GPU. It's written in C++.

It's easy to download and build in Windows or Linux.

It can be used as a command line tool, linked and used as a library from a variety of languages, including Python, or communicated with through a simple REST service which is also part of the same repo. It even has a simple Web frontend (built with React I believe) which allows you to use it for simple conversations (no bells and whistles).

And yet the author is using Ollama which itself is a wrapper around llamacpp (as most of them are) written in Python.

We're creating the problems that need soling.