Hacker News


	by thangngoc89 976 days ago
	GGML is a framework for running deep neural networks, mostly for inference. It sits at the same level as PyTorch or TensorFlow, so in your JavaScript/React analogy I would say GGML is the browser. llama.cpp is a project that uses GGML under the hood, and it has the same authors; some features were even developed in llama.cpp before being ported to GGML. Ollama provides a user-friendly way to use Llama models. No idea what it uses under the hood.
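	Part of what makes Ollama user-friendly is that it runs a local HTTP server you can call from any language. A minimal sketch in Python, assuming the server is running on its default port 11434 and exposes a `/api/generate` endpoint that takes `model`, `prompt` and `stream` fields (check the Ollama API docs for your version). `build_generate_request` and `generate` are hypothetical helper names, not part of Ollama.

```python
import json
import urllib.request

# Assumption: a local Ollama server listening on its default port (11434).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(model: str, prompt: str, url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a POST request for Ollama's /api/generate endpoint (hypothetical helper)."""
    # stream=False asks for one JSON object instead of a stream of partial chunks.
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str) -> str:
    """Send the request and return the generated text; needs a running server."""
    req = build_generate_request(model, prompt)
    with urllib.request.urlopen(req) as resp:
        payload = json.loads(resp.read().decode("utf-8"))
    # Assumption: the non-streaming reply carries the generated text in "response".
    return payload["response"]
```

	With a model pulled locally (for example via `ollama pull llama2`), calling `generate("llama2", "Why is the sky blue?")` should return the answer as a single string. Keeping request construction separate from the network call makes it easy to inspect or test without a server.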


simonw 976 days ago

The Llama name is pretty confusing at this point.

LLaMA was the model Facebook released under a non-commercial license back in February; it was the first really capable openly available model. It drove a huge wave of research, and various projects were named after it (llama.cpp, for example).

Llama 2 came out in July and allowed commercial usage.

But... there are now an increasing number of models that aren't actually related to Llama at all. Projects like llama.cpp and Ollama can often be used to run those too.

So "Llama" no longer reliably means "related to Facebook's LLaMA architecture".


vanillax 976 days ago

- GPTQ: pure GPU inference, used with AutoGPTQ, exllama, exllamav2; offers only 4-bit quantization

What are AutoGPTQ and exllama? What does it mean that GPTQ is only used with AutoGPTQ and exllama? Are those frameworks like TensorFlow?
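To make "4-bit quantization" concrete: each weight is stored as a small integer code plus one floating-point scale shared by a group of weights. The sketch below is plain symmetric round-to-nearest quantization, not GPTQ itself; GPTQ picks the roundings more carefully using second-order information from calibration data, and real GPU kernels also pack two 4-bit codes into each byte, which is omitted here. `quantize_4bit` and `dequantize_4bit` are illustrative names, not any library's API.

```python
from typing import List, Tuple


def quantize_4bit(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric round-to-nearest 4-bit quantization of one group of weights."""
    # Signed 4-bit integers span -8..7; this sketch uses -7..7 to stay symmetric.
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 7 if max_abs > 0 else 1.0
    # Divide by the shared scale, round to the nearest integer, clamp to the range.
    codes = [max(-7, min(7, round(w / scale))) for w in weights]
    return codes, scale


def dequantize_4bit(codes: List[int], scale: float) -> List[float]:
    """Recover approximate float weights from the 4-bit codes and the group scale."""
    return [c * scale for c in codes]
```

For a group like `[1.4, -0.6, 0.0, 0.25]` the codes come out as `[7, -3, 0, 1]` with a scale of about 0.2, so `0.25` is reconstructed as roughly `0.2`: each restored weight is off by at most half a scale step. Inference engines such as the ones named above are built around kernels that do this dequantization on the fly.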


cptcobalt 976 days ago

Ollama seems to use a lot of the same pieces, but wraps them in a really nice, easy-to-use layer of the glue code many of us would end up writing anyway. It's quickly become my personal preference.

It looks like it includes submodules for GGML and GGUF from llama.cpp.

https://github.com/jmorganca/ollama/tree/main/llm
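GGUF is the single-file model format that llama.cpp loads (the successor to the older GGML file format). A rough sketch of reading its fixed-size header, assuming the layout described in the GGUF spec in the ggml repository: little-endian, the magic bytes `GGUF`, a `uint32` version, then `uint64` tensor and metadata key/value counts (version 2 onwards; version 1 used 32-bit counts and is rejected here). `GGUFHeader`, `parse_gguf_header` and `read_gguf_header` are hypothetical names.

```python
import struct
from typing import NamedTuple


class GGUFHeader(NamedTuple):
    version: int
    tensor_count: int
    metadata_kv_count: int


def parse_gguf_header(data: bytes) -> GGUFHeader:
    """Parse the fixed header at the start of a GGUF (v2+) file."""
    # Assumed layout, little-endian with no padding: 4-byte magic b"GGUF",
    # uint32 version, uint64 tensor count, uint64 metadata key/value count = 24 bytes.
    if len(data) < 24:
        raise ValueError("too short to be a GGUF header")
    magic, version, n_tensors, n_kv = struct.unpack_from("<4sIQQ", data, 0)
    if magic != b"GGUF":
        raise ValueError(f"bad magic: {magic!r}")
    if version < 2:
        raise ValueError("GGUF v1 used 32-bit counts; not handled in this sketch")
    return GGUFHeader(version, n_tensors, n_kv)


def read_gguf_header(path: str) -> GGUFHeader:
    """Read just the first 24 bytes of a model file and parse them."""
    with open(path, "rb") as f:
        return parse_gguf_header(f.read(24))
```

The metadata key/value pairs and tensor descriptors that follow the header are variable-length and are not parsed here.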
