
by brucethemoose2 1102 days ago

Prompt ingestion is too slow on the Oracle VMs.

Also, it's really tricky to even build llama.cpp with a BLAS library to make prompt ingestion less slow. The Oracle Linux OpenBLAS build isn't detected out of the box, and it doesn't perform well compared to x86 for some reason.

LLVM/GCC have some kind of issue identifying the Ampere ARM architecture (`-march=native` doesn't really work), so maybe this could be improved with the right compiler flags?
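
One thing that might be worth trying: the Oracle Ampere A1 shapes are Ampere Altra parts, which use Arm Neoverse N1 cores, so naming the core explicitly with `-mcpu=neoverse-n1` may work where native detection doesn't. Below is a rough sketch that reads the CPU part ID from `/proc/cpuinfo` text and suggests a flag. The helper name `suggest_mcpu` and the small lookup table are my own, and whether the flag actually improves llama.cpp performance is untested.

```python
import re
from pathlib import Path

# Arm-designed cores (CPU implementer 0x41), keyed by the "CPU part" field.
# Deliberately tiny, illustrative table; extend it for other machines.
KNOWN_PARTS = {
    0xD0C: "neoverse-n1",  # e.g. Ampere Altra
    0xD40: "neoverse-v1",
    0xD49: "neoverse-n2",
}


def suggest_mcpu(cpuinfo_text):
    """Return a '-mcpu=...' flag for the first recognised core, else None."""
    implementer = re.search(
        r"^CPU implementer\s*:\s*(0x[0-9a-fA-F]+)", cpuinfo_text, re.M
    )
    part = re.search(r"^CPU part\s*:\s*(0x[0-9a-fA-F]+)", cpuinfo_text, re.M)
    if not implementer or not part:
        return None
    if int(implementer.group(1), 16) != 0x41:
        return None  # not an Arm-designed core; this table does not apply
    name = KNOWN_PARTS.get(int(part.group(1), 16))
    return f"-mcpu={name}" if name else None


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    text = path.read_text() if path.exists() else ""
    print(suggest_mcpu(text) or "unrecognised CPU; no suggestion")
```

The result would still have to be passed to the compiler through whatever flag override the build supports.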

2 comments

pedrovhb 1102 days ago

Not sure if that's still the case. I remember having trouble building it a couple of months ago, had to tweak the Makefile because iirc it assumed ARM64 <=> Mac, but I recently re-cloned the repo and started from scratch and it was as simple as `make DLLAMA_BLAS=1`. I don't think I have any special setup other than having installed the apt openblas dev package.


brucethemoose2 1102 days ago

IDK. A bunch of basic development packages like git were missing from my Ubuntu image when I tried last week, and I just gave up because it seemed like a big rabbit hole to go down.

I can see the ARM64 versions on the Ubuntu web package list, so... IDK what was going on?

On Oracle Linux, until I changed some env variables and lines in the Makefile, the OpenBLAS build would "work," but it was actually silently failing and not using OpenBLAS.
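
For anyone hitting the same silent fallback: llama.cpp prints a `system_info` line at startup that includes a `BLAS = 0` or `BLAS = 1` field, so that is a quick way to tell whether BLAS actually got compiled in. Here is a small sketch that pulls the field out of captured output. The function name `blas_enabled` is made up, and the exact log format is an assumption based on builds from around this time, so it may differ in other versions.

```python
import re


def blas_enabled(log_text):
    """Read the 'BLAS = N' field from a system_info line.

    Returns True or False when the field is present, None when it is not.
    """
    for line in log_text.splitlines():
        if "system_info" not in line:
            continue
        match = re.search(r"\bBLAS\s*=\s*(\d)", line)
        if match:
            return match.group(1) == "1"
    return None
```

A `False` here after a supposedly BLAS-enabled build would point to the library not being picked up at compile time.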


jvickers 1102 days ago

Is it any easier when using Ubuntu on ARM Oracle servers?


brucethemoose2 1102 days ago

Nah, I tried Ubuntu too.

The OpenBLAS package was missing on ARM, along with some other dependencies I needed for compilation.

At the end of the day, even with many tweaks and custom compilation flags, the instance was averaging below 1 token/sec as a Kobold Horde host, which is below the threshold to even be allowed as an LLM host.
