| My bad, I linked directly to the C file instead of the project. It is a program that, given a model file, a tokenizer file, and a prompt, continues generating text. To get it to work, you need to clone and build this: https://github.com/trholding/llama2.c The steps are as follows. First, you'll need approval from Meta to download Llama 3 models on Hugging Face: go to https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct, fill in the form, and then check the acceptance status at https://huggingface.co/settings/gated-repos. Once accepted, download the model, export it, and run it. Download the original weights: `huggingface-cli download meta-llama/Meta-Llama-3.1-8B-Instruct --include "original/*" --local-dir Meta-Llama-3.1-8B-Instruct` Clone the project and enter it: `git clone https://github.com/trholding/llama2.c.git` then `cd llama2.c/` Export a quantized 8-bit model: `python3 export.py ../llama3.1_8b_instruct_q8.bin --version 2 --meta-llama ../Meta-Llama-3.1-8B-Instruct/original/` Build the fastest quantized inference binary: `make runq_cc_openmp` Test Llama 3.1 inference, which should generate sensible text: `./run ../llama3.1_8b_instruct_q8.bin -z tokenizer_l3.bin -l 3 -i " My cat"` |
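
Since the export step fails late if the download is incomplete, a small preflight check can save some time. The following is only a sketch: the helper name `check_model_dir` is hypothetical, and the file names it looks for (`params.json` and `consolidated.*.pth`) are an assumption about what the `--meta-llama` export path reads, so adjust them if your download looks different.

```shell
#!/bin/sh
# Hypothetical helper: verify that a Meta-format model directory looks complete
# before handing it to export.py. The expected file names are assumptions.
check_model_dir() {
    dir="$1"
    missing=0

    # Model hyperparameters (assumed name: params.json)
    if [ ! -f "$dir/params.json" ]; then
        echo "missing: $dir/params.json" >&2
        missing=1
    fi

    # At least one checkpoint shard (assumed naming: consolidated.*.pth)
    if ! ls "$dir"/consolidated.*.pth >/dev/null 2>&1; then
        echo "missing: $dir/consolidated.*.pth" >&2
        missing=1
    fi

    # Exit status 0 when everything was found, 1 otherwise
    return "$missing"
}
```

It could be run from inside `llama2.c/` as `check_model_dir ../Meta-Llama-3.1-8B-Instruct/original` just before the export command; a non-zero exit status means something expected was not found.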