|
|
|
|
|
by Anunayj
641 days ago
|
|
I recently experimented with running llama-3.1-8b-instruct locally on my Consumer hardware, aka my Nvidia RTX 4060 with 8GB VRAM, as I wanted to experiment with prompting pdfs with a large context which is extremely expensive with how LLMs are priced. I was able to fit the model with decent speeds (30 tokens/seconds) and a 20k token context completely on the GPU. For summarization, the performance of these models are decent enough. However unfortunately in my use case I felt using Gemini's Free Tier with it's multimodal capabilities and much better quality output made running local LLMs not really worth it as of right now, atleast for consumers. |
|