Hacker News
by Anunayj 641 days ago
I recently experimented with running llama-3.1-8b-instruct locally on my consumer hardware, an Nvidia RTX 4060 with 8GB of VRAM, because I wanted to try prompting PDFs with a large context, which is extremely expensive given how hosted LLMs are priced.

I was able to fit the model and a 20k-token context entirely on the GPU, with decent speed (about 30 tokens/second).
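For anyone wondering how an 8B model plus a 20k context fits in 8GB, here is a back-of-the-envelope sketch. It assumes things the post doesn't state: roughly 4.5 bits per weight (typical of 4-bit quantized builds), an fp16 KV cache, and the published Llama 3.1 8B shape (32 layers, 8 KV heads via grouped-query attention, head dim 128). The function names are just for illustration.

```python
# Rough VRAM estimate: quantized weights + fp16 KV cache.
# ASSUMPTIONS (not from the post): ~4.5 bits/weight average quantization,
# fp16 (2-byte) KV cache, Llama 3.1 8B shape: 32 layers, 8 KV heads, head_dim 128.

GIB = 1024 ** 3


def weights_bytes(n_params: float, bits_per_weight: float) -> float:
    """Memory for the model weights at a given average quantization width."""
    return n_params * bits_per_weight / 8


def kv_cache_bytes(n_tokens: int, n_layers: int = 32, n_kv_heads: int = 8,
                   head_dim: int = 128, bytes_per_elem: int = 2) -> int:
    """Keys and values (hence the factor 2) for every layer, KV head and token."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * n_tokens


if __name__ == "__main__":
    w = weights_bytes(8.03e9, 4.5)
    kv = kv_cache_bytes(20_000)
    print(f"weights  ~{w / GIB:.2f} GiB")
    print(f"KV cache ~{kv / GIB:.2f} GiB")
    print(f"total    ~{(w + kv) / GIB:.2f} GiB (before activations/runtime overhead)")
```

Under those assumptions it comes to about 4.2 GiB of weights plus about 2.4 GiB of KV cache (128 KiB per token), roughly 6.6 GiB total, which leaves some headroom on an 8GB card. With an unquantized fp16 model the weights alone would be around 15 GiB.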

For summarization, these models perform decently. Unfortunately, for my use case, Gemini's free tier, with its multimodal capabilities and much better output quality, made running local LLMs not really worth it right now, at least for consumers.


You moved the goalposts when you added 'multimodal' there. Another point: no one reads PDF tables and illustrations perfectly, at any price, AFAIK.
Supposedly, submitting screenshots of PDFs (at a large enough zoom per tile/page) to OpenAI's GPT-4o, or whatever Google's current best model is, is the best way of handling charts and tables right now.
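To make "large enough zoom per tile/page" concrete, here's a minimal sketch of how you might split a rendered page into overlapping tiles before sending each one to a vision model. Everything here is an assumption for illustration: the helper `tile_boxes`, the zoom factor (PDF points are 1/72 inch, so zoom 3 is about 216 DPI), the max tile size and the overlap. It only computes pixel boxes; the actual rendering would be done with a PDF library (e.g. PyMuPDF's `Page.get_pixmap` with a zoom matrix and a clip rectangle).

```python
# Hypothetical helper: compute overlapping tile boxes for a PDF page rendered
# at a given zoom, so each screenshot stays small enough to keep text legible.
# ASSUMPTIONS: page size in PDF points, zoom factor, max tile edge and overlap
# are all illustrative choices, not values from the thread.
import math
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def _starts(length: int, tile: int, step: int) -> List[int]:
    """Tile start offsets along one axis; the last tile is aligned to the edge."""
    starts = [0]
    while starts[-1] + tile < length:
        starts.append(min(starts[-1] + step, length - tile))
    return starts


def tile_boxes(page_w_pt: float, page_h_pt: float, zoom: float,
               max_tile_px: int, overlap_px: int = 0) -> List[Box]:
    width = math.ceil(page_w_pt * zoom)    # rendered page width in pixels
    height = math.ceil(page_h_pt * zoom)   # rendered page height in pixels
    step = max_tile_px - overlap_px
    if step <= 0:
        raise ValueError("overlap must be smaller than the tile size")
    return [
        (left, top, min(left + max_tile_px, width), min(top + max_tile_px, height))
        for top in _starts(height, max_tile_px, step)
        for left in _starts(width, max_tile_px, step)
    ]


if __name__ == "__main__":
    # US Letter page (612 x 792 pt) at zoom 3, 1024 px tiles with 64 px overlap.
    for box in tile_boxes(612, 792, zoom=3, max_tile_px=1024, overlap_px=64):
        print(box)
```

The overlap is there so a table row or chart label cut at a tile boundary appears whole in at least one tile. For a Letter page at zoom 3 this yields 6 tiles of at most 1024 px per side.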