Hacker News
by mentos 781 days ago
This is awesome. I have been using ChatGPT4 for almost a year and haven't really experimented with locally running LLMs because I assumed generation would be too slow per token. This demo has shown me that my RTX 2080 running Llama 3 can compete with ChatGPT4 for a lot of my prompts.

This has sparked a curiosity in me to play with more LLMs locally, thank you!

3 comments

My Pixel 6 was able to run TinyLlama and answer questions with alarming accuracy. I'm honestly blown away.
This is amazing. Thanks both for sharing your stories. Made my day.
Uh oh, I had that same moment a bit over a year ago with MLC's old WebLLM. Take a deep breath before you jump into this rabbit hole because once you're in there's no escape :)

New models just keep rolling in day after day on r/LocalLLaMA, tunes for this or that, new prompt formats, new quantization types, people doing all kinds of tests and analyses, new arXiv papers on some breakthrough and llama.cpp implementing it 3 days later. Every few weeks a new base model drops from somebody. So many things to try that nobody has tried before. It's genuinely like crack.