by mhitza 310 days ago
I've been using gpt-oss-20b lightly, and what I've found is that with smaller (single-sentence) prompts it was easy to get it to loop infinitely. Since I'm running it with llama.cpp, I've set a small repetition penalty and haven't hit the issue again (I only use it a couple of times a day to analyze diffs, so I might have just gotten lucky).
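For anyone unfamiliar with what a repetition penalty does: a common formulation scales down the logits of tokens that showed up recently, so the model is less likely to pick them again. Below is a rough Python sketch of that idea. It is not llama.cpp's actual code; the function name, the toy logits and the penalty value are all made up for illustration. In llama.cpp the corresponding knob is `--repeat-penalty`.

```python
def apply_repetition_penalty(logits, recent_token_ids, penalty=1.1):
    """Return a copy of `logits` with recently seen tokens made less likely.

    Hypothetical helper for illustration only.
    Positive logits are divided by the penalty and negative logits are
    multiplied by it, so in both cases the score moves downward.
    """
    penalized = list(logits)
    for token_id in set(recent_token_ids):
        if penalized[token_id] > 0:
            penalized[token_id] /= penalty
        else:
            penalized[token_id] *= penalty
    return penalized


# Toy vocabulary of three tokens; tokens 0 and 1 were generated recently.
logits = [2.0, -1.0, 0.5]
print(apply_repetition_penalty(logits, recent_token_ids=[0, 1], penalty=2.0))
# → [1.0, -2.0, 0.5]
```

A penalty of 1.0 leaves the logits unchanged, which is why a "small" penalty just above 1.0 can be enough to break a loop without distorting normal output much.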
2 comments

I had the same issue with other models where they would loop repeating the same character, sentence or paragraph indefinitely. Turns out the default context size some tools set is 2k tokens, which is way too small.
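If the tool in question happens to be Ollama (an assumption on my part, the comment above doesn't name one), the context size can be raised per request through the `num_ctx` option of its HTTP API. A small sketch that only builds the request body and does not send it; the helper name, the model tag and the 8192 value are placeholders, not recommendations.

```python
import json


def build_generate_request(model, prompt, num_ctx=8192):
    """Build a JSON-serializable body for Ollama's /api/generate endpoint.

    Hypothetical helper; `num_ctx` overrides the context window size
    for this request only. 8192 is an arbitrary example value.
    """
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"num_ctx": num_ctx},
    }


# The model tag here is an assumed example; use whatever tag you have pulled.
payload = build_generate_request("gpt-oss:20b", "Summarize this diff: ...")
print(json.dumps(payload, indent=2))
```

With llama.cpp directly, the equivalent is the `-c` / `--ctx-size` command-line flag.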
I’ve been using the ollama version (uses about 13 GB of RAM on macOS) and haven’t had that issue yet. I wonder if that’s maybe an issue with the llama.cpp port?
Never used ollama, only ready-to-go models via llamafile and llama.cpp.

Maybe ollama has some defaults it applies to models? I start testing models at 0 temp and tweak from there depending on how they behave.
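To spell out why 0 temp is a useful baseline: at temperature 0 sampling collapses to always picking the highest-scoring token, so runs are repeatable and any looping comes from the model itself rather than from sampling noise. A toy sketch of temperature sampling, assuming the usual softmax formulation; the function name and logits are invented for illustration and this is not any particular runtime's implementation.

```python
import math
import random


def sample_token(logits, temperature, rng=random):
    """Pick a token index from `logits` at the given temperature.

    Hypothetical helper. Temperature <= 0 is treated as greedy decoding
    (argmax); otherwise logits are divided by the temperature before softmax.
    """
    if temperature <= 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


logits = [0.1, 2.0, -1.0]
print(sample_token(logits, temperature=0))    # always index 1
print(sample_token(logits, temperature=1.0))  # usually 1, sometimes 0 or 2
```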