by ggerganov 519 days ago
There are 4 stopping criteria atm:

- Generation time exceeded (configurable in the plugin config)

- Number of tokens exceeded (not the case since you increased it)

- Indentation - stops generating if the next line has a shorter indent than the first line

- Small probability of the sampled token

Most likely you are hitting the last criterion. It's something that should be improved in some way, but I'm not sure how. Currently, it uses a very basic token sampling strategy with custom threshold logic that stops generating when the token probability is too low. Likely this logic is too conservative.
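
To make the four checks concrete, here is a rough Python sketch of how they could fit together. This is not the plugin's actual code: `StopConfig`, `stop_reason`, the field names, and the default values are all made up for illustration, and the real threshold logic may look quite different.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StopConfig:
    # Hypothetical names and defaults, chosen only for illustration.
    max_time_ms: float = 500.0    # generation time budget
    max_tokens: int = 128         # token budget
    min_token_prob: float = 0.1   # stop below this sampled-token probability


def indent_of(line: str) -> int:
    """Count the leading whitespace characters of a line."""
    return len(line) - len(line.lstrip(" \t"))


def stop_reason(text: str, n_tokens: int, elapsed_ms: float,
                last_prob: float, cfg: StopConfig) -> Optional[str]:
    """Return the name of the first criterion that fires, or None to continue."""
    if elapsed_ms > cfg.max_time_ms:
        return "time"
    if n_tokens >= cfg.max_tokens:
        return "tokens"
    lines = text.split("\n")
    # Compare the newest non-blank line against the first generated line.
    if len(lines) > 1 and lines[-1].strip():
        if indent_of(lines[-1]) < indent_of(lines[0]):
            return "indent"
    if last_prob < cfg.min_token_prob:
        return "low_prob"
    return None
```

In a sketch like this, raising the time and token limits changes nothing if `low_prob` is what fires, so the only knob that would matter is the probability threshold (here the assumed `min_token_prob`).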

1 comment

Hmm, interesting.

I hadn't caught T_max_predict_ms before, so I upped it to 5000 ms for fun. It doesn't seem to make a difference, so I'm guessing you are right.