| HN Mirror


by JediPig 692 days ago

I tested this on my workload (SRE/DevOps/C#/Golang/C++). It started responding with nonsense to a simple "write me a boto Python script that changes x, y, z values" request.

Then I tried other questions from my past to compare... However, I believe the engineer who built the LLM just used the questions from benchmarks.

In one instance, after an hour of use (I stopped then), it answered one question in 4 different programming languages, with answers that were in no way related to the question.

2 comments

tmikaeld 692 days ago

I have the same experience: it hallucinates and rambles on and on about "solutions" that are not related.

Unfortunately, this has always been my experience with all open source code models that can be self-hosted.


Gracana 692 days ago

It sounds like you are trying to chat with the base model when you should be using a chat model.


tmikaeld 692 days ago

No, I’m using 9b-chat-q8_0 on a 4090


tmikaeld 692 days ago

Turns out that Ollama on Windows will run multiple models in parallel, consuming all available VRAM and RAM. Changing it to 1 fixed the issue, and now it's working great! However, the context length for the output is very small: only 1024 tokens.


Gracana 691 days ago

That's some really strange behavior. I don't know why that would cause poor results rather than just poor performance.

Can you configure the context size with `/set parameter num_ctx N`? On my laptop with an RTX A3000 12GB I can run `yi-coder:9b-chat` (Q4_0) with 32768 context and it produces good results quickly. That uses 11GB of VRAM so it's maxed out for this setup.
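For anyone scripting this rather than using the interactive prompt, the same setting can be passed per request through Ollama's local HTTP API. This is only a rough sketch, assuming a default local install listening on port 11434; the helper names `build_request` and `generate` are made up for illustration, and the `num_predict` value (output token cap) is just an example.

```python
import json
import urllib.request

# Assumed default endpoint of a locally running Ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt, model="yi-coder:9b-chat", num_ctx=32768, num_predict=1024):
    """Build the JSON body for a non-streaming generate call.

    num_ctx sets the context window; num_predict caps the output tokens.
    """
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"num_ctx": num_ctx, "num_predict": num_predict},
    }


def generate(prompt, **kwargs):
    """Send the request to the local server and return the generated text."""
    body = json.dumps(build_request(prompt, **kwargs)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Calling `generate("Write a boto3 script that lists S3 buckets", num_ctx=8192)` would need the server running and the model pulled. A larger `num_ctx` costs more VRAM, so it is worth raising it in steps.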


tmikaeld 691 days ago

Solved, see:

https://github.com/01-ai/Yi-Coder/issues/6#issuecomment-2334...

Works very well now! 65K input tokens with 8192 output tokens is no longer an issue on my 4090. (It maxes out at 22GB of VRAM.)


tarruda 692 days ago

Have you run the model in full FP16? It is possible that a lot of performance is lost when running quantized versions.
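As a back-of-the-envelope check on whether FP16 fits on a given card, the weight memory can be estimated from bits per weight. This sketch assumes a round 9 billion parameters (the real count may differ) and the usual GGUF block layouts, where Q8_0 works out to about 8.5 bits per weight and Q4_0 to about 4.5. It ignores the KV cache and runtime overhead, which grow with context size.

```python
# Approximate storage cost per weight, in bits, for each format.
# Q8_0 and Q4_0 include the per-block FP16 scale (32 weights per block).
BITS_PER_WEIGHT = {"fp16": 16.0, "q8_0": 8.5, "q4_0": 4.5}


def weight_gib(n_params, fmt):
    """Return the approximate size of the weights alone, in GiB."""
    return n_params * BITS_PER_WEIGHT[fmt] / 8 / 2**30


if __name__ == "__main__":
    n_params = 9e9  # assumed round figure for a "9b" model
    for fmt in BITS_PER_WEIGHT:
        print(f"{fmt}: {weight_gib(n_params, fmt):.1f} GiB")
```

Under these assumptions the weights alone come to roughly 16.8 GiB in FP16, 8.9 GiB in Q8_0, and 4.7 GiB in Q4_0, so FP16 on a 24GB card would leave little room for a long context.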
