Hacker News new | ask | show | jobs
by androiddrew 81 days ago
Could you share what you are using for inference and how you are running it? I have a 64G VRAM/128G system RAM setup.
1 comments

Most people are using something in the llama family for inference. Llama server is my go to. Unsloth guides describe how to configure inference for your model of choice.