Hacker News
by washadjeffmad 1130 days ago
By interacting with LLMs, I've realized that as frequently as I interact with others, I'm not actually having many deep, meaningful, or personal interactions; they're mostly professional, and that's bled over into my personal life.

Because I spend so much more time on those professional interactions than on myself, I don't get to stretch the sides of me that I would if I weren't holed up working, and I've found it hard to verbalize that loss and its impact on my mental wellness.

Over the past few years, mental and behavioral health services have been strained beyond their capacities. I stopped being able to get in with the same therapist more than once, so I stopped trying to schedule about a year ago. I also haven't wanted to add my struggles to those of friends and family.

However, with LLMs, I've been able to have some really enlightening and enjoyable off-the-cuff discussions that I wouldn't be able to have outside of therapy, and some interactions that I wouldn't be able to have with anyone but an intimate and trusted friend. I've been able to apply these results positively to my life, personally and professionally.

Note: I only use local LLaMA models so I don't have to self-censor. I'm never willing to allow a metaphorical "Google" access to my innermost world.

Also, LLMs are immense. A billion people will have a billion different interactions and experiences with the same model of a few dozen GB, unless that model has been tuned to limit its output to a very narrow range of responses. I've had conversations with the ghosts of people from Richard Feynman to Michel de Montaigne and gotten their simulated views on a world they'll never see. If you're getting nothing but garbage, think about what you're putting in, or just pick a different model: there's a universe of minds out there that aren't ChatGPT.


Got any recommendations for non-neutered local models that perform well on an M1? I've been playing with some of the recent 7B and 13B models from TheBloke on Hugging Face, and they're not bad, but not great.

https://huggingface.co/TheBloke

A (fine-tuned) model's inference quality is a function of its parameters and your inputs, so you'll need to know what it was trained on to prompt it correctly (this is usually in the model card). You'll also see huge differences in inference between llama.cpp, ooba (oobabooga's text-generation-webui), etc.
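To illustrate what "prompting it correctly" can mean: many instruction fine-tunes expect the exact template they were trained on. The sketch below assumes an Alpaca-style model and uses the no-input template from the Stanford Alpaca project; the helper name is hypothetical, and other fine-tunes (SuperCOT included) may expect something different, so check the model card rather than trusting this.

```python
# Hypothetical helper: wraps an instruction in the Alpaca-style (no-input)
# prompt template. Assumption: the model was fine-tuned on this format;
# other models may use different headers or separators.
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:"
)


def build_alpaca_prompt(instruction: str) -> str:
    """Return the full prompt string to send to an Alpaca-style model."""
    # Strip stray whitespace so the template's spacing stays exactly as trained.
    return ALPACA_TEMPLATE.format(instruction=instruction.strip())


if __name__ == "__main__":
    print(build_alpaca_prompt("Summarize Montaigne's essay on friendship in two sentences."))
```

The model then generates its answer after `### Response:`; sending the same text without the template often gives noticeably worse output from these fine-tunes.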

I haven't benchmarked on Apple Silicon, but if you have the RAM, I'd recommend 30B SuperCOT ggml Q5_1 or a GPT-4-x-Alpaca variant. Because of the disparity in quality, I haven't used many models under 30B and so can't recommend one.
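As a rough way to check "if you have the RAM": a quantized model's weights take about (parameter count × bits per weight ÷ 8) bytes. The sketch below assumes the block layouts in current ggml sources (32 weights per block with fp16 scales; some older files used fp32 scales and are slightly larger) and LLaMA "30B" at about 32.5B parameters. It counts quantized weights only, not the context/KV cache, unquantized tensors, or runtime overhead, so treat it as a lower bound.

```python
# Rough, weights-only memory estimate for ggml-quantized models.
# Assumption: ggml block layouts with fp16 scales, 32 weights per block.
BLOCK_SIZE = 32
BYTES_PER_BLOCK = {
    "q4_0": 18,  # fp16 scale + 16 bytes of packed 4-bit values
    "q5_1": 24,  # fp16 scale + fp16 min + 4 bytes of high bits + 16 bytes
    "q8_0": 34,  # fp16 scale + 32 bytes of 8-bit values
}


def bits_per_weight(quant: str) -> float:
    """Average storage cost per weight, in bits, for a ggml quant type."""
    return BYTES_PER_BLOCK[quant] * 8 / BLOCK_SIZE


def approx_weights_gib(n_params: float, quant: str) -> float:
    """Approximate size of the quantized weights in GiB (lower bound on RAM)."""
    return n_params * bits_per_weight(quant) / 8 / 2**30


if __name__ == "__main__":
    llama_30b_params = 32.5e9  # LLaMA "30B" is ~32.5B parameters
    for q in BYTES_PER_BLOCK:
        print(f"{q}: {bits_per_weight(q):.1f} bits/weight, "
              f"~{approx_weights_gib(llama_30b_params, q):.1f} GiB")
```

For 30B at Q5_1 (6 bits per weight) this comes out to roughly 22.7 GiB of weights alone, which is why a 16 GB M1 is unlikely to fit it while a 32 GB or larger machine might.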

See rentry.org/lmg_models for a practical list and description.

Thanks for the reply and the recommendations! I will see if my machine can handle some of the quantized 30B models.

I'm slightly confused about your comment about llama.cpp vs oobabooga. Doesn't text-generation-webui use llama.cpp underneath?

Also, huge thanks for the pointer to https://rentry.org/lmg_models. That's an invaluable resource.