| HN Mirror


	by reactordev 36 days ago
	This is why local AI is so important

8 comments

bayindirh 36 days ago

It's already being trained on "public" (ethical or otherwise) data. So, it already has ingested that kind of "optimization" during pre-training and training.

I don't think you can fine-tune your way out of it.

ToucanLoucan 36 days ago

People still think these things are smart. That if their word generator eats enough of the Internet, it will somehow give them the real information that's otherwise hidden. Or, perhaps a better way to put it: filter the bullshit.

To filter bullshit it would first have to understand bullshit, and it doesn't. That's why an LLM will tell you the solution to a problem that doesn't work, and argue with you when you correct it.

bayindirh 36 days ago

This is what bothers me a lot. For people who don't know how it's made, or who want to believe, it's a miracle.

For me, it's a resource-wasting text generator. I'll not lie, I don't use OpenAI, Mistral or Anthropic's models, even for coding. I prefer to read my API docs and cry once.

I used Gemini five or six times in total. Twice I asked a couple of very specific things, and it unearthed them. Since they were not products, but information, that was helpful. Twice, it gave wrong information. When I "told" it there was another way, it said "of course there are two ways", etc. Tasteless and time-wasting.

I don't like using an LLM all day long, or offload my thinking to them. It's the ultimate self-poisoning incident.

And as you say, these algorithms can't know right/wrong/logical/bullshit, etc. They just spew out text.

latexr 36 days ago

Something I’ve also seen multiple times is an LLM giving wrong information, I tell it it’s not right, then it tells me I’m “absolutely right” and it provides the exact same answer and tells me that one will work.

reactordev 34 days ago

Oh Gemini, how no one uses you enough…

satvikpendem 36 days ago

I was just reading another post yesterday, and your comment reminds me of this one [0]: same sort of format and experience as the submitted article of the HN post that comment is on.

[0] https://news.ycombinator.com/item?id=48211730

reactordev 35 days ago

Sadly critical thinking skills have atrophied steeply in the last decade.

fsflover 36 days ago

This is far from widespread at the moment, so it'll be possible to at least use the current cutting-edge models locally in the future.

bayindirh 36 days ago

Far from widespread? SEO has seeped into every crevice of the internet over the last 20 years.

fsflover 36 days ago

By this measure, any information you can get whatsoever is biased and there is no reason to trust anything at all.

satvikpendem 36 days ago

All information has some sort of bias, as no information can truly be unbiased. There is no reason to trust any specific piece of information but taken in aggregate one can disambiguate the biases.

fsflover 35 days ago

So we are in agreement.

latexr 36 days ago

The major difference is that right now when you land on a page you can do your due diligence and decide if you trust the source. You can still be tricked, but it’s harder and you can get better at the detection.

With LLMs, everything is given the same importance so you have no idea if the data came from a reputable source or an obvious SEO junk website.

fsflover 36 days ago

AI can also provide the sources. And if you need to be certain, you should ask for that.

rplnt 36 days ago

That doesn't solve this particular problem. Your local model was trained on Reddit comments written by bots.

soloto 36 days ago

Local AI will have the bias that existed at the time of its training, which is different from no bias. For stuff that needs to be current, a local LLM would need to search the net regardless.

embedding-shape 36 days ago

And since "no bias" isn't something that actually exists in reality when it comes to language or even anything near humans, "bias in local model I can introspect" will always be miles ahead of "bias I know is there, but cannot introspect".

soloto 35 days ago

Agreed in full.

Schweigerose 36 days ago

How do you make sure that the model you run locally is not tainted? Is there even a way to confirm this without providing the complete training set?

psb5 36 days ago

Fwiw I just run kiwix/zeal locally, which have an old-school search index of all articles in wiki/stackoverflow etc. That seems enough for most of my day-to-day use.

jondea 36 days ago

It's less compromised, but it's still basing the answer on compromised queries. This is why I pay for independent reviews (e.g. Which), where their incentives are more aligned with yours.

FergusArgyll 36 days ago

How does that help if it's using search? You get whatever the search engine outputs.

rdtsc 36 days ago

Not if the models come from Google. The ads will be implicit in the model. "X is better than Y and Z" would be easy to add to the training set.

pautasso 36 days ago

Does this mean the model must be retrained every time a new ad is posted? How much are AI ads going to cost?

rdtsc 36 days ago

Yeah, I meant not individual ads but an implicit forced/influenced preference for certain brands. Let’s say it always picks Coke over Pepsi when giving an example of a soft drink. Or picks BMW when asked to pick the best car. Which cloud provider is the best? Why, GCP of course, etc.

Companies then get to bid for a preference “place”. This is more like Google paying to be the search engine default in Firefox.

weird-eye-issue 36 days ago

Local AI models pull in search results just like ChatGPT does ...

And they are trained on web data just like any other model...
