| HN Mirror


	by fatso784 1162 days ago
	Looks a bit like snake oil to me. A lot of companies are now spinning up simple demos with opaque backends, making huge claims that they’ve solved some hard problem X for/with AI, then saying “trust us” and “join our waitlist” without hard details or facts to show for it. If you could detect hallucinations, biases, etc. that easily, don’t you think OpenAI would’ve worked on something like this?

6 comments

MuffinFlavored 1162 days ago

> don’t you think OpenAI would’ve worked on something like this?

Along this line of thought: was it a massive oversight for them to not train the model to say "math detected, let me pass that to a solver" instead of trying to guess what token should come next in a math problem?
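
The "math detected, let me pass that to a solver" idea can be sketched roughly as below. This is only an illustration, not how OpenAI or any real model works: `looks_like_math`, `solve`, and `answer` are hypothetical names, the detector is a crude regex, and the "solver" is a small allow-listed arithmetic evaluator standing in for a real one.

```python
import ast
import operator
import re

# Hypothetical allow-list of arithmetic operators; anything else is rejected.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.USub: operator.neg,
}

# Crude detector (an assumption): only digits, whitespace, and basic operators.
_MATH_RE = re.compile(r"^[\d\s.+\-*/()]+$")


def looks_like_math(text):
    """Return True if the text looks like a bare arithmetic expression."""
    return bool(_MATH_RE.match(text)) and any(ch.isdigit() for ch in text)


def solve(expr):
    """Evaluate an arithmetic expression by walking its AST, without eval()."""
    def ev(node):
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.operand))
        raise ValueError("unsupported expression")

    return ev(ast.parse(expr.strip(), mode="eval").body)


def answer(prompt, fallback):
    """Route math-looking prompts to the solver; everything else to `fallback`."""
    if looks_like_math(prompt):
        try:
            return str(solve(prompt))
        except (ValueError, SyntaxError, ZeroDivisionError):
            pass  # not actually solvable; fall through to the fallback
    return fallback(prompt)
```

Here `fallback` stands in for the language model itself, e.g. `answer("2 + 3 * 4", my_llm)` would be handled by the evaluator, while a prose question would go to `my_llm`.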

TazeTSchnitzel 1162 days ago

There's a million categories of problem you could ask an LLM to try to solve. You'd need a million solvers…

gurleen_s 1162 days ago

This seems like a pretty good thing. The model’s ability to detect _which_ solver to use is the killer feature.

iudqnolq 1162 days ago

Why is that a killer feature? Humans are quite good at asking different people different questions. If I need to do a simple math problem I'll just prefix "calculate" and pop it into Google, whereas if I want an intro to a named thing I'll prefix "wikipedia". That's not hard.

GPT is quite useful, but not because it solves the problem of "I don't know whether the question I have is answerable by a calculator".

underlines 1162 days ago

You mean HuggingGPT?

wds 1162 days ago

If you used some sort of plugin system, you could just make a solver for your specific task and drop it in. Doesn't ChatGPT Plus do this now?
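
A drop-in solver system along these lines might look like the sketch below. It is a generic registry pattern, not the ChatGPT plugin API; `SolverRegistry`, `word_count`, and `reverse` are made-up names for illustration only.

```python
from typing import Callable, Dict, List

Solver = Callable[[str], str]


class SolverRegistry:
    """Hypothetical registry: solvers are added by name and looked up later."""

    def __init__(self) -> None:
        self._solvers: Dict[str, Solver] = {}

    def register(self, name: str) -> Callable[[Solver], Solver]:
        """Decorator that drops a solver into the registry under `name`."""
        def decorator(fn: Solver) -> Solver:
            self._solvers[name] = fn
            return fn
        return decorator

    def names(self) -> List[str]:
        """List the registered solver names in sorted order."""
        return sorted(self._solvers)

    def dispatch(self, name: str, query: str) -> str:
        """Run the named solver on the query, or fail loudly if it is missing."""
        solver = self._solvers.get(name)
        if solver is None:
            raise KeyError(f"no solver registered under {name!r}")
        return solver(query)


registry = SolverRegistry()


# Two toy task-specific solvers (illustrative only).
@registry.register("word_count")
def word_count(query: str) -> str:
    return str(len(query.split()))


@registry.register("reverse")
def reverse(query: str) -> str:
    return query[::-1]
```

Adding a new capability is then just another decorated function; choosing which name to dispatch to is the part the thread expects the model to handle.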

Tostino 1162 days ago

It's behind a waitlist.

flangola7 1162 days ago

OpenAI plugins can connect with a growing number of things. Zapier is one, and that alone already exposes several thousand functions.

samstave 1162 days ago

Can you give some example recipes?

BoorishBears 1162 days ago

The most obvious one is Wolfram: ChatGPT passes most math to it.

kolinko 1162 days ago

They are solving that with plugins now.

kmod 1162 days ago

I think part of the problem is that it's technically correct to say "my product does X" even if it does X extremely poorly. I'm not sure if this can be changed because any line for "does-X vs does-not-adequately-do-X" is going to necessarily be subjective.

So personally I think the problem is that people see "this product does X" and interpret that to mean that it does X well. I don't think it's necessarily bad that we're seeing an explosion of somewhat underwhelming AI tools, as long as people understand them as such. We are, after all, on a site with a heavy startup focus, and saying "your product doesn't do everything that I want" is a bit antithetical to that.

But yeah, specifically for this one, there are arguments that "X is not even possible, especially not with this approach", so it's a bit more egregious.

roflyear 1162 days ago

This isn't new; it's just more obvious with this tech. Sales teams at nearly every company have been performing this dance for hundreds of years.

ulfw 1161 days ago

AI is the new crypto (though with more substance). It attracts many of the same self-obsessed, snake-oil-selling characters, though.

cudgy 1160 days ago

True. I imagine many crypto startups have desperately pivoted to AI with their last gasp of cash given the recent blowup.

danShumway 1162 days ago

In this area, if there's not a public demo and the results aren't verifiable, then it's not worth paying attention to.

henri18 1162 days ago

It's good to have third parties (apart from OpenAI) that assess the quality of OpenAI's results. That's how audits work: they have to be independent. Third parties are also essential for comparing the results from ChatGPT with those of other LLMs. These are important checks for assessing the robustness of OpenAI's results!

jsheard 1162 days ago

I can't help but notice your account's only activity before this post was praising another giskard.ai submission a few months ago. Anything you'd like to disclose?

hallucy 1162 days ago

You should assume everything posted on the internet has an ulterior motive. Relying on disclosures simply allows actual bad actors to avoid scrutiny.

(And no one cares that you used to work at Microsoft or whatever).

cantaloa 1162 days ago

Well said.

amitport 1162 days ago

He didn't say it's not important. He is just pointing out that black-box third-party verification is not worth much when you can't independently verify the verifiers.

alexcombessie 1162 days ago

Definitely agree that black boxes are the problem and that one needs to be able to verify the verifiers. FYI, that's why Giskard is open-source and why we build in the open. https://www.giskard.ai/knowledge/giskard-log-1-going-open-so...

mirker 1162 days ago

The OP’s point is that it’s likely impossible to do what is claimed here in general. Imagine the LLM states something like Fermat’s Last Theorem. To verify it, you’d have to either 1) have a proof assistant powerful enough to construct a proof, or 2) use a second ML model to guess truthfulness. The former is technically challenging, and the latter is another model with its own biases and factual inconsistencies.
