Hacker News
by fatso784 1162 days ago
Looks a bit like snake oil to me. A lot of companies are now spinning up simple demos with opaque backends, making huge claims they’ve solved X hard problem for/with AI, then saying “trust us” and “join our waitlist” without hard details or facts to show for it. If you could detect hallucinations/biases etc. that easily, don’t you think OpenAI would’ve worked on something like this?
6 comments

> don’t you think OpenAI would’ve worked on something like this?

Along this line of thought: was it a massive oversight for them to not train the model to say "math detected, let me pass that to a solver" instead of trying to guess what token should come next in a math problem?
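
A minimal sketch of the "math detected, let me pass that to a solver" idea, assuming a crude regex detector and a tiny arithmetic evaluator. Every name here (`route`, `solve_arithmetic`, `fallback_model`) is hypothetical and for illustration only; this is not how OpenAI or any plugin actually routes requests.

```python
import ast
import operator
import re

# Operators the hypothetical arithmetic "solver" accepts (assumption: + - * / only).
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}

# Crude detector: the prompt consists only of digits, spaces, dots,
# parentheses, and the four basic operators.
_MATH_ONLY = re.compile(r"[\d\s.()+\-*/]+")


def solve_arithmetic(expression):
    """Evaluate a plain arithmetic expression by walking its AST (no eval())."""

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)
        ):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](walk(node.operand))
        raise ValueError("unsupported expression")

    return walk(ast.parse(expression.strip(), mode="eval"))


def route(prompt, fallback_model):
    """Send pure arithmetic to the solver; everything else goes to the model."""
    text = prompt.strip()
    if text and _MATH_ONLY.fullmatch(text):
        try:
            return "solver", solve_arithmetic(text)
        except (SyntaxError, ValueError, ZeroDivisionError):
            pass  # Looked like math but was not solvable; let the model answer.
    return "model", fallback_model(prompt)


if __name__ == "__main__":
    guess = lambda p: "next-token guess"  # stand-in for an LLM call
    print(route("12 * (3 + 4)", guess))
    print(route("Who proved Fermat's Last Theorem?", guess))
```

Even in this toy version the hard part is the detector, not the solver: a regex only catches prompts that are already pure arithmetic, which is the case the reply about prefixing "calculate" says people can already handle themselves.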

There's a million categories of problem you could ask an LLM to try to solve. You'd need a million solvers…
This seems like a pretty good thing. The model’s ability to detect _which_ solver to use is the killer feature.
Why is that a killer feature? Humans are quite good at asking different people different questions. If I need to do a simple math problem I'll just prefix "calculate" and pop it into Google, whereas if I want an intro to a named thing I'll prefix "wikipedia". That's not hard.

GPT is quite useful, but not because it solves the problem of "I don't know whether the question I have is answerable by a calculator".

You mean HuggingGPT?
If you used some sort of plugin system, you could just make a solver for your specific task and drop it in. Doesn't ChatGPT Plus do this now?
It's behind a waitlist.
OpenAI plugins can connect with a growing number of things. Zapier is one, which is already several thousand functions.
Can you give some example recipes?
The most obvious one is Wolfram; it passes most math to Wolfram.
They are solving that with plugins now.
I think part of the problem is that it's technically correct to say "my product does X" even if it does X extremely poorly. I'm not sure if this can be changed because any line for "does-X vs does-not-adequately-do-X" is going to necessarily be subjective.

So personally I think the problem is that people see "this product does X" and interpret that to mean that it does X well. I don't think it's necessarily bad that we're seeing an explosion of AI tools that are a bit underwhelming, as long as people understand them as such -- we're on, after all, a site with a heavy startup focus, and saying "your product doesn't do everything that I want" is a bit antithetical to that.

But yeah specifically for this one there are arguments that "X is not even possible, especially not with this approach" so it's a bit more egregious.

This isn't new; it's just more obvious with this tech. Every sales team at nearly every company has been performing this dance for like hundreds of years.
AI is the new crypto (though with more substance). It attracts many of the same self-obsessed, snake-oil-selling characters though.
True. I imagine many crypto startups have desperately pivoted to AI with their last gasp of cash given the recent blowup.
In this area, if there's not a public demo and the results aren't verifiable, then it's not worth paying attention to.
It's good to have third parties (apart from OpenAI) that assess the quality of OpenAI results. It's the way audits work: it has to be independent... Also, third parties are essential to compare the results from ChatGPT with the results of other LLMs. These are important checks to assess the robustness of OpenAI results!
I can't help but notice your account's only activity before this post was praising another giskard.ai submission a few months ago. Anything you'd like to disclose?
You should assume everything posted on the internet has an ulterior motive. Relying on disclosures simply allows actual bad actors to avoid scrutiny.

(And no one cares that you used to work at Microsoft or whatever).

Well said.
He didn't say it's not important. He is just pointing out that black-box third-party verification is not worth much when you can't independently verify the verifiers.
Definitely agree that black boxes are the problem & that one needs to be able to verify the verifiers - FYI that's why Giskard is open-source and why we build in the open. https://www.giskard.ai/knowledge/giskard-log-1-going-open-so...
The OP's point is that it’s likely impossible to do what is claimed here in general. Imagine the LLM says something like Fermat’s Last Theorem. To verify it, you’d have to either 1) have a proof assistant powerful enough to construct a proof, or 2) use a second ML model to guess truthfulness. The former is technically challenging and the latter is another model, with its own biases and factual inconsistencies.
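
To make option 1 concrete, here is a hedged sketch of the statement in Lean 4 syntax (an assumption; the comment names no particular proof assistant, and the theorem name is hypothetical). Writing the claim down takes a few lines; the `sorry` marks the part a verifier would have to fill in by actually constructing the proof.

```lean
-- Fermat's Last Theorem over the natural numbers, stated but not proved.
-- `sorry` is a placeholder: Lean accepts the file with a warning, and
-- nothing has been verified until it is replaced by a real proof.
theorem fermat_last_theorem_statement
    (n a b c : Nat) (hn : 2 < n)
    (ha : 0 < a) (hb : 0 < b) (hc : 0 < c) :
    a ^ n + b ^ n ≠ c ^ n := by
  sorry
```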