by devmor 779 days ago
It's not very good at giving proper weight to version numbers.

Granted, I started with a hard one: I asked it how to create a GTK3 interface with PHP. It gave me instructions to download and use an abandoned GTK2 project, but described it as GTK3 in the steps.

I tried asking it some other questions about specific versions of languages and applications. It tends to give incredibly ambiguous, version-agnostic responses, or tells me essentially "you may or may not be able to do this, and you should check if you can" when the answer is clearly that it is not possible. Or it just ignores the version entirely and provides instructions that don't match up, hallucinating UI elements or commands that don't (or didn't yet) exist.

For something targeted at developers, this is a gaping hole and is what I would consider a major oversight - the responses I'm getting are very similar in content to what I get from GPT and Ollama's generic models.

3 comments

That's kind of an interesting issue. I wonder if different tokenization would help. Maybe putting a space between GTK and the number would put them in separate tokens and give better output.

More generally, do text AIs not support weighting terms like the image AIs do? Over in Stable Diffusion that sounds like something where I'd add a weight like "How do I create a <GTK3:1.2> interface in <PHP:1.1>?"
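
To make that notation concrete, here is a minimal Python sketch that splits such a prompt into plain text plus per-term weights. The `<term:weight>` form is just the one written in this comment, not a documented syntax of any particular tool, and `parse_weighted_prompt` is a made-up name. It only separates terms from weights; whether a text model could actually use the weights is the open question.

```python
import re

# Matches the hypothetical <term:weight> notation, e.g. <GTK3:1.2>.
WEIGHTED = re.compile(r"<([^<>:]+):([0-9]*\.?[0-9]+)>")

def parse_weighted_prompt(prompt):
    """Return (plain_prompt, {term: weight}) for a prompt using <term:weight>."""
    weights = {}

    def strip_markup(match):
        # Record the weight and leave only the bare term in the text.
        term, weight = match.group(1), float(match.group(2))
        weights[term] = weight
        return term

    plain = WEIGHTED.sub(strip_markup, prompt)
    return plain, weights

plain, weights = parse_weighted_prompt(
    "How do I create a <GTK3:1.2> interface in <PHP:1.1>?"
)
print(plain)
print(weights)
```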

It is quite possible that the lack of actual intelligence in the LLM is the obstacle in this context.

I also just ran a query and got "perplexing" results, though I asked for "generic" knowledge rather than something "specific" about coding. The reply included good pointers, but clearly without knowing why they were especially relevant; that relevance only showed up in the linked references.

It is an LLM+RAG-based search engine. The value is only partly in the summary, which could even be misleading (as expected from a lack of actual intelligence); the real value is in the linked resources.

In other words, it "understands" your query better than a search engine of the past, and that is valuable. But for the actual solution you are after, the "summary" part could be good or could be defective, so it is probably best to consult the linked material. That is material you might not have found immediately otherwise: with past technology it could have been tricky to express your need in a way that returned good search results.
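
A rough Python sketch of the retrieval half of that picture, to show why the links can be useful even when the summary is not. Everything here is assumed for illustration: the three documents and their example.com URLs are invented, and scoring by shared words is a stand-in for whatever the real engine does.

```python
import re

# Hypothetical toy corpus; titles and URLs are invented for illustration.
DOCS = [
    {"url": "https://example.com/php-gtk3", "text": "Building a GTK3 interface from PHP"},
    {"url": "https://example.com/php-gtk2", "text": "Legacy PHP-GTK bindings for GTK2"},
    {"url": "https://example.com/python-qt", "text": "Qt widgets from Python"},
]

def tokens(text):
    """Lowercase alphanumeric words, so 'GTK3' and 'GTK2' stay distinct."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query, docs, k=2):
    """Rank documents by how many query words they share; drop non-matches."""
    q = tokens(query)
    scored = [(len(q & tokens(d["text"])), d) for d in docs]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored[:k]]

for doc in retrieve("GTK3 interface with PHP", DOCS):
    print(doc["url"])
```

In this toy setup the GTK2 page still comes back, only ranked lower, which is roughly the situation described upthread: a summary built on top of these results could blur the two versions, while the links themselves let you check.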

Interesting take! At face value, I would say that if this is the intended usage proposition, the summary actually adds negative value and should not exist.

Or perhaps a briefer summary for each result, explaining how it relates to the query?

Have you tried Agent Mode? It offers greater intelligence and accuracy compared to Fast Mode.

P.S. Agent Mode is a superior option to Fast Mode. It meticulously examines your questions and assigns an appropriate agent to provide answers, leveraging GPT-4 technology in its operations.