Hacker News
by doctoboggan 523 days ago
Have you tried asking it to do something useful rather than ask it to solve gotcha word problems?

> I read all of these amazing things that they supposedly all can do.

You seem to be implying people are confused (or lying?) about the things they are able to get LLMs to do.

If you give it an honest effort to solve some real problems you are facing, then you may be able to speak with more authority. Often it comes down to prompting skill. Try to read about different prompting approaches as that may help you.

In general, you need to be specific about what you need, and you need to give all relevant details. Like the post author said, treat it like a junior programmer or an intern.

> Have you tried asking it to do something useful rather than ask it to solve gotcha word problems?

What you call a "gotcha word problem" I'd compare to typical math problems, where you need to understand a text, extract the required information, solve the problem, and then present your results. Maybe this is a toy example, but compared to reading the specs of some microprocessors, it is rather easy. These AIs are apparently able to solve school- or even college-level math problems. Shouldn't my example be a walk in the park, then? Especially since it's a large LANGUAGE model?

> You seem to be implying people are confused (or lying?) about the things they are able to get LLMs to do.

I am merely stating observations and was hoping for an explanation. What good would it do me to accuse people of lying?

> Often it comes down to prompting skill. Try to read about different prompting approaches as that may help you.

"You're using it wrong" it is, then. So how do I tell a good-sounding but wrong answer from a correct one, whether the error comes from my apparent lack of prompting skill or from something else? They all sound equally convincing; at some point they just start being wrong.

> In general, you need to be specific about what you need, and you need to give all relevant details.

What details should I have added in the given example? The prompt was probably more comprehensive and detailed than it would have been if this task were given in primary school.

> Like the post author said, treat it like a junior programmer or an intern.

I would, if it acted like a junior programmer or an intern. With them, you can usually tell when they are unsure or making things up (if they do either). From an AI, I've yet to see something like "hey, I might be wrong about this, but this is my best effort, maybe we can have a look together."

I just copy/pasted your exact logic puzzle into Claude-3.5-sonnet and it solved it right away. Here is the response:

    Let me help you solve this step by step.

    First, let's identify what we know about Samantha:
     Samantha has 2 brothers
     Samantha has 4 sisters

    We also know that:
     Alex is one of Samantha's brothers
     Alex and Samantha share all siblings

    Now, from Alex's perspective:
     Alex is one of Samantha's brothers, so he has 1 other brother (since Samantha has 2 brothers total)
     Alex has Samantha as a sister, plus her other 4 sisters
     So Alex has 5 sisters total (Samantha + her 4 sisters)
     And Alex has 1 brother (the other brother besides himself)
 
    Therefore, Alex has:
     1 brother
     5 sisters
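For anyone who wants to check the arithmetic rather than trust either of us, here's a minimal Python sketch. It assumes, as the response does, that all the children share all their siblings; the names other than Alex and Samantha are made-up placeholders.

```python
# Minimal check of the sibling puzzle, assuming every child shares all siblings.
# "Brother2" and "Sister1".."Sister4" are hypothetical placeholder names.
family = {
    "Samantha": "F",
    "Alex": "M",
    "Brother2": "M",
    "Sister1": "F",
    "Sister2": "F",
    "Sister3": "F",
    "Sister4": "F",
}


def sibling_counts(family, person):
    """Return (brothers, sisters) of `person`: everyone in the family except them."""
    others = [sex for name, sex in family.items() if name != person]
    return others.count("M"), others.count("F")


print(sibling_counts(family, "Samantha"))  # (2, 4): matches the puzzle's premise
print(sibling_counts(family, "Alex"))      # (1, 5)
```

The only "trick" is that Samantha herself counts as one of Alex's sisters, so he has her 4 sisters plus her, and only 1 brother besides himself.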

It is basic reasoning. How do you expect it to produce useful programs with any consistency without it?

> How do you expect it to produce useful programs

Why not ask it to produce that program and then evaluate it on the output, rather than using a logic puzzle as a proxy?

It's like interviewing a job candidate with brain-teaser puzzles and disqualifying them when they fail, rather than asking them to do a real task they'd actually encounter on the job.

I have. I was feeling lazy one day and used it to write a small Python script to graph some data.

The back-and-forth wasn’t fun, and it flat-out refused to use seaborn for some reason, but the script worked and was fine overall.

I then used aider + Claude to help me work with Yjs. It led me down a rabbit hole based on an incorrect description of the Yjs sync protocol, and it took 2 days to untangle everything. Yjs is fairly new, though, so I didn’t fault it too much.

I then tried using it at work to deal with some surprisingly intricate back-button logic. Again, an incorrect understanding of the underlying API (on both our parts) caused a few days of headaches. I would’ve been better off just reading the docs than trying to use an AI assistant.

Using AI actually frustrated me to the point where it convinced me to suck it up and just read the specs and sources of the tools I’m using. I’ve been doing that for a few months now, and just RTFM-ing has worked better for me than AI assistants have.

Been on this train for a while, but I have had some success getting AI assistants to RTFM for me and tell me what it says, always making them cite where they got their info. I suspect it's only slightly better than just grepping.

I am also extremely suspicious of people's claims of wild success. It's very easy to experiment with these tools over the course of a few months and see where the pain points are, and it's exactly what you described: if I'm not a domain expert and I lean into these tools, I can definitely prototype stuff more quickly than without them, but when you run into these "misunderstandings", all the gains go right out the window, with additional frustration on top (which limits your ability to troubleshoot). And if I am a domain expert, why would I need these tools at all? At least at this point, they're not likely to do much to accelerate you. In the one or two areas where I consider myself an expert, they are positively a hindrance, at least in their current state.