by jamal-kumar
456 days ago
Oh boy. One of the things I do to test whether an LLM is 'there yet' by my standards is to see if it can spit out solutions in Elixir without hallucinating packages that don't exist. I haven't tested since ChatGPT 4 came out, so I dunno where it's at on that, but it always felt like a decent litmus test.
|
I've been playing with Claude 3.7 in thinking mode and, perhaps unsurprisingly, I find it overthinks the problem or tries to do far more than I want for any given prompt. I expect I'm just using the wrong tool and should probably use plain Claude 3.7 without thinking.
Of course, in all of this I'm using the LLM in a "junior" capacity. I'm not giving it giant multi-faceted problems to solve: I'm giving it relatively narrow problems, one at a time, and guiding it through the process.