by sroussey 800 days ago
If someone were to create something new, a blank slate approach, what would you find valuable and why?
2 comments

This is a great question!

I think we now know, collectively, a lot more about what’s annoying/hard about building LLM features than we did when LangChain was being furiously developed.

And some things we thought would be important and hard turned out to be very easy, like getting GPT to give back well-formed JSON.

So I think there’s lots of room.

One thing LangChain is doing now that solves something that IS very hard/annoying is testing. I spent 30 minutes yesterday re-running a slow prompt because 1 in 5 runs would produce weird output. After each tweak to the prompt, I had to run it at least 10 times to be reasonably sure it was an improvement.
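To make the "run it at least 10 times" point concrete, here is a minimal sketch of a repeat-run check. Everything in it is hypothetical: `fake_model` is a stand-in for a real model call (it just returns bad output about 1 time in 5), and `failure_rate`, `is_valid_json`, and `chance_of_clean_streak` are illustrative names, not part of LangChain or any other library.

```python
import json
import random
from typing import Callable


def failure_rate(run_prompt: Callable[[str], str],
                 is_ok: Callable[[str], bool],
                 prompt: str,
                 runs: int = 10) -> float:
    """Run the same prompt `runs` times and return the fraction of bad outputs."""
    failures = sum(0 if is_ok(run_prompt(prompt)) else 1 for _ in range(runs))
    return failures / runs


def chance_of_clean_streak(p_fail: float, runs: int) -> float:
    """Probability that `runs` independent runs all pass despite failure rate `p_fail`."""
    return (1.0 - p_fail) ** runs


def is_valid_json(output: str) -> bool:
    """Example check: the output counts as OK if it parses as JSON."""
    try:
        json.loads(output)
        return True
    except json.JSONDecodeError:
        return False


# Hypothetical stand-in for a real model call (assumption: ~1 in 5 runs is bad).
_rng = random.Random(0)


def fake_model(prompt: str) -> str:
    return "weird output" if _rng.random() < 0.2 else '{"ok": true}'


if __name__ == "__main__":
    rate = failure_rate(fake_model, is_valid_json, "Summarize this as JSON", runs=10)
    print(f"observed failure rate over 10 runs: {rate:.0%}")
    # With a true failure rate of 1 in 5, ten clean runs in a row still
    # happen about 10.7% of the time (0.8 ** 10).
    print(f"chance of 10 clean runs by luck: {chance_of_clean_streak(0.2, 10):.3f}")
```

The second function is why 10 runs is only "reasonably sure": a prompt that still fails 1 in 5 times will pass 10 runs in a row roughly one time in ten.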

It can be faster and more effective to fall back to a smaller model (GPT-3.5 or Haiku): the weakness of the prompt will be more obvious on a smaller model, and your iteration time will be shorter.
Great insight!
How would testing work out ideally?
Use a local model. For most tasks they are good enough. For example, Mistral 7B Instruct v0.2 is quite solid by now.
Do different versions react to prompts in the same way? I imagined the prompt would be tailored to the quirks of a particular version rather than naturally being stably optimal across versions.
I suppose that is one of the benefits of using a local model: it reduces model risk. That is, given a certain prompt and fixed sampling settings, it should always reply in the same way. With a hosted model, you don't have that operational control over model risk.
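One way to act on the model-risk point, sketched under assumptions: store a fingerprint of everything that is supposed to stay fixed (model identifier, prompt, sampling settings) next to each saved test output, so a changed reply can be traced to a changed input. `run_fingerprint` and the example model id and settings below are hypothetical, not any library's API.

```python
import hashlib
import json


def run_fingerprint(model_id: str, prompt: str, settings: dict) -> str:
    """Hash the inputs that should stay fixed between test runs."""
    # sort_keys makes the hash independent of dict key order.
    payload = json.dumps(
        {"model": model_id, "prompt": prompt, "settings": settings},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    fp = run_fingerprint(
        "local-model-v0.2",
        "Extract the date as JSON.",
        {"temperature": 0.0, "seed": 42},
    )
    print(fp[:16])
```

With a local model you control all three inputs, so an unchanged fingerprint with a changed reply points at something else (runtime, hardware, nondeterministic kernels). With a hosted model, the weights behind the same model name can change without the fingerprint changing.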
What are the best local/open models for accurate tool-calling?