by bwestergard 1066 days ago
The value is in:

1. Running the TypeScript type checker against what is returned by the LLM.

2. If there are type errors, combining those into a "repair prompt" that will (it is assumed) have a higher likelihood of eliciting an LLM output that type checks.

3. Gracefully handling the cases where the heuristic in #2 fails.

https://github.com/microsoft/TypeChat/blob/main/src/typechat...

In my experience experimenting with the same basic idea, the heuristic in #2 works surprisingly well for relatively simple types (i.e. records and arrays not nested too deeply, with limited use of type variables). It turns out that prompting LLMs to return values inhabiting relatively simple types can be used to create useful applications. Since that is valuable, this library is valuable inasmuch as it eliminates the need to hand-roll this request pattern and provides a standardized integration with the TypeScript codebase.
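A minimal sketch of the three steps above, to make the request pattern concrete. It is not TypeChat's code or API: every name here (`requestTyped`, `CompleteFn`, `ValidateFn`, `Checked`, `validateSentiment`) is hypothetical, and a hand-written validator stands in for the TypeScript type checker so the example has no dependencies.

```typescript
// Hypothetical names throughout; a hand-written validator replaces the real type checker.
type Checked<T> = { ok: true; value: T } | { ok: false; errors: string[] };
type CompleteFn = (prompt: string) => Promise<string>;
type ValidateFn<T> = (raw: string) => Checked<T>;

async function requestTyped<T>(
  complete: CompleteFn,
  validate: ValidateFn<T>,
  prompt: string,
  maxRepairs = 1,
): Promise<Checked<T>> {
  let currentPrompt = prompt;
  let lastErrors: string[] = ["no attempt was made"];
  for (let attempt = 0; attempt <= maxRepairs; attempt++) {
    const raw = await complete(currentPrompt);
    // Step 1: check the reply against the expected shape.
    const result = validate(raw);
    if (result.ok) return result;
    lastErrors = result.errors;
    // Step 2: fold the errors into a "repair prompt" for the next attempt.
    currentPrompt = [
      prompt,
      "Your previous reply was:",
      raw,
      "It failed validation with these errors:",
      ...result.errors,
      "Reply again with corrected JSON only.",
    ].join("\n");
  }
  // Step 3: give up gracefully and hand the errors back to the caller.
  return { ok: false, errors: lastErrors };
}

// Example target type and validator (both assumptions for illustration).
interface Sentiment {
  sentiment: "positive" | "negative" | "neutral";
}

function validateSentiment(raw: string): Checked<Sentiment> {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return { ok: false, errors: ["reply is not valid JSON"] };
  }
  const s = (parsed as { sentiment?: unknown } | null)?.sentiment;
  const allowed = ["positive", "negative", "neutral"];
  if (typeof s === "string" && allowed.indexOf(s) !== -1) {
    return { ok: true, value: { sentiment: s as Sentiment["sentiment"] } };
  }
  return { ok: false, errors: ['"sentiment" must be "positive", "negative", or "neutral"'] };
}
```

Returning the errors instead of throwing keeps the failure case in #3 visible to the caller, who can then decide whether to retry, fall back, or surface the problem.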


Here's a project that does that better imo:

https://github.com/dzhng/zod-gpt

And by better I mean it doesn't tie you to OpenAI for no good reason.

How does TypeChat tie you to OpenAI more than zod-gpt does? The interface required of a chat completion model is as simple as it gets, and you can provide your own easily (as the linked post makes clear)

https://github.com/microsoft/TypeChat/blob/4d34a5005c67bc494...

The ergonomics of most of these AI libraries are built around using whatever models they provide integrations for: according to the file you linked, retries won't even work unless you roll them into your own implementation.

I'm sure someone will open a PR for Anthropic/Cohere/etc., but a quick glance made it pretty clear they built it with OpenAI in mind first; otherwise even low-hanging fruit like retries would have been abstracted away at a higher level.
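For illustration, here is one way retries could sit above any particular provider, as a wrapper around a bare completion function. This is a sketch under assumptions, not how TypeChat or zod-gpt do it: `CompleteFn` and `withRetries` are hypothetical names, and the fixed pause between attempts is a deliberate simplification.

```typescript
// Hypothetical provider-agnostic model interface: prompt in, text out.
type CompleteFn = (prompt: string) => Promise<string>;

// Wraps any completion function so that thrown errors trigger another attempt.
function withRetries(complete: CompleteFn, maxAttempts = 3, pauseMs = 0): CompleteFn {
  return async (prompt) => {
    let lastError: unknown;
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return await complete(prompt);
      } catch (err) {
        lastError = err;
        // Optionally wait before the next attempt (fixed pause, no backoff).
        if (attempt < maxAttempts && pauseMs > 0) {
          await new Promise<void>((resolve) => setTimeout(resolve, pauseMs));
        }
      }
    }
    // Every attempt failed: rethrow the most recent error.
    throw lastError;
  };
}
```

Because the wrapper returns the same function type it accepts, a client for any provider could be passed through it without the rest of the code knowing retries exist.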

I don't know where all you people work that your employer would prefer a random git repo (that has no support and no guarantee of updates) over a solution from Microsoft. (Alternatively: that you have so much free time that you'd prefer to fiddle with your own validation code instead of writing your actual app)

Open source solutions are great (which this still is, btw), but having a first-party solution is also a good thing.

You're overrating the influence of the name Microsoft here. It's just some devs from the company working on this with no proper guarantee backing the project.

I've been through this whole song and dance already with Microsoft's Guidance (another LLM project) and could not justify using it further in production at work. We built some tools and wrappers ourselves and it wasn't even that difficult. These libraries are often more trouble than they're worth.

I’m pretty sure Anders, Steve Lucco, and Daniel Rosenwasser worked on this. So inventors + current lead PM of TypeScript.

Should lend some credibility to the project.

Not really, better to leave the AI stuff to the AI people rather than PL people. When you don't, you get gimmick libraries like this rather than a solution that fits into the ecosystem

These folks have no pedigree when it comes to LLMs or AI, so no, it does not lend credibility.

I don't know which employer is hiring the people who make logical leaps like this but I thank them for their sacrifice.

At the end of the day the repo I linked is grokkable with about 10 minutes of effort, and has simple demonstrable usefulness by letting you swap out the LLM you're calling.

Both are experimental open source libraries in an experimental space.

Many companies expressly avoid Microsoft products, particularly given its well-documented history of embrace, extend, extinguish.

Look at Guidance: it's being ignored by Microsoft, yet it's an official repo.

I use Zod a great deal day to day, so this is appealing inasmuch as it would allow me to re-use those definitions.

Anything like this but for Python?

These are trivial steps you can add in any script, as your link demonstrates.

Why would I want to add all this extra stuff just for that? The opaque retry until it returns valid JSON? That sounds like it will make for many pleasant support cases or issues

Personally, I have found that investing more effort in the actual prompt engineering improves success rates and reduces the need to retry with an appended error message. Input/output pairs (i.e. few-shot examples) are especially helpful, and while we haven't tried it yet, I imagine fine-tuning and distillation would improve the situation even more.
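To make the few-shot idea concrete, here is a rough sketch of assembling input/output pairs into a prompt. The names (`Shot`, `buildFewShotPrompt`) and the `Input:`/`Output:` layout are assumptions chosen for illustration, not a format any particular model or library requires.

```typescript
// Hypothetical shape for one worked example shown to the model.
interface Shot {
  input: string;
  output: unknown; // serialized as JSON so the model sees the exact target format
}

// Builds: instruction, then each example pair, then the new input with an open "Output:".
function buildFewShotPrompt(instruction: string, shots: Shot[], input: string): string {
  const renderedShots = shots.map(
    (shot) => `Input: ${shot.input}\nOutput: ${JSON.stringify(shot.output)}`,
  );
  return [instruction, ...renderedShots, `Input: ${input}\nOutput:`].join("\n\n");
}

// Usage: two examples that demonstrate the expected JSON shape.
const sentimentPrompt = buildFewShotPrompt(
  "Classify the sentiment of the input. Reply with JSON only.",
  [
    { input: "I love it", output: { sentiment: "positive" } },
    { input: "It broke after a day", output: { sentiment: "negative" } },
  ],
  "It's fine, I guess",
);
```

Ending the prompt on an open `Output:` nudges the model to continue in the same format as the examples, which is the effect the pairs are meant to have.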

There are many subtleties to invoking the TypeScript type checker from Node. It's nice to have support for that from the team that maintains the type checker.

Admittedly, couldn't they spend some effort on making that invocation less subtle instead?

Is the team working on TypeScript in a good position to be making LLM libraries, interfaces, and abstractions? Do they have the background and context to understand how their library fits into AI workflows? Could they have provided the same value with a blog post and sample code?

Your coworkers must love you.

Indeed, we all do what we are good at and appreciate each other and not having to do the things they do.

But what does your comment have to do with any of this at all?

It's called sarcasm. But here, let me say the same thing directly: you are an insufferable prick. Imagine gatekeeping talking to a fucking chatbot.

Agreed. Not to mention we're talking about Microsoft here. The same company that gave us "guidance", a defunct LLM framework.

I’ve used guidance, why is it defunct? I found it was powerful at templating, really decent for generating synthetic datasets.