by verdverm 1068 days ago
I don't see the value add here.

Here's the core of the message sent to the LLM: https://github.com/microsoft/TypeChat/blob/main/src/typechat...

You are basically getting a fixed prompt to return structured data with a small amount of automation and vendor lock-in. All these LLM libraries are just crappy APIs to the underlying API. It is trivial to write a script that does the same and will be much more flexible as models and user needs evolve.

As an example, think about how you could change the prompt or use python classes instead. How much work would this be using a library like this versus something that lifts the API calls and text templating to the user like: https://github.com/hofstadter-io/hof/blob/_dev/flow/chat/llm...


The value is in:

1. Running the typescript type checker against what is returned by the LLM.

2. If there are type errors, combining those into a "repair prompt" that will (it is assumed) have a higher likelihood of eliciting an LLM output that type checks.

3. Gracefully handling the cases where the heuristic in #2 fails.

https://github.com/microsoft/TypeChat/blob/main/src/typechat...

In my experience experimenting with the same basic idea, the heuristic in #2 works surprisingly well for relatively simple types (i.e. records and arrays not nested too deeply, limited use of type variables). It turns out that prompting LLMs to return values inhabiting relatively simple types can be used to create useful applications. Since that is valuable, this library is valuable inasmuch as it eliminates the need to hand roll this request pattern, and provides a standardized integration with the typescript codebase.
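
To make the three steps above concrete, here is a minimal sketch of the validate-and-repair loop. It is not TypeChat's code: the `Order` type, `validateOrder`, `buildRepairPrompt`, and `translate` are hypothetical names, the hand-written validator stands in for running the real TypeScript type checker, and the model is a synchronous function only to keep the sketch short.

```typescript
// Either a validated value or the list of validation errors.
type Result<T> = { ok: true; value: T } | { ok: false; errors: string[] };

// Hypothetical target type for this sketch.
interface Order {
  item: string;
  quantity: number;
}

// Assumption: any function from prompt to text. A real client would be async.
type Model = (prompt: string) => string;

// Step 1: check the model output. A hand-written structural check stands in
// for the TypeScript type checker here.
function validateOrder(text: string): Result<Order> {
  let parsed: unknown;
  try {
    parsed = JSON.parse(text);
  } catch {
    return { ok: false, errors: ["Response is not valid JSON."] };
  }
  if (typeof parsed !== "object" || parsed === null) {
    return { ok: false, errors: ["Response is not a JSON object."] };
  }
  const record = parsed as Record<string, unknown>;
  const item = record["item"];
  const quantity = record["quantity"];
  const errors: string[] = [];
  if (typeof item !== "string") errors.push("'item' must be a string.");
  if (typeof quantity !== "number") errors.push("'quantity' must be a number.");
  if (typeof item === "string" && typeof quantity === "number") {
    return { ok: true, value: { item, quantity } };
  }
  return { ok: false, errors };
}

// Step 2: fold the errors into a follow-up prompt.
function buildRepairPrompt(original: string, badOutput: string, errors: string[]): string {
  return [
    original,
    "Your previous response was:",
    badOutput,
    "It failed validation with these errors:",
    ...errors.map((e) => `- ${e}`),
    "Return corrected JSON only.",
  ].join("\n");
}

// Step 3: give up after a bounded number of repairs and return the errors
// to the caller instead of throwing.
function translate(model: Model, request: string, maxRepairs = 1): Result<Order> {
  const basePrompt = `Return JSON of type { item: string; quantity: number } for: ${request}`;
  let prompt = basePrompt;
  let lastErrors: string[] = [];
  for (let attempt = 0; attempt <= maxRepairs; attempt++) {
    const output = model(prompt);
    const result = validateOrder(output);
    if (result.ok) return result;
    lastErrors = result.errors;
    prompt = buildRepairPrompt(basePrompt, output, result.errors);
  }
  return { ok: false, errors: lastErrors };
}
```

The caller gets a `Result` either way, so a failed repair is an ordinary value to branch on rather than an exception.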

Here's a project that does that better imo:

https://github.com/dzhng/zod-gpt

And by better I mean doesn't tie you to OpenAI for no good reason

How does TypeChat tie you to OpenAI more than zod-gpt does? The interface required of a chat completion model is as simple as it gets, and you can provide your own easily (as the linked post makes clear)

https://github.com/microsoft/TypeChat/blob/4d34a5005c67bc494...

The ergonomics of most of these AI libraries are built around using whatever models they provide integrations for: according to the file you linked retries won't even work unless you go and roll them in your implementation.

I'm sure someone will open a PR for Anthropic/Cohere/etc., but a quick glance made it pretty clear they built it with OpenAI first in mind; otherwise even low-hanging fruit like retries would have been abstracted away at a higher level.
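
For what "retries abstracted away at a higher level" could look like, here is a rough sketch. The `LanguageModel` interface and `withRetries` are hypothetical names for illustration, not TypeChat's or zod-gpt's API, and the call is synchronous only to keep the example short; a real version would be async and wait between attempts.

```typescript
// Hypothetical minimal interface: anything that maps a prompt to text.
interface LanguageModel {
  complete(prompt: string): string;
}

// Wrap any model so failures are retried outside the provider-specific code.
function withRetries(model: LanguageModel, maxAttempts: number): LanguageModel {
  return {
    complete(prompt: string): string {
      let lastError: unknown = new Error("maxAttempts must be at least 1");
      for (let attempt = 1; attempt <= maxAttempts; attempt++) {
        try {
          return model.complete(prompt);
        } catch (error) {
          // A real version would back off here before the next attempt.
          lastError = error;
        }
      }
      // Every attempt failed: rethrow the most recent error.
      throw lastError;
    },
  };
}
```

With this shape, an OpenAI, Anthropic, or local model only has to implement `complete`, and the retry policy is written once.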

I don't know where all you people work that your employer would prefer a random git repo (that has no support and no guarantee of updates) over a solution from Microsoft. (Alternatively: that you have so much free time that you'd prefer to fiddle with your own validation code instead of writing your actual app)

Open source solutions are great (which this still is, btw), but having a first-party solution is also a good thing.

You're overrating the influence of the name Microsoft here. It's just some devs from the company working on this with no proper guarantee backing the project.

I've been through this whole song and dance already with Microsoft's Guidance (another LLM project) and could not justify using it further in production at work. We built some tools and wrappers ourselves and it wasn't even that difficult. These libraries are often more trouble than they're worth.

I’m pretty sure Anders, Steve Lucco, and Daniel Rosenwasser worked on this. So inventors + current lead PM of typescript.

Should lend some credibility to the project.

Not really, better to leave the AI stuff to the AI people rather than PL people. When you don't, you get gimmick libraries like this rather than a solution that fits into the ecosystem

These folks have no pedigree when it comes to LLMs or AI, so no it does not lend credibility

I don't know which employer is hiring the people who make logical leaps like this but I thank them for their sacrifice.

At the end of the day the repo I linked is grokkable with about 10 minutes of effort, and has simple demonstrable usefulness by letting you swap out the LLM you're calling.

Both are experimental open source libraries in an experimental space.

Many companies expressly avoid Microsoft products, particularly given its well exposed history of embrace, extend, extinguish.
Look at Guidance - that's being ignored by Microsoft yet it's an official repo
I use Zod a great deal day to day, so this is appealing inasmuch as it would allow me to re-use those definitions.
Anything like this but for Python?
these are trivial steps you can add in any script, as your link demonstrates.

Why would I want to add all this extra stuff just for that? The opaque retry until it returns valid JSON? That sounds like it will make for many pleasant support cases or issues

Personally, I have found that investing more effort in the actual prompt engineering improves success rates and reduces the need to retry with an appended error message. Input/output pairs (i.e. few-shot examples) are especially helpful, and while we haven't tried it yet, I imagine fine-tuning and distillation would improve the situation even more.
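
As a small illustration of the input/output-pair approach, here is one way to assemble a few-shot prompt. `Example` and `buildFewShotPrompt` are made-up names, and the "Input:/Output:" layout is just one convention, not something any particular model requires.

```typescript
// One worked example: the raw input and the structured output we want back.
interface Example {
  input: string;
  output: unknown;
}

// Build a prompt from an instruction, some worked examples, and the new input.
// The prompt ends with "Output:" so the model's reply is the structured answer.
function buildFewShotPrompt(instruction: string, examples: Example[], input: string): string {
  const shots = examples.map(
    (ex) => `Input: ${ex.input}\nOutput: ${JSON.stringify(ex.output)}`
  );
  return [instruction, ...shots, `Input: ${input}\nOutput:`].join("\n\n");
}
```

Because the examples are plain data, they can be swapped or extended per use case without touching the calling code.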

There are many subtleties to invoking the typescript type checker from node. It's nice to have support for that from the team that maintains the type checker.
Admittedly, couldn't they spend some effort on making that invocation less subtle instead?
Is the team working on typescript in a good position to be making LLM libraries, interfaces, and abstractions? Do they have the background and context to understand how their library fits into AI workflows? Could they have provided the same value with a blog post and sample code?
Your coworkers must love you.
Indeed, we all do what we are good at and appreciate each other and not having to do the things they do

But what does your comment have to do with any of this at all?

agreed. not to mention we're talking about Microsoft here. the same company that gave us "guidance", a defunct LLM framework.
I’ve used guidance, why is it defunct? I found it was powerful at templating, really decent for generating synthetic datasets.
Pretty much all the LLM libraries I'm seeing are like this. They boil down to a request to the LLM to do something in a certain way. I've noticed under complex conditions, they stop listening and start reverting to their 'default' behavior.

But that said it still feels like using a library is the right thing to do... so I'm still watching this space to see what matures and emerges as a good-enough approach.

Where's the vendor lock-in? This is an open source library and the file you linked to even includes configs for two vendors: ChatGPT and Bard.
vendor lock in to a library and the design choices they make

basically, since it reduces the user input space, you are giving up flexibility and control for some questionably valuable abstractions, such as a predefined prompt, no ability to prompt engineer, CoT/ToT, etc...

if anything, choose a broader framework like langchain and have something like this as an extension or plugin to the framework, no need for a library for this one little thing

Weird, I would suggest the opposite - LangChain is a nuke that was hastily assembled to crack a peanut, almond, and whatever other nuts were hype driven into the framework. It's a mess of spaghetti - which is nothing against the Langchain authors - it was just the first iteration in a new problem space. But adopting it in a new codebase is a big commitment that locks you into complexity you'll almost certainly want to shed at some point.

Whereas this library is a much more focused approach that does one small thing well, and could be integrated into your own homerolled frameworks (or probably even langchain itself, assuming you use langchain.js).

I agree that LangChain has some pretty poor APIs and abstractions, and I do even question the usefulness of what they provide.

But this library amounts to a loop around a very basic prompt and running the ts toolchain to produce an error message that is then appended to the prompt next iteration. It is not easily integrated into anything and is written by people who do not practice or develop AI.

The value is in turning unstructured data into structured data and ensuring it satisfies schema constraints.

For example: you have 1000 free-text survey responses about your product, building a schema and for-each `TypeChat`ing them would get you a dataset for that free-text. It's mind-bogglingly useful.
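
A rough sketch of that survey workflow, under some assumptions: `SurveyInsight` is a hypothetical schema, `parseInsight` is a hand-written check rather than TypeChat's validator, and `model` is a synchronous stand-in for whatever LLM call you use.

```typescript
// Hypothetical schema for one survey answer.
interface SurveyInsight {
  sentiment: "positive" | "neutral" | "negative";
  topics: string[];
}

// Accept the model's reply only if it matches the schema; otherwise return null.
function parseInsight(text: string): SurveyInsight | null {
  let data: unknown;
  try {
    data = JSON.parse(text);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const { sentiment, topics } = data as { sentiment?: unknown; topics?: unknown };
  if (sentiment !== "positive" && sentiment !== "neutral" && sentiment !== "negative") {
    return null;
  }
  if (!Array.isArray(topics) || !topics.every((t) => typeof t === "string")) return null;
  return { sentiment: sentiment as SurveyInsight["sentiment"], topics: topics as string[] };
}

// Run every free-text response through the model and keep the rejects
// separately so they can be reviewed or retried.
function structureResponses(
  responses: string[],
  model: (prompt: string) => string
): { rows: SurveyInsight[]; rejected: string[] } {
  const rows: SurveyInsight[] = [];
  const rejected: string[] = [];
  for (const response of responses) {
    const reply = model(`Classify this survey response as JSON: ${response}`);
    const insight = parseInsight(reply);
    if (insight) rows.push(insight);
    else rejected.push(response);
  }
  return { rows, rejected };
}
```

Keeping `rejected` around matters at this scale: with 1000 inputs, some replies will not validate, and you want to know which ones.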

yes, turning unstructured data into structured data is one of the most useful ways to use an LLM right now. It has been done before using schemas and could be done without all the extra cruft.

There was a similar example a few months back using XML instead, but I haven't heard much about it since, because again, the library did not add value on top of doing these things in a more open or scripted setting.

MSFT has another project in a similar vein, guardrails. Interesting idea, but made worse by wrapping it in a library. Most of these LLM ideas are better as a function than a library: make them transform the i/o rather than having every library write its own wrappers around the LLM APIs as well
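
To show what "a function that transforms the i/o" might mean in practice, here is a small sketch. `toPrompt` and `extractJson` are hypothetical helpers; neither touches the network, so the caller keeps full control of the API call that sits between them.

```typescript
type Transform<A, B> = (input: A) => B;

// Input side: turn a task plus a schema description into a prompt.
const toPrompt = (schema: string): Transform<string, string> =>
  (task) => `${task}\nReply with JSON matching:\n${schema}`;

// Output side: parse the span from the first "{" to the last "}" of a reply,
// which tolerates chatty text before and after the JSON.
const extractJson: Transform<string, unknown> = (reply) => {
  const start = reply.indexOf("{");
  const end = reply.lastIndexOf("}");
  if (start === -1 || end < start) throw new Error("no JSON object found");
  return JSON.parse(reply.slice(start, end + 1));
};

// The caller owns the call in between, for example:
//   const reply = anyClient(toPrompt(schema)(task));
//   const data = extractJson(reply);
```

Since both halves are plain functions, you can change the prompt, add chain-of-thought, or switch vendors without waiting on a library release.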

There are several more making use of OpenAPI / JSONSchema rather than TS.

We use a subset of CUE, essentially JSON without as many quotes or commas. The LLMs are quite flexible with few-shot learning. They can be made more reliable with fine-tuning. They can be made faster and cheaper with distillation.

Yes, as the abstractions get better it becomes easier to code useful things.
the debate is about how valuable the abstraction here is to warrant a library, and the fact that it predefines the prompt and api call flow, so you cannot prompt engineer or use something like CoT/ToT
This amounts to saying ‘how dare someone publish some code that they wrote!’

Is it your impression that this is being pitched as some grand solution?

That this was published as a way to shut out other people from doing the same thing in other ways?

Can’t we just look at a cool thing someone did, and released for other people to play with, and say ‘huh! That’s neat!’ And get inspired?

except it's not neat or novel, this idea has been around and implemented for many months now, by many people, using many methods. Running a tool on the output and then feeding that back to the LLM is also not novel; it is a widely used technique

> We'd love to know if TypeChat is something that's useful and interests you!

We are providing feedback to them here

People can debate till the cows come home. But it's worth remembering that hacker news is about stimulating intellectual curiosity.

There's no reason for this to have a fixed flow, either - it's got a hint of diagonalizability to it - by which I mean, you can get the model to build a schema for dynamic flows, given a 'bootstrapping' schema. No different than what has always had to happen for someone to write a compiler for a programming language in the language itself.

Getting these models to reliably return a consistent structure without frequent human intervention and/or having to account for the personal moral opinions of big tech CEOs is not trivial, no.
There are multiple ways to get structured output, and what this library is doing is not really that interesting. The concept is interesting and has had multiple implementations already, the code (and abstraction) here is not interesting and creates more issues than it solves
Tell me how to get reliably structured output. I'm all ears.
I have a prompt from February, pre-ChatGPT, and now I just use the model's function-calling support; it's built for exactly that
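
For readers who haven't used function calling, a hedged sketch of the idea: you describe the output as a JSON Schema "function" and the model returns arguments for it. The request and response shapes below follow my understanding of OpenAI's chat "tools" format and should be checked against the current API docs; `record_order`, `buildToolRequest`, `readToolArguments`, and the model name are placeholders. Nothing here sends a request.

```typescript
// Assumed request body shape, modeled on OpenAI's chat "tools" format.
function buildToolRequest(userText: string) {
  return {
    model: "gpt-4o-mini", // placeholder model name
    messages: [{ role: "user", content: userText }],
    tools: [
      {
        type: "function",
        function: {
          name: "record_order",
          description: "Record a coffee order.",
          // The structured output is described as JSON Schema.
          parameters: {
            type: "object",
            properties: {
              item: { type: "string" },
              quantity: { type: "integer" },
            },
            required: ["item", "quantity"],
          },
        },
      },
    ],
    // Ask the model to call this specific function.
    tool_choice: { type: "function", function: { name: "record_order" } },
  };
}

// Assumed response message shape: arguments arrive as a JSON string.
interface ToolCallMessage {
  tool_calls?: { function: { name: string; arguments: string } }[];
}

// Find the named call and parse its arguments.
function readToolArguments(message: ToolCallMessage, name: string): unknown {
  for (const call of message.tool_calls ?? []) {
    if (call.function.name === name) return JSON.parse(call.function.arguments);
  }
  throw new Error(`model did not call ${name}`);
}
```

The parsed arguments are typed `unknown` on purpose: in my experience they can still miss fields or fail to parse, so a validation step remains worthwhile.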
It’s essentially prompt engineering as a service with some basic quality-control features thrown in.

Sure, your engineers could implement it themselves, but don’t they have better things to do?

the quality of the prompt does not look that good, based on my experience getting flexible structured output from a schema

There are other questionable decisions, and a valuable use of engineering time is indeed to evaluate candidate abstractions and think about the long-term cost of adopting them. In this case, it does not seem to save that much effort, and in the long run it means a lot of important LLM knobs are out of your control. Not a good tradeoff

You can probably define the python language grammar as a typescript type though!