| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by robbiemitchell 789 days ago

Processing high volumes of unstructured data (text)… we’re using a STAG architecture.

- Generate targeted LLM micro summaries of every record (ticket, call, etc.) continually

- Use layers of regex, semantic embeddings, and scoring enrichments to identify report rows (pivots on aggregates) worth attention, running on a schedule

- Proactively explain each report row by identifying what’s unusual about it and LLM summarizing a subset of the microsummaries.

- Push the result to webhook

Lack of JSON schema restriction is a significant barrier to entry on hooking LLMs up to a multi step process.

Another is preventing LLMs from adding intro or conclusion text.

5 comments

adamsbriscoe 789 days ago

> Lack of JSON schema restriction is a significant barrier to entry on hooking LLMs up to a multi step process.

(Plug) I shipped a dedicated OpenAI-compatible API for this, jsonmode.com a couple weeks ago and just integrated Groq (they were nice enough to bump up the rate limits) so it's crazy fast. It's a WIP but so far very comparable to JSON output from frontier models, with some bonus features (web crawling etc).

tarasglek 789 days ago

The metallica-esque lightning logo is cool

joatmon-snoo 789 days ago

We actually built an error-tolerant JSON parser to handle this. Our customers were reporting exactly the same issue- trying a bunch of different techniques to get more usefully structured data out.

You can check it out over at https://github.com/BoundaryML/baml. Would love to talk if this is something that seems interesting!

BoorishBears 789 days ago

> Lack of JSON schema restriction is a significant barrier to entry on hooking LLMs up to a multi step process.

How are you struggling with this, let alone as a significant barrier? JSON adherence with a well thought out schema hasn't been a worry between improved model performance and various grammar based constraint systems in a while.

> Another is preventing LLMs from adding intro or conclusion text.

Also trivial to work around by pre-filling and stop tokens, or just extremely basic text parsing.

Also would recommend writing out Stream-Triggered Augmented Generation since the term is so barely used it might as well be made up from the POV of someone trying to understand the comment

robbiemitchell 789 days ago

Asking even a top-notch LLM to output well formed JSON simply fails sometimes. And when you’re running LLMs at high volume in the background, you can’t use the best available until the last mile.

You work around it with post-processing and retries. But it’s still a bit brittle given how much stuff happens downstream without supervision.

fancy_pantser 789 days ago

Constrained output with GBNF or JSON is much more efficient and less error-prone. I hope nobody outside of hobby projects is still using error/retry loops.

joatmon-snoo 789 days ago

Constraining output means you don’t get to use ChatGPT or Claude though, and now you have to run your own stuff. Maybe for some folks that’s OK, but really annoying for others.

fancy_pantser 789 days ago

You're totally right, I'm in my own HPC bubble. The organizations I work with create their own models and it's easy for me to forget that's the exception more than the rule. I apologize for making too many assumptions in my previous comment.

joatmon-snoo 788 days ago

Not at all!

Out of curiosity- do those orgs not find the loss of generality that comes from custom models to be an issue? e.g. vs using Llama or Mistral or some other open model?

int_19h 788 days ago

I do wonder why, though. Constraining output based on logits is a fairly simple and easy-to-implement idea, so why is this not part of e.g. the OpenAI API yet? They don't even have to expose it at the lowest level, just use it to force valid JSON in the output on their end.

jncfhnb 789 days ago

… why would you have the LLM spit out a json rather than define the json yourself and have the LLM supply values?

esafak 789 days ago

If the LLM doesn't output data that conforms to a schema, you can't reliably parse it, so you're back to square one.

jncfhnb 789 days ago

It’s significantly easier to output an integer than a JSON with a key value structure where the value is an integer and everything else is exactly as desired

esafak 788 days ago

That's because you've dumbed down the problem. If it was just about outputting one integer, there would be nothing to discuss. Now add a bunch more fields, add some nesting and other constraints into it...

janpieterz 789 days ago

How would I do this reliably? Eg give me 10 different values, all in one prompt for performance reasons?

Might not need JSON but whatever format it outputs, it needs to be reliable.

jncfhnb 788 days ago

Don’t do it all in one prompt.

janpieterz 788 days ago

Right, but now I’m basically running a huge performance hit, need to parallelize my queries etc.

I was parsing a document recently, 10-ish questions for 1 document, would make things expensive.

Might be what’s needed but not ideal.

yeahwhatever10 789 days ago

The phrase you want to search is "constrained decoding".

BoorishBears 789 days ago

The best available actually have the fewest knobs for JSON schema enforcement (ie. OpenAI's JSON mode, which technically can still produce incorrect JSON)

If you're using anything less you should have a grammar that enforces exactly what tokens are allowed to be output. Fine Tuning can help too in case you're worried about the effects of constraining the generation, but in my experience it's not really a thing

benreesman 789 days ago

I only became aware of it recently and therefore haven’t done more than play with in a fairly cursory way, but unstructured.io seems to have a lot of traction and certainly in my little toy tests their open-source stuff seems pretty clearly better than the status quo.

Might be worth checking out.

lastdong 789 days ago

“Use layers of regex, semantic embeddings, and scoring enrichments to identify report rows (pivots on aggregates) worth attention, running on a schedule”

This is really interesting, is there any architecture documentation/articles that you can recommend?

evrydayhustling 783 days ago

I'm late to this party, but here's a post I wrote about it. This is more motivation but we are working on technical posts/papers for release. Happy to field emails in the meantime if this is timely for you.

https://www.linkedin.com/pulse/ai-2024-more-answers-fewer-qu...

lastdong 777 days ago

Awesome! Thank you