by Yanael 639 days ago
When you ask for JSON data over a streaming response, each chunk you receive is an incomplete document that JSON libraries cannot parse, so you get malformed-JSON errors. Normally you have to wait for the entire stream to complete before you can parse anything.
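
For illustration, here is a minimal sketch of that failure using only Python's standard `json` module. The truncated payload and the `try_parse` helper are made up for the example and are not taken from the library.

```python
import json

# A hypothetical payload cut off mid-stream: the last string, the list
# and the object are all still open.
partial_payload = '{"name": "Ada", "tags": ["a", "b'


def try_parse(text):
    """Return (ok, value_or_error_message) for a JSON string."""
    try:
        return True, json.loads(text)
    except json.JSONDecodeError as exc:
        # A strict parser rejects the whole document, even though
        # "name" was already fully received.
        return False, exc.msg


ok, detail = try_parse(partial_payload)
print(ok, detail)
```

A strict parser is all-or-nothing, so the fully received `"name"` field stays inaccessible until the closing braces arrive.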

To solve this problem I tried to define a spec and built a lib for it:

- [lib] https://github.com/st3w4r/openai-partial-stream/tree/main

- [spec] https://github.com/st3w4r/openai-partial-stream/blob/main/sp...

4 comments

Very interesting. I tried to solve this problem too, and my code parses incomplete JSON, giving access to both partial values and fully complete values.

Why do you wait for the entire stream to be complete? Some objects in the JSON structure can be shown to be complete before the stream ends.
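
To make that point concrete, here is a rough sketch (not the code from either project) that yields each element of a top-level JSON array as soon as it is complete. It relies on the standard library's `json.JSONDecoder.raw_decode`. The function name `iter_array_items` and the sample chunks are assumptions made for the example.

```python
import json


def iter_array_items(chunks):
    """Yield elements of a top-level JSON array as each one completes."""
    decoder = json.JSONDecoder()
    buffer = ""
    opened = False  # True once the opening "[" has been consumed
    for chunk in chunks:
        buffer += chunk
        while True:
            buffer = buffer.lstrip()
            if not opened:
                if not buffer.startswith("["):
                    break  # wait for the start of the array
                buffer, opened = buffer[1:], True
                continue
            if buffer.startswith(","):
                buffer = buffer[1:].lstrip()  # drop the item separator
            if not buffer or buffer.startswith("]"):
                break  # nothing pending, or the array has ended
            try:
                item, end = decoder.raw_decode(buffer)
            except json.JSONDecodeError:
                break  # element is still incomplete, wait for more data
            rest = buffer[end:].lstrip()
            # Only accept the element once a delimiter follows it, so a
            # number such as 12 is not emitted before the 3 of 123 arrives.
            if not rest or rest[0] not in ",]":
                break
            yield item
            buffer = rest


# Hypothetical chunks, split at awkward positions on purpose.
chunks = ['[{"id": 1, "na', 'me": "a"}, {"id"', ': 2, "name": "b"}', ', 12', '3]']
items = list(iter_array_items(chunks))
print(items)
```

This only covers complete elements of a top-level array. Exposing partially received values inside an element, as described above, needs a real incremental parser.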

Yeah, it's an interesting problem to solve. The library is designed to parse incomplete JSON without waiting for the stream to finish.
I’ve been using the ijson Python library for that - I have notes on that here: https://til.simonwillison.net/json/ijson-stream
Pydantic also has support for parsing partial JSON. https://docs.pydantic.dev/latest/concepts/json/#partial-json...

  from pydantic_core import from_json

  partial_json_data = '["aa", "bb", "c'  
  
  result = from_json(partial_json_data, allow_partial=True)
  print(result)  
  #> ['aa', 'bb']
You can also use their `jiter` package directly if you don't otherwise use pydantic. https://github.com/pydantic/jiter/tree/main/crates/jiter-pyt...
That's neat, I hadn't seen that. Docs were lacking so I submitted a PR: https://github.com/pydantic/jiter/pull/143
Nice, it looks like a good library to build on top of. I like the available events: start_map, end_map, etc. I did try a library in JS that had similar ones, but it lacked the granularity to cover all use cases for individual fields instead of an entire item. I'll keep a note of this one if I do Python JSON streaming.
These are great. I've been working on trying to get markup working with streaming and it's a seemingly hard problem. This should help with figuring it out!
Awesome, works great! Love the modes "Real-time", "progressive", etc.
Thanks! Yeah, creating an abstraction over the raw JSON and how you want to use it in your code makes it more practical.