by itamarst 399 days ago

I talk about this more explicitly in the PyCon talk (https://pythonspeed.com/pycon2025/slides/ - video soon), though that's not specifically about Pydantic. Basically:

1. Inefficient parser implementation. It's just... very easy to allocate way too much memory if you don't think about large-scale documents, and very difficult to measure. Common problem with many (but not all) JSON parsers.

2. CPython's in-memory representation is large compared to compiled languages. So e.g. a 4-digit integer is 5-6 bytes in JSON, 8 in Rust if you use i64, and about 28 in 64-bit CPython. An empty dictionary is 64 bytes.
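
A minimal sketch of how one might check both points on a given interpreter, using only the standard library. The sample document and the variable names are illustrative assumptions, not taken from the talk, and the exact numbers vary by CPython version and platform:

```python
import json
import sys
import tracemalloc

# Point 2: per-object overhead of CPython values.
value = 1234
# Bytes the integer takes as JSON text (digits only; a separator adds 1-2 more).
json_size = len(json.dumps(value).encode("utf-8"))
# Bytes CPython reports for the int object itself and for an empty dict.
int_size = sys.getsizeof(value)
empty_dict_size = sys.getsizeof({})

# Point 1: peak memory allocated while parsing, compared to the document size.
# The document here is a made-up example: a list of 10,000 small objects.
document = json.dumps([{"id": i, "value": i * 2} for i in range(10_000)])
tracemalloc.start()
parsed = json.loads(document)
_, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"JSON text for {value}: {json_size} bytes")
print(f"CPython int object: {int_size} bytes")
print(f"CPython empty dict: {empty_dict_size} bytes")
print(f"Document size: {len(document)} bytes")
print(f"Peak traced memory during json.loads: {peak} bytes")
```

Note that `sys.getsizeof` only reports the size of the object itself, not the objects it refers to, and `tracemalloc` only sees allocations made through Python's memory allocators, so both figures are rough lower bounds rather than exact totals.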


cozzyd 399 days ago

Funny to see Awkward Array in this context! (And... do people really store giant datasets in JSON?!?)


chao- 399 days ago

Often the legacy of an engineer (or team) who "did what they had to do" to meet a deadline, and if they wanted to migrate to something better post-launch, weren't allowed to allocate time to go back and do so.

At least JSON or CSV is better than the ad hoc homegrown formats you'd find at medium-sized companies that came out of the '90s and '00s.


ljm 399 days ago

Some people even use AI-generated JSON as a semantic layer over their SQL.


jfb 399 days ago

My sweet summer child
