| HN Mirror



	by jack9 3912 days ago
	Do you use batching to reach that scale of throughput? Streams are sometimes pre-aggregated data, and it wasn't clear whether you maintained the granularity through the changes.

2 comments

jandrewrogers 3912 days ago

I can't speak for their implementation but batching is not necessary. Stream processing complex JSON documents and storing the documents to disk at rates of 500k documents/second per server is demonstrably achievable on some scale-out systems.

The internal architectures make an enormous difference in throughput. A proper high-performance stream processing engine does not look anything like the "Hadoop in RAM" style model.


jack9 3912 days ago

> Stream processing complex JSON documents and storing the documents to disk at rates of 500k documents/second per server is demonstrably achievable on some scale-out systems

So is it per server or scaled out? I thought SSDs were capped at around 100k discrete writes per second (P/E aka write cycles).

Can you give an example? In practice I've been unable to get past 10k/sec/server using a number of technologies and combinations to collect from a socket, parse JSON, and write to a socket. That's just my specific use case.


lcr 3912 days ago

Looking at the top end of Intel's SSD lineup, I see that they have a product that advertises up to 175k IOPS of random 4K writes. Is this what you are referring to?

The product is the 2TB P3700.
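
For a rough sense of how the figures quoted in this thread relate, here is some back-of-the-envelope arithmetic. It assumes "4K" means 4096 bytes and that several documents can be packed into a single write; neither is stated above, and the function name is made up for illustration.

```python
def write_budget(iops: int, io_size_bytes: int, docs_per_second: int):
    """Relate an advertised IOPS figure to a target document rate.

    Returns (bytes/sec implied by the IOPS figure,
             bytes available per document at the target rate,
             documents that must share each write to hit the target rate).
    """
    bandwidth = iops * io_size_bytes             # bytes per second at the advertised rate
    bytes_per_doc = bandwidth / docs_per_second  # per-document share of that bandwidth
    docs_per_io = docs_per_second / iops         # documents per write operation
    return bandwidth, bytes_per_doc, docs_per_io


# Figures from the thread: 175k IOPS of 4K writes, 500k documents/second.
bandwidth, bytes_per_doc, docs_per_io = write_budget(175_000, 4096, 500_000)
print(bandwidth, bytes_per_doc, docs_per_io)
```

Under those assumptions the drive figure works out to about 717 MB/s, or roughly 1.4 KB per document at 500k documents/second, with about three documents sharing each 4K write. One write per document would not fit within 175k IOPS, so whether the rate is reachable on one such drive depends on document size and on how writes are grouped.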


rayscondition 3912 days ago

There's no batching. We have a 1-to-1 mapping of Kafka messages to measurements we receive from our API, though that could change over time. Superchief just reads the messages, and each message is passed off to another thread for processing.
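
The read-then-hand-off pattern described here can be sketched roughly as follows. This is not Superchief's code: the in-memory message list stands in for a Kafka consumer, and `process`, `run_dispatcher`, and the worker count are hypothetical names and values chosen for illustration.

```python
import json
import queue
import threading


def process(message: bytes) -> dict:
    """Hypothetical per-message work: parse one JSON measurement."""
    return json.loads(message)


def run_dispatcher(messages, num_workers=4):
    """Read messages one at a time and hand each one to a worker thread."""
    work = queue.Queue(maxsize=1024)  # bounded, so the reader cannot outrun the workers
    results = []
    results_lock = threading.Lock()

    def worker():
        while True:
            item = work.get()
            if item is None:  # sentinel: no more messages
                break
            parsed = process(item)
            with results_lock:
                results.append(parsed)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for message in messages:  # stands in for the consumer's read loop
        work.put(message)     # one message in, one unit of work out: no batching
    for _ in threads:
        work.put(None)        # one sentinel per worker
    for t in threads:
        t.join()
    return results
```

Results arrive in whatever order the workers finish, so anything that depends on message order would need extra handling.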
