Ask HN: What is the best way to calculate percentile of streaming data?
1 point by rishiloyola 2403 days ago
Hello,

I need to write a Python function that iterates through incoming requests and dynamically calculates percentiles of the request body size. Which library or algorithm do you recommend?

Example: requests come in batches. Say the first batch has 50 requests, the next one has 80, and so on. I need to calculate percentiles of the body sizes of these requests.
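As a baseline, an exact answer can be computed by keeping every body size and sorting on demand. The sketch below assumes the nearest-rank definition of a percentile; the names `ExactPercentiles` and `nearest_rank_percentile` are hypothetical, and the sample sizes (in KB) are the first two requests of each batch from the example in this thread. Memory grows with the number of requests, so this mainly serves as a reference for checking approximate methods.

```python
import math


def nearest_rank_percentile(sorted_sizes, p):
    """Return the smallest value with at least p% of the samples at or below it."""
    if not sorted_sizes:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    rank = math.ceil(p * len(sorted_sizes) / 100)  # 1-based nearest rank
    return sorted_sizes[rank - 1]


class ExactPercentiles:
    """Keeps every body size seen so far, so memory grows with the request count."""

    def __init__(self):
        self.sizes = []

    def add_batch(self, body_sizes):
        # A batch is any iterable of body sizes (e.g. len(body) for each request).
        self.sizes.extend(body_sizes)

    def percentiles(self, ps=(10, 50, 95)):
        ordered = sorted(self.sizes)
        return {p: nearest_rank_percentile(ordered, p) for p in ps}


# Sizes in KB from the thread's example batches.
tracker = ExactPercentiles()
tracker.add_batch([10, 2])
tracker.add_batch([5, 8])
print(tracker.percentiles())  # {10: 2, 50: 5, 95: 10}
```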


I think you need to provide a bit more info. Are you using Apache Kafka? Something else?

The function would be the number of requests in an individual batch divided by the total number of requests, multiplied by 100, but I don't think that's what you're looking for.

Edit: actually, for your question it would be the inverse of the batch size multiplied by 100. For example, the first batch has 50 requests, so that would be 1/50 × 100, or 2%.

No, I am not using Kafka. It is just a basic Python server. I want to calculate the nth percentile of the size of my incoming request objects over the past hour.
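One way to get "over the past hour" without keeping every size is a swapped-in technique not mentioned in the thread: count sizes in logarithmic buckets (similar in spirit to HdrHistogram or DDSketch) and keep one bucket counter per minute, merging the last 60 counters at query time. This is a minimal sketch under stated assumptions: the class name `WindowedLogHistogram`, the bucket growth factor of 1.05, the one-minute slices, and the requirement that timestamps never go backwards are all hypothetical choices. Each estimate is the upper edge of its bucket, so it can read up to about 5% high, and the window is only accurate to one slice.

```python
import math
from collections import Counter, deque


class WindowedLogHistogram:
    """Approximate percentiles of body sizes over a sliding time window.

    Bucket i >= 0 holds sizes in (growth**(i - 1), growth**i]; bucket -1 holds
    empty bodies. One Counter of bucket counts is kept per time slice, so memory
    depends on how many distinct buckets are hit, not on the request count.
    """

    def __init__(self, window_seconds=3600, slice_seconds=60, growth=1.05):
        self.slice_seconds = slice_seconds
        self.slices_per_window = window_seconds // slice_seconds
        self.growth = growth
        self.log_growth = math.log(growth)
        self.slices = deque()  # (slice_index, Counter) pairs, oldest first

    def _bucket(self, size):
        if size <= 0:
            return -1
        return math.ceil(math.log(size) / self.log_growth)

    def _estimate(self, bucket):
        # Upper edge of the bucket: less than `growth` times the true size.
        return 0 if bucket < 0 else self.growth ** bucket

    def _evict(self, now):
        # Keep only the most recent `slices_per_window` slices.
        current = int(now // self.slice_seconds)
        while self.slices and self.slices[0][0] <= current - self.slices_per_window:
            self.slices.popleft()

    def add(self, size, now):
        """Record one body size at time `now` (seconds, assumed non-decreasing)."""
        index = int(now // self.slice_seconds)
        if not self.slices or self.slices[-1][0] != index:
            self.slices.append((index, Counter()))
        self.slices[-1][1][self._bucket(size)] += 1
        self._evict(now)

    def percentiles(self, now, ps=(10, 50, 95)):
        self._evict(now)
        merged = Counter()
        for _, counts in self.slices:
            merged.update(counts)
        total = sum(merged.values())
        if total == 0:
            raise ValueError("no samples in the window")
        result = {}
        for p in ps:
            rank = max(1, math.ceil(p * total / 100))  # nearest rank, 1-based
            seen = 0
            for bucket, count in sorted(merged.items()):
                seen += count
                if seen >= rank:
                    result[p] = self._estimate(bucket)
                    break
        return result


# Hypothetical usage: sizes in bytes, timestamps in seconds (e.g. time.monotonic()).
hist = WindowedLogHistogram()
hist.add(10 * 1024, now=0)
hist.add(2 * 1024, now=1)
hist.add(5 * 1024, now=120)
hist.add(8 * 1024, now=121)
print(hist.percentiles(now=130))
```

The number of buckets grows with the logarithm of the largest body size, so a lower growth factor buys precision at the cost of more counters per slice.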

I don't want to store the size of each request in memory, because that would use too much RAM.
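Another bounded-memory option, also a swapped-in technique rather than something proposed in the thread, is reservoir sampling (Vitter's Algorithm R): keep a uniform random sample of at most `capacity` sizes and compute percentiles on the sample. The class name `ReservoirPercentiles` and the capacity of 1000 are hypothetical. The result is exact while fewer than `capacity` requests have been seen and an estimate afterwards; tail percentiles such as the 95th are the noisiest. This sketch covers all traffic since it was created, so an hourly figure would need something like a fresh reservoir per hour (an assumption, not shown).

```python
import math
import random


class ReservoirPercentiles:
    """Keeps a uniform random sample of at most `capacity` sizes (Algorithm R)."""

    def __init__(self, capacity=1000, rng=None):
        self.capacity = capacity
        self.sample = []
        self.seen = 0
        self.rng = rng if rng is not None else random.Random()

    def add(self, size):
        self.seen += 1
        if len(self.sample) < self.capacity:
            self.sample.append(size)
        else:
            # Keep the new size with probability capacity / seen, replacing a random slot.
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.sample[j] = size

    def percentiles(self, ps=(10, 50, 95)):
        if not self.sample:
            raise ValueError("no samples")
        ordered = sorted(self.sample)
        n = len(ordered)
        # Nearest-rank percentile computed on the sample.
        return {p: ordered[max(1, math.ceil(p * n / 100)) - 1] for p in ps}


# Hypothetical usage: feed each batch's body sizes (KB), then query at any time.
estimator = ReservoirPercentiles(capacity=500, rng=random.Random(42))
for batch in ([10, 2], [5, 8]):
    for size in batch:
        estimator.add(size)
print(estimator.percentiles())  # {10: 2, 50: 5, 95: 10}
```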

Incoming traffic:

- 1st batch
  --> 60 requests
  ---> size of 1st request is 10 KB
  ---> size of 2nd request is 2 KB
  ...
  ...
- 2nd batch
  --> 10 requests
  ---> size of 1st request is 5 KB
  ---> size of 2nd request is 8 KB
  ...
- 100th batch

I am talking about the 10th, 50th, and 95th percentiles of request size.
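For reference, one common definition of the pth percentile of N sizes is the nearest-rank method: sort the sizes and take the value at rank ⌈pN/100⌉. Applied to the 60-request first batch from the example:

```latex
x_{(1)} \le x_{(2)} \le \dots \le x_{(N)}, \qquad
P_p = x_{(r_p)}, \qquad r_p = \left\lceil \frac{p}{100}\, N \right\rceil

N = 60:\quad r_{10} = \lceil 6 \rceil = 6, \quad
r_{50} = \lceil 30 \rceil = 30, \quad
r_{95} = \lceil 57 \rceil = 57
```

Other definitions interpolate between neighbouring values (NumPy's `numpy.percentile` does so by default), so results can differ slightly for small N.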