| HN Mirror


by zeptomu 3233 days ago

> create a very large in memory read-only pd dataframe and then put a flask interface to operations on that dataframe using gunicorn and expose as an API. [...]

May I ask what you consider large in memory: MByte, GByte, TByte? The simplest solution is to store it as a blob on an SSD and read it via plain file I/O, or to put it into a DB. But I assume this was too slow, so it would be interesting to go into more detail.

In the end you can use shared memory with multiprocessing in Python, which, I have to admit, requires some setup and bookkeeping work.
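
To make the "setup and bookkeeping" concrete, here is a minimal sketch using only the standard library. It assumes Python 3.8+ (for `multiprocessing.shared_memory`, which is newer than this thread), and the helper names `publish` and `dot_with_shared` are made up for illustration. A tiny list of floats and a dot product stand in for the real dataframe and its operations.

```python
from array import array
from multiprocessing import Pool, shared_memory


def publish(values):
    """Copy a list of floats into a new shared-memory block (done once)."""
    n = len(values)
    shm = shared_memory.SharedMemory(create=True, size=n * 8)
    # View the raw bytes as C doubles; release the view before returning
    # so the block can be closed cleanly later.
    with shm.buf.cast("d") as view:
        view[:n] = array("d", values)
    return shm


def dot_with_shared(args):
    """Attach to the block by name and compute a dot product without copying it."""
    name, n, weights = args
    shm = shared_memory.SharedMemory(name=name)
    try:
        with shm.buf.cast("d") as view:
            return sum(view[i] * weights[i] for i in range(n))
    finally:
        shm.close()  # detach only; the owner is responsible for unlink()


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0]  # stand-in for the large read-only data
    shm = publish(data)
    try:
        # Each "request" carries only the block name, length and parameters.
        requests = [
            (shm.name, len(data), w)
            for w in ([1, 1, 1, 1], [0, 1, 0, 1], [2, 0, 0, 0])
        ]
        with Pool(3) as pool:
            print(pool.map(dot_with_shared, requests))  # [10.0, 6.0, 2.0]
    finally:
        shm.close()
        shm.unlink()  # the bookkeeping part: exactly one process frees the block
```

The bookkeeping is mostly about ownership: one process creates and eventually unlinks the block, and every other process only attaches and closes. For real numeric data you would probably wrap the buffer in an array type rather than index it element by element.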

1 comment

detroitcoder 3232 days ago

Let's say there are a couple of dataframes that need a matrix multiply and take up about 10 GB on a 32 GB host. I want to parameterize these manipulations and expose them over HTTP. I can only afford to cache 3 sets of them, which means that I can serve at most 3 concurrent requests. I would like to provide more concurrency than this without reading from disk or storing the data out of process in a separate service, which adds complexity.
