by natpat
1866 days ago
This is super interesting. I've recently been working on a similar concept: we have a reasonable amount of data (in the terabytes) that's fairly static and that I need to search fairly infrequently (but sometimes in bulk). The solution we came up with is a small, hot, in-memory index that points to the location of the data in a file on S3. Random access into a file on S3 is pretty fast, and running on an EC2 instance means latency to S3 is almost nil. Cheap, fast, and effective. We're using some custom Python code to build a Marisa trie as our index. I was wondering if there are alternatives to this setup?
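For anyone trying to picture the layout, here is a rough sketch of the idea, not our actual code. A plain dict stands in for the Marisa trie and a local file stands in for the S3 object; the function names (`build_data_file`, `fetch`, `range_header`) and the back-to-back record layout are made up for illustration.

```python
import os
import tempfile


def build_data_file(path, records):
    """Write values back to back and return {key: (offset, length)}.

    The returned dict is the small in-memory index. A trie would replace
    it to save memory; the dict is only a stand-in here.
    """
    index = {}
    with open(path, "wb") as f:
        for key, value in records.items():
            index[key] = (f.tell(), len(value))
            f.write(value)
    return index


def range_header(offset, length):
    """HTTP Range header value for one record (the end byte is inclusive)."""
    return f"bytes={offset}-{offset + length - 1}"


def fetch(path, index, key):
    """Look up the key in memory, then read only that slice of the file.

    Against S3 the seek and read would become a ranged GET that sends
    range_header(offset, length); the local file is an assumption made
    so the sketch runs without any cloud access.
    """
    entry = index.get(key)
    if entry is None:
        return None
    offset, length = entry
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data_path = os.path.join(tmp, "data.bin")
        idx = build_data_file(data_path, {"apple": b"red", "banana": b"yellow"})
        print(fetch(data_path, idx, "banana"), range_header(*idx["banana"]))
```

The index only ever holds keys plus two integers per record, which is why it can stay hot in memory while the bulk of the data sits in cheap storage.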