by chupy
3953 days ago
Hello Jeffrey,
First, I wanted to say that your post is very nicely written and full of juicy details! :) Regarding the sets database: I had to solve a very similar problem at the company where I work, and I chose the Redis HyperLogLog structure instead of sets, because for near-real-time results you only need an approximate count of the sets or their intersection, not the specific set members. It works great for us for estimating intersections (via PFMERGE unions) on sets containing hundreds of millions of members. If anybody is interested, I can do a writeup about it. Did you ever consider using that?
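For anyone curious, here is a minimal sketch of how an intersection count can be estimated from HyperLogLogs. Redis has no intersection command for them: PFMERGE (or PFCOUNT with several keys) gives the union, and the intersection follows from inclusion-exclusion, |A ∩ B| = |A| + |B| - |A ∪ B|. The function name, the key names, and the `ExactSetClient` stand-in are hypothetical and only there so the snippet runs without a server; this is not necessarily how the setup described above is built. With redis-py you would pass a `redis.Redis()` client instead, since it exposes `pfadd` and `pfcount` with the same call shape.

```python
def estimate_intersection(client, key_a, key_b):
    """Estimate |A ∩ B| for two HyperLogLog keys via inclusion-exclusion."""
    count_a = client.pfcount(key_a)
    count_b = client.pfcount(key_b)
    # PFCOUNT with several keys returns the cardinality of their union,
    # the same value you would get by PFMERGE-ing them and counting the result.
    count_union = client.pfcount(key_a, key_b)
    # With real HyperLogLogs the estimates are noisy, so clamp at zero.
    return max(0, count_a + count_b - count_union)


class ExactSetClient:
    """Hypothetical in-memory stand-in for a Redis client.

    It uses exact Python sets, so there is no estimation error here;
    it only mimics the pfadd/pfcount call shape for the demo.
    """

    def __init__(self):
        self._sets = {}

    def pfadd(self, key, *values):
        self._sets.setdefault(key, set()).update(values)

    def pfcount(self, *keys):
        return len(set().union(*(self._sets.get(k, set()) for k in keys)))


client = ExactSetClient()
client.pfadd("users:clicked", "u1", "u2", "u3")
client.pfadd("users:purchased", "u2", "u3", "u4")
print(estimate_intersection(client, "users:clicked", "users:purchased"))  # 2 with the exact stand-in
```

One caveat: the error of each HyperLogLog estimate scales with the size of the sets involved (Redis documents a standard error of about 0.81%), so when the intersection is tiny compared to the union, the estimate can be badly off.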
For us, however, it's important to get the set members at the end of the day. Amplitude differs from other analytics products in that we put a lot of emphasis on the actual users that correspond to a data point on a graph -- one of our key features, Microscope, is the ability to view those users, see more context around the events they are performing, and potentially create a dynamic cohort out of them. As such, approximations that don't let us retrieve the set members don't satisfy our use case.