| HN Mirror



	by manishjhawar 2336 days ago
	I'm currently working on a solution that matches a record against large data sets and returns a binary score (0/1). I'm using Redis with the Bloom Filter module. This works in that query results come back in under a second, but data ingestion (populating the filter) is comparatively slow (<100 MB/s). Another blocker for me is that querying across multiple sets means using multiple filters, which multiplies all the resources needed. Does Spark have any advantages or specialized filters for this use case? (I have no experience with Spark, but I'm ready to dig in if it would really help.)
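
	For anyone unfamiliar with the setup, here is a minimal sketch of the kind of 0/1 membership check described above, written as a plain in-memory Bloom filter in Python. It is not the Redis module's implementation and says nothing about Spark; the class name, the `score` method, the sizing values, and the sample records are all hypothetical and chosen only for illustration.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: no false negatives, some false positives."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: derive k bit positions from two halves of one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd so steps vary
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def score(self, item: str) -> int:
        """Return 1 if the item may be present, 0 if it is definitely absent."""
        return int(
            all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))
        )


# Hypothetical sizing and records, for illustration only.
bf = BloomFilter(num_bits=1 << 20, num_hashes=7)
for record in ("alice@example.com", "bob@example.com"):
    bf.add(record)

print(bf.score("alice@example.com"))  # 1: an added record always matches
```

	A score of 1 only means "possibly present", so the false-positive rate depends on how the bit count and hash count are sized relative to the number of records.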