| HN Mirror



	by rhacker 3014 days ago
	I've definitely used it on 8GB of RAM, but only with about 10M documents or so. It's pretty kick-ass for ad-hoc queries on data that you would typically have to set up a star schema for. Probably the best thing you can do is set up a Kafka queue, Apache Spark and Elasticsearch (do the research around these 3); you'll love the ability to find out things like how many M(ale) patients above the age of 30 with diabetes died shortly after a surgery. Trying to set all that up with complicated star schemas etc. really sucks compared to just building a JSON format that you pipe through Kafka or Spark. Edit: And yes, I originally used it for log processing, but people should definitely try it out for replacing very expensive BA stacks. For log analysis, look at something called Graylog, which actually uses Elasticsearch internally, or just go with Splunk, which gives you out-of-the-box primitives for session length calculations. In ELK, if you want to do something that sounds as simple as session length, you'll end up having to reprocess documents using Kafka or Spark, with background jobs that reprocess documents more than 1 hour old (or something to that effect).
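	To make the two ideas above a bit more concrete, here is a rough sketch in Python, not anything from a real deployment. Every field name (`sex`, `age`, `conditions`, `days_from_surgery_to_death`, `session_id`, `timestamp`) is made up for illustration, the query assumes `sex` and `conditions` are mapped as keyword fields, and `track_total_hits` assumes Elasticsearch 7 or later. The session function stands in for the kind of background job you would run in Spark or a Kafka consumer; treating a session as finished once its last event is over an hour old is also an assumption.

```python
from datetime import datetime, timedelta

# Hypothetical denormalized patient document: one flat JSON record per
# patient instead of fact and dimension tables. All field names are invented.
example_doc = {
    "patient_id": "p-001",
    "sex": "M",
    "age": 54,
    "conditions": ["diabetes", "hypertension"],
    "days_from_surgery_to_death": 12,  # field is absent if the patient is alive
}


def build_cohort_query(min_age=30, max_days=30):
    """Build an Elasticsearch query DSL body that counts male patients older
    than min_age with diabetes who died within max_days of a surgery."""
    return {
        "size": 0,                 # only the count is needed, not the hits
        "track_total_hits": True,  # exact totals on Elasticsearch 7+ (assumed version)
        "query": {
            "bool": {
                "filter": [
                    {"term": {"sex": "M"}},
                    {"range": {"age": {"gt": min_age}}},
                    {"term": {"conditions": "diabetes"}},
                    {"range": {"days_from_surgery_to_death": {"lte": max_days}}},
                ]
            }
        },
    }


def session_lengths(events, now, settle=timedelta(hours=1)):
    """Return {session_id: length in seconds} for sessions whose last event
    is at least `settle` old, i.e. sessions assumed to be finished."""
    first, last = {}, {}
    for event in events:
        sid, ts = event["session_id"], event["timestamp"]
        first[sid] = min(first.get(sid, ts), ts)
        last[sid] = max(last.get(sid, ts), ts)
    cutoff = now - settle
    return {
        sid: (last[sid] - first[sid]).total_seconds()
        for sid in first
        if last[sid] <= cutoff
    }


sample_events = [
    {"session_id": "a", "timestamp": datetime(2020, 1, 1, 9, 0)},
    {"session_id": "a", "timestamp": datetime(2020, 1, 1, 9, 25)},
    {"session_id": "a", "timestamp": datetime(2020, 1, 1, 9, 10)},
    {"session_id": "b", "timestamp": datetime(2020, 1, 1, 11, 30)},
    {"session_id": "b", "timestamp": datetime(2020, 1, 1, 11, 45)},
]
lengths = session_lengths(sample_events, now=datetime(2020, 1, 1, 12, 0))
```

	The dict from `build_cohort_query()` is what you would send as the body of a search request against the patient index. The computed session lengths would then be written back onto the documents (or into a separate index), which is the reprocessing step that Splunk spares you.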