by jarpineh 4242 days ago

Wow, thank you, again. I definitely have to take a look at this. My use cases vary so much that building a Hadoop-like system would require too much custom coding.

I wonder if it is possible to have compression and de-duplication, so that there could be one big base dataset and lots of containers that each add only the new data they generate.

Anyhow, looking at this it feels really approachable. What I have in mind are quick-and-dirty data-sciency scripts for ad hoc use cases, like diffing structured files and combing over matrix data.
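
To make "diffing structured files" concrete, here is a minimal sketch of the kind of throwaway script I mean. It assumes the files have already been parsed into JSON-like dicts with string keys; `diff_records` is a made-up helper name, not part of any tool discussed here.

```python
def diff_records(old, new, path=""):
    """Recursively compare two JSON-like structures.

    Returns a list of (path, old_value, new_value) tuples.
    A missing key is reported as None, so this sketch cannot
    tell "absent" apart from an explicit null value.
    """
    changes = []
    if isinstance(old, dict) and isinstance(new, dict):
        # Walk the union of keys so additions and removals both show up.
        for key in sorted(set(old) | set(new)):
            sub = f"{path}.{key}" if path else str(key)
            if key not in old:
                changes.append((sub, None, new[key]))
            elif key not in new:
                changes.append((sub, old[key], None))
            else:
                changes.extend(diff_records(old[key], new[key], sub))
    elif old != new:
        # Leaf values (or mismatched types) that differ.
        changes.append((path, old, new))
    return changes


if __name__ == "__main__":
    before = {"a": 1, "b": {"c": 2}}
    after = {"a": 1, "b": {"c": 3}, "d": 4}
    for change in diff_records(before, after):
        print(change)
```

Lists are compared as whole values here; anything smarter (element-wise or keyed list diffs) would need more code than a quick ad hoc script usually deserves.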