| HN Mirror


by tmostak 3341 days ago

You might be surprised.

Telcos need to troubleshoot network problems in real time, automakers and insurance companies need to track cars in real time, oil companies need to interactively query and visualize geological data, and the infosec industry needs real-time packet analysis. We have customers in almost every vertical, all united by their need for real-time analytics. Some want to use MapD for visualization, others for programmatic querying for things like fraud detection, and others still to feed machine learning algorithms.

I'm also curious how you envisage paying thousands of dollars per year to get queries in dozens of ms on datasets this size, much less 10-100X larger (which customers would often use MapD for). Mark benchmarked a 6-node ds2.8xlarge cluster of Redshift (> $40/hour) and found it up to 70X slower than MapD on this dataset. That's similar to our price on Amazon for this 2-node cluster.
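For a sense of scale, here is a rough annualization of the hourly figure above. It is only a sketch: it assumes the cluster runs around the clock at a flat on-demand rate, and the function name is made up for illustration. The $40/hour input is the lower bound quoted above, not an exact price.

```python
def yearly_cost(hourly_rate, hours_per_day=24, days_per_year=365):
    """Annualize an hourly cluster price, assuming it is never shut down."""
    return hourly_rate * hours_per_day * days_per_year


if __name__ == "__main__":
    # $40/hour is the lower bound quoted for the 6-node Redshift cluster.
    print(f"${yearly_cost(40):,} per year at $40/hour, running 24/7")
```

At that rate an always-on cluster costs hundreds of thousands of dollars per year rather than thousands.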

Not saying Redshift isn't a great system, just that I don't buy the price/performance numbers you are quoting (for real workloads, not for some specific query that can be indexed well, etc.).

1 comments

menegattig 3341 days ago

Very clear, thanks so much for the detailed explanation.

I'm still curious about how much it would cost for a scenario where you have 1 billion users and 200 billion events for a year of data, and keep adding 10 billion monthly (a very real DMP or Telco scenario), and you have to run a query like the one below on top of all this data (200 billion records). I'm wondering how many MapD servers, and how much infrastructure, I would need in order to get results in under 100 ms.

Count UNIQUE users that are from "San Francisco" OR "New York" AND accessed the pages "/sports" OR "/news" more than 3 times in the past 12 months.
