Hacker News
by happyopossum 1413 days ago
Serverless sounds cool for this at first, but what are the ingest/compute costs going to look like at a modest 20 TB/day? How about 100 TB/day, or 1 PB/day?

Honestly, I think at that point you'd be better off, and pay less, going with a commercial security data lake.


Matano is designed specifically for petabyte-scale security log analytics, so performance and cost are top priorities. Our data pipeline borrows from Vector's Rust-based data transformation language [0] for maximal performance, and thanks to auto-vectorization each parallel function invocation can process upwards of 20 MiB/s [1].
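As a rough sketch of what that per-invocation figure implies, the Python below converts a daily log volume into an average ingest rate and the number of concurrent invocations needed at 20 MiB/s each. It assumes decimal terabytes, ingest spread evenly over 24 hours, and no headroom for bursts, so real concurrency at peak would be higher; the function names are hypothetical and not part of Matano or Vector.

```python
import math

MIB = 2 ** 20            # bytes per mebibyte
SECONDS_PER_DAY = 86_400


def sustained_rate_mib_s(tb_per_day: float) -> float:
    """Average ingest rate in MiB/s for a daily volume in decimal terabytes."""
    return tb_per_day * 1e12 / SECONDS_PER_DAY / MIB


def invocations_needed(tb_per_day: float, per_invocation_mib_s: float = 20.0) -> int:
    """Concurrent invocations needed to keep up with the *average* rate.

    Assumption: every invocation sustains `per_invocation_mib_s` continuously;
    no extra capacity is reserved for traffic peaks.
    """
    return math.ceil(sustained_rate_mib_s(tb_per_day) / per_invocation_mib_s)


if __name__ == "__main__":
    for volume in (20, 100, 1000):  # 1000 TB/day = 1 PB/day
        print(f"{volume:>5} TB/day -> {sustained_rate_mib_s(volume):9.1f} MiB/s, "
              f"~{invocations_needed(volume)} concurrent invocations")
```

Under these assumptions, 20 TB/day averages about 221 MiB/s (12 concurrent invocations), while 1 PB/day averages roughly 11,000 MiB/s and needs on the order of 550.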

Roughly, this comes out to $1/(TB/day) in ingest compute costs, which is much cheaper than a commercial solution. We are also working on moving our Lambdas over to ARM for even better cost efficiency.
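One way to sanity-check a figure of this order is to estimate Lambda duration charges from the throughput alone. The sketch below reads the figure as compute cost per terabyte ingested, which is an interpretation. It assumes a hypothetical 1 GB memory setting, each invocation running at 20 MiB/s for its whole billed duration, and an x86 on-demand price of $0.0000166667 per GB-second; check current AWS pricing for your region and tier. It ignores per-request fees, S3/SQS charges, and storage.

```python
MIB = 2 ** 20            # bytes per mebibyte
TB = 10 ** 12            # bytes per decimal terabyte


def compute_cost_per_tb(per_invocation_mib_s: float = 20.0,
                        memory_gb: float = 1.0,
                        price_per_gb_s: float = 0.0000166667) -> float:
    """Estimated Lambda duration cost (USD) to transform 1 TB of logs.

    All defaults are assumptions: the 20 MiB/s throughput comes from the
    comment above, the 1 GB memory size is hypothetical, and the price is an
    x86 on-demand GB-second rate that should be checked against AWS pricing.
    """
    busy_seconds = TB / (per_invocation_mib_s * MIB)  # total invocation-seconds per TB
    return busy_seconds * memory_gb * price_per_gb_s


def daily_compute_cost(tb_per_day: float, **kwargs: float) -> float:
    """Daily duration cost (USD) for a given ingest volume, same assumptions."""
    return tb_per_day * compute_cost_per_tb(**kwargs)


if __name__ == "__main__":
    for volume in (20, 100, 1000):  # 1000 TB/day = 1 PB/day
        print(f"{volume:>5} TB/day -> ~${daily_compute_cost(volume):,.2f}/day")
```

With these defaults the estimate is about $0.80 of compute per TB, so roughly $16/day at 20 TB/day and about $795/day at 1 PB/day. AWS lists a lower per-GB-second price for arm64 functions, which can be passed as `price_per_gb_s`, though the per-invocation throughput on ARM would have to be measured rather than assumed.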

[0] https://vector.dev/docs/reference/vrl/
[1] https://vector.dev/docs/setup/going-to-prod/sizing/#sizing

> Roughly, this comes out to $1/(TB/day) in ingest compute costs, which is much cheaper than a commercial solution. We are also working on moving our Lambdas over to ARM for even better cost efficiency.

Based on a previous exploration of a security visibility stack on AWS at scale, I expect this to be a colossal operational expense.