by gunapologist99 1414 days ago
Looks neat, but in what way is this serverless?

It's a pretty complex diagram:

https://github.com/matanolabs/matano/blob/main/website/src/a...


There are no servers to maintain in the entire architecture: we rely heavily on Lambda and even use MSK Serverless for Kafka.
This sounds truly nightmarish and costly. Even moderate volumes of data are going to add up very quickly cost-wise.

I see the term zero-ops. But maintaining and debugging this pipeline is going to require some ops, even if you are not managing VMs.

Using and maintaining Matano costs a fraction of what popular non-serverless alternatives like ELK or Splunk do. Matano is specifically designed for petabyte-scale security analytics use cases that don't fit in a traditional SIEM.

On the write path, the serverless data ingestion pipeline means you don't need to over-provision for ingestion (Logstash and Splunk Forwarders are notorious for the related costs and ops in high-scale use cases). For reads, since Matano queries Iceberg tables backed by highly compressed Parquet files on object storage, you won't pay anything close to what you would for a database- or search-engine-based SIEM.

> For reads, since Matano queries Iceberg tables backed by highly-compressed parquet files on object storage you won't pay anything close to what you would for a database or search engine based SIEM

Where do you show an example of querying anything? There's an empty "detector" in the examples directory, which I guess gets called once per row of this 20MiB/s alleged elsewhere?

Anyway, I find comparing this to Splunk to be a bit premature.
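
For a sense of scale, here is a rough back-of-the-envelope sketch of what a sustained 20MiB/s would add up to. Only the 20MiB/s figure comes from the thread; the function names and the 30-day month are assumptions made for illustration, and nothing here says anything about actual pricing.

```python
# Back-of-the-envelope volume estimate for a sustained ingest rate.
# Only the 20 MiB/s rate is taken from the thread; everything else
# (function names, 30-day month) is an illustrative assumption.

SECONDS_PER_DAY = 24 * 60 * 60


def daily_volume_gib(mib_per_second: float) -> float:
    """Data ingested per day, in GiB, at a constant rate in MiB/s."""
    return mib_per_second * SECONDS_PER_DAY / 1024


def monthly_volume_tib(mib_per_second: float, days: int = 30) -> float:
    """Data ingested over `days` days, in TiB, at a constant rate in MiB/s."""
    return daily_volume_gib(mib_per_second) * days / 1024


if __name__ == "__main__":
    rate = 20  # MiB/s, the figure quoted in the thread
    print(f"{daily_volume_gib(rate):.1f} GiB/day")
    print(f"{monthly_volume_tib(rate):.1f} TiB per 30 days")
```

At that rate the raw, uncompressed volume is roughly 1.6 TiB a day, so whether a detector runs per row or per batch matters a lot.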

Tools like Spark, Trino, etc. can be pointed at Parquet/Iceberg/etc. files in S3, and they'll let you issue SQL queries against the files directly. That means it integrates out of the box with whatever data tooling is already being used in your org.
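
As a rough sketch of the kind of SQL that means in practice: the example below uses Python's built-in sqlite3 with an in-memory table purely as a stand-in for an engine like Trino or Spark reading Parquet/Iceberg files. The `auth_logs` table, its columns, and the sample rows are all hypothetical and are not Matano's actual schema.

```python
import sqlite3

# Stand-in only: an in-memory SQLite table plays the role of a log table
# that would really live as Parquet/Iceberg files on object storage.
# The table name, columns, and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE auth_logs (ts TEXT, source_ip TEXT, outcome TEXT)")
conn.executemany(
    "INSERT INTO auth_logs VALUES (?, ?, ?)",
    [
        ("2000-01-01T00:00:01Z", "10.0.0.5", "failure"),
        ("2000-01-01T00:00:02Z", "10.0.0.5", "failure"),
        ("2000-01-01T00:00:03Z", "10.0.0.5", "failure"),
        ("2000-01-01T00:00:04Z", "10.0.0.9", "failure"),
        ("2000-01-01T00:00:05Z", "10.0.0.9", "success"),
        ("2000-01-01T00:00:06Z", "10.0.0.7", "failure"),
        ("2000-01-01T00:00:07Z", "10.0.0.7", "failure"),
    ],
)

# Plain SQL of the sort an analyst might run: source IPs with
# repeated failed logins, most failures first.
QUERY = """
SELECT source_ip, COUNT(*) AS failures
FROM auth_logs
WHERE outcome = 'failure'
GROUP BY source_ip
HAVING COUNT(*) >= 2
ORDER BY failures DESC
"""


def failed_logins(connection: sqlite3.Connection) -> list:
    """Return (source_ip, failures) rows for IPs with at least 2 failures."""
    return connection.execute(QUERY).fetchall()


if __name__ == "__main__":
    for ip, failures in failed_logins(conn):
        print(ip, failures)
```

The query itself is ordinary SQL, which is the point: the engine and storage layer underneath can change without the analyst having to learn a new query language.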
I am pretty sure what they are saying is that the Matano portion of it, which does the security log processing, deploys in a serverless fashion (Lambda, I assume?).

This means that as a dev you don’t need to maintain a server, or even a container image; you just deploy the code, which means less maintenance overhead and better scalability.

The diagram just shows how it interacts with the other components of the log pipeline.

Thanks for explaining! Yep, that's what we mean. We use serverless services like S3, Lambda, Athena, MSK Serverless, and Firehose, so you don't have to maintain any servers.