by shaeqahmed 1245 days ago

We launched before Amazon Security Lake :)

Amazon Security Lake's main value prop is that it is a single place where AWS / partner security logs can be stored and sent to downstream vendors. As such, Amazon only writes OCSF-normalized logs to the Parquet-based data lake in a fully managed way for its own data (VPC flow logs, CloudTrail, etc.) and leaves it to customers to handle the rest.

For partner sources, the integration approach has been to tell customers to set up the infrastructure themselves for OCSF normalization, Parquet conversion, etc. For example, here is Okta's guide using Firehose and Lambda: https://www.okta.com/blog/2022/11/an-automated-approach-to-c...

The Amazon Security Lake offering is built on top of Lake Formation, which is itself an abstraction around services such as Glue, Athena, and S3. Security Lake is built using the legacy Hive-style approach and does not use Athena Iceberg. There is a per-data cost associated with the service, in addition to the costs incurred by the other services behind your data lake. It looks like the primary use case is storing first-party AWS logs from all your accounts in a data lake and routing them to analytics partners (SIEM) without much effort. It does not seem very useful for an organization that wants to build its own security data lake with more advanced features, since you will still have to do all of that work yourself.

Matano has a broader goal: to help orgs with every step of transforming, normalizing, enriching, and storing all of their security logs in a structured data lake, and to give users a platform for building detection-as-code with Python & SQL for correlation on top of it (SIEM augmentation/alternative). All processing and data lake management (conversion to Parquet, data compaction, table management) is fully automated by Matano, and users do not need to write any custom code to onboard data sources.
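To give a rough idea of what detection-as-code in Python can look like, here is a minimal sketch. It assumes each log record arrives as a nested dict with ECS-style field names; the `detect` entry point, the `get_field` helper, and the "ConsoleLogin" action value are illustrative assumptions, not necessarily Matano's exact interface.

```python
from typing import Any


def get_field(record: dict, dotted: str, default: Any = None) -> Any:
    """Look up a dotted path such as 'event.outcome' in a nested dict.

    Hypothetical helper for this sketch only.
    """
    current: Any = record
    for key in dotted.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


def detect(record: dict) -> bool:
    """Flag failed console logins (assumed entry point and record shape)."""
    return (
        get_field(record, "event.action") == "ConsoleLogin"
        and get_field(record, "event.outcome") == "failure"
    )


if __name__ == "__main__":
    sample = {
        "event": {"action": "ConsoleLogin", "outcome": "failure"},
        "source": {"ip": "203.0.113.7"},
        "user": {"name": "alice"},
    }
    print(detect(sample))
```

Because the rule is an ordinary function over a normalized record, it can be unit tested and code reviewed like any other code.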

Matano can ingest data from cloud, endpoint, SaaS, and practically any custom source using the built-in log transformation pipeline (think serverless Logstash). We are built around the Elastic Common Schema and use Apache Iceberg (ACID support, recommended for Athena V2+). Matano's data lake is also vendor neutral and can be queried by any Iceberg-compatible engine (Snowflake, Spark, etc.) without having to copy any data around.
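As a rough illustration of what normalizing to the Elastic Common Schema involves, here is a plain Python sketch that maps a made-up raw event onto ECS-style fields. The raw keys (`ts`, `action`, `result`, `user`, `ip`) and the function name are hypothetical, and this is not how Matano's pipeline is actually configured; it only shows the shape of the mapping.

```python
# Map vendor-specific result strings onto the three values ECS
# allows for event.outcome: success, failure, unknown.
OUTCOME_MAP = {"SUCCESS": "success", "FAILURE": "failure"}


def normalize_to_ecs(raw: dict) -> dict:
    """Turn a hypothetical raw log event into an ECS-style nested record."""
    return {
        "@timestamp": raw.get("ts"),
        "event": {
            "action": raw.get("action"),
            "outcome": OUTCOME_MAP.get(str(raw.get("result", "")).upper(), "unknown"),
        },
        "user": {"name": raw.get("user")},
        "source": {"ip": raw.get("ip")},
    }


if __name__ == "__main__":
    raw_event = {
        "ts": "2022-12-01T10:15:00Z",
        "action": "user.session.start",
        "result": "failure",
        "user": "alice",
        "ip": "203.0.113.7",
    }
    print(normalize_to_ecs(raw_event))
```

Once every source lands in the same schema, a single query or detection can cover all of them without per-vendor field handling.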