by throwawanginee2 1352 days ago
Splunk is a great tool, but it is expensive. I really like Splunk's aggregation features. With server logs, it can aggregate them and tell me how many HTTP 500 errors I have, how many requests resulted in a 404, which IP addresses send me the most requests, etc.
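For illustration, here is a minimal sketch of that kind of aggregation in plain Python. It assumes the logs are in something like the Apache/Nginx "common log format"; the regex is a simplification, and `aggregate_log` and the sample lines are hypothetical, not anything Splunk provides.

```python
import re
from collections import Counter

# Simplified pattern for common-log-format lines (an assumption, not a full parser).
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" (?P<status>\d{3}) \S+'
)

def aggregate_log(lines):
    """Count responses per HTTP status code and requests per client IP."""
    status_counts = Counter()
    ip_counts = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip lines that do not fit the assumed format
        status_counts[match.group("status")] += 1
        ip_counts[match.group("ip")] += 1
    return status_counts, ip_counts

# Hypothetical sample lines.
sample = [
    '10.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET / HTTP/1.0" 200 2326',
    '10.0.0.2 - - [10/Oct/2000:13:55:37 -0700] "GET /missing HTTP/1.0" 404 512',
    '10.0.0.1 - - [10/Oct/2000:13:55:38 -0700] "POST /api HTTP/1.0" 500 0',
    '10.0.0.1 - - [10/Oct/2000:13:55:39 -0700] "GET /missing HTTP/1.0" 404 512',
]

status_counts, ip_counts = aggregate_log(sample)
print(status_counts["500"], status_counts["404"])  # 1 2
print(ip_counts.most_common(1))  # [('10.0.0.1', 3)]
```

`Counter.most_common(n)` gives the "top IP addresses" view directly; a real deployment would stream a file instead of a list.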

I want to take a CSV file and provide the same functionality, e.g. tell the user how many times each value of each field occurs. For example, for a CSV file with cities, countries, and continents, I want to aggregate the data and report how many cities are in each country and how many countries are in each continent.
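As a rough sketch of what that could look like without any predefined schema: Python's `csv.DictReader` takes the field names from the header row, so every column can be counted generically. The helper names and the sample data below are hypothetical.

```python
import csv
import io
from collections import Counter, defaultdict

def value_counts(rows):
    """Count how often each value occurs in every column; no schema needed."""
    counts = defaultdict(Counter)
    for row in rows:
        for field, value in row.items():
            counts[field][value] += 1
    return counts

def distinct_per_group(rows, group_field, member_field):
    """Count distinct member_field values for each group_field value."""
    members = defaultdict(set)
    for row in rows:
        members[row[group_field]].add(row[member_field])
    return {group: len(values) for group, values in members.items()}

# Hypothetical sample data; in practice use open("cities.csv", newline="").
CSV_TEXT = """city,country,continent
Paris,France,Europe
Lyon,France,Europe
Berlin,Germany,Europe
Tokyo,Japan,Asia
Osaka,Japan,Asia
"""

rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
counts = value_counts(rows)
cities_per_country = distinct_per_group(rows, "country", "city")
countries_per_continent = distinct_per_group(rows, "continent", "country")
print(counts["continent"]["Europe"])  # 3
print(cities_per_country)       # {'France': 2, 'Germany': 1, 'Japan': 2}
print(countries_per_continent)  # {'Europe': 2, 'Asia': 1}
```

Counting distinct members (via a set) rather than rows means duplicate rows in the CSV do not inflate "cities per country".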

Is there an open source version of Splunk I can modify? I tried Logstash, but it is not straightforward to work with: it still requires me to define a schema every time.

Thx!


> Is there an open source version of splunk I can modify?

https://github.com/grafana/loki might work for you. It’s not a drop-in replacement for Splunk, FWIW.

Is there any way to do subqueries (or some kind of join) with Loki? That is one feature of Splunk I haven't seen elsewhere, open source or not.
New Relic’s NRQL can do sub queries.
What you're describing sounds like Loki's metric queries (Loki is Grafana's Prometheus-inspired logging tool; it is super fast and cheap/easy to run, though it sacrifices some flexibility to get there): https://grafana.com/docs/loki/latest/logql/metric_queries/
We're building Matano (https://github.com/matanolabs/matano), an open source security lake platform. It's a different approach, since we normalize logs from JSON, CSV, etc. and ingest them into Apache Iceberg tables, but it allows for massive scale plus joins, aggregations, etc. using SQL.
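To give a feel for the SQL approach (this is plain SQLite through Python's standard `sqlite3` module, not Matano or Iceberg), here is a minimal sketch that loads a CSV into an in-memory table, typing every column as TEXT so no schema is written by hand, then runs a GROUP BY aggregation and a subquery. The table name `places` and the sample data are hypothetical, and quoting header names this way assumes they contain no double quotes.

```python
import csv
import io
import sqlite3

# Hypothetical sample data.
CSV_TEXT = """city,country,continent
Paris,France,Europe
Lyon,France,Europe
Berlin,Germany,Europe
Tokyo,Japan,Asia
Osaka,Japan,Asia
"""

conn = sqlite3.connect(":memory:")
reader = csv.reader(io.StringIO(CSV_TEXT))
header = next(reader)
# Build the table from the CSV header; every column is stored as TEXT.
columns = ", ".join(f'"{name}" TEXT' for name in header)
conn.execute(f"CREATE TABLE places ({columns})")
placeholders = ", ".join("?" for _ in header)
conn.executemany(f"INSERT INTO places VALUES ({placeholders})", reader)

# Aggregation: number of distinct countries per continent.
per_continent = conn.execute(
    "SELECT continent, COUNT(DISTINCT country) FROM places "
    "GROUP BY continent ORDER BY continent"
).fetchall()

# Subquery: countries with more cities than the average country.
above_average = conn.execute(
    """
    SELECT country, COUNT(*) FROM places
    GROUP BY country
    HAVING COUNT(*) > (
        SELECT AVG(c) FROM (SELECT COUNT(*) AS c FROM places GROUP BY country)
    )
    ORDER BY country
    """
).fetchall()

print(per_continent)  # [('Asia', 1), ('Europe', 2)]
print(above_average)  # [('France', 2), ('Japan', 2)]
```

An embedded engine like this is fine for a single CSV; the point of a lake/Iceberg setup is running the same kind of SQL over far larger data.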