by tonetegeatinst 623 days ago
5TB of log data is a lot of input for any model, even after you clean it up, and the cleanup itself would take time.

I think it's probably more feasible to sort by type or category. Maybe use something like Kibana or Graylog so you can better visualize the logs and narrow down what's an IOC and what might just be a random error message. This also lets you look at the types of logs over a time period.
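
As a rough sketch of the sort-by-category idea before reaching for a full dashboard: the snippet below buckets log lines by hour and category using plain regexes. The pattern names, the regexes, and the assumption that each line starts with an ISO 8601 timestamp are all hypothetical; real rules would depend on what the logs actually look like.

```python
import re
from collections import Counter
from datetime import datetime

# Hypothetical category rules; adjust to your own log sources.
CATEGORY_PATTERNS = {
    "auth": re.compile(r"failed password|invalid user|authentication failure", re.I),
    "network": re.compile(r"connection (refused|reset|timed out)", re.I),
    "error": re.compile(r"\b(error|exception|traceback)\b", re.I),
}

def categorize(message):
    """Return the first matching category name, or "other"."""
    for name, pattern in CATEGORY_PATTERNS.items():
        if pattern.search(message):
            return name
    return "other"

def count_by_hour(lines):
    """Count (hour, category) pairs.

    Assumes each line starts with an ISO 8601 timestamp followed by a space.
    Lines that do not parse are skipped.
    """
    counts = Counter()
    for line in lines:
        stamp, _, message = line.partition(" ")
        try:
            hour = datetime.fromisoformat(stamp).strftime("%Y-%m-%d %H:00")
        except ValueError:
            continue
        counts[(hour, categorize(message))] += 1
    return counts

# Made-up sample lines, for illustration only.
sample = [
    "2024-01-01T10:05:00 sshd: Failed password for root from 203.0.113.7",
    "2024-01-01T10:15:00 app: Error: could not open config",
    "2024-01-01T11:02:00 sshd: Invalid user admin from 203.0.113.7",
    "not a log line",
]

for (hour, category), n in sorted(count_by_hour(sample).items()):
    print(hour, category, n)
```

Since it reads one line at a time, something like this could be streamed over files without loading them into memory, and a spike in one category in one hour is the kind of thing worth a closer look.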

Any ML or AI model would be computationally expensive, and if you don't have the hardware to self-host it, you also need to upload 5TB of logs.
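
To put the upload part in perspective, here is a back-of-the-envelope estimate. It assumes decimal terabytes, a fully sustained link, and no protocol overhead or compression, so real transfers would likely differ; the link speeds are just example values.

```python
def upload_hours(size_tb, mbit_per_s):
    """Hours to upload size_tb terabytes at a sustained rate in Mbit/s.

    Assumes 1 TB = 10**12 bytes and ignores overhead and compression.
    """
    size_bits = size_tb * 1e12 * 8
    seconds = size_bits / (mbit_per_s * 1e6)
    return seconds / 3600

# Example link speeds (assumptions, not measurements).
for rate in (100, 1000):
    print(f"{rate} Mbit/s: {upload_hours(5, rate):.1f} hours")
```

Under those assumptions that is roughly 111 hours at 100 Mbit/s and about 11 hours at 1 Gbit/s, before any processing starts.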