In my limited experience at a small biz running some SaaS web apps, with New Relic for monitoring:

> What are the key assets you monitor beyond the basics like CPU, RAM, and disk usage?

Not much, tbh. Those were the key things: alerts for high CPU and memory. Being able to track those per container was useful.

> Do you also keep tabs on network performance, processes, services, or other metrics?

Services, 100%. We ran containerised services on Docker Swarm, and one of my bugbears with New Relic was having to sort out container label names so we could filter things in the UI. That took me a day or two to standardise (along with the Fluentd logging labels, so everything carried the same labels).

Background Linux processes less so. They were still useful, but we had to turn them off in New Relic because they significantly increased data ingestion. (I tuned the NR agent configs to minimise the data we sent, just so we could stay on the free tier as best we could.)

> Additionally, we're debating whether to build a custom monitoring agent or leverage existing solutions like OpenTelemetry or Fluentd.

I like Fluentd, but I hate setting it up. I can never remember the filter and match syntax. Once it's running, though, I just leave it, which is nice. I've never used OpenTelemetry. Not sure how useful that info is for you.

> What's your take: would you trust a simple, bespoke agent, or would you feel more secure with a well-established solution?

Ehhhh, it depends. New Relic was pretty established, with a bunch of useful features, but it deffo felt like overkill for what was essentially two containerised Django apps plus some extra backend services. There was a lot of bloat in NR we probably never touched, including in the agent itself, which took up quite a bit of memory.

> Lastly, what's your preference for data collection: do you prefer an agent that pulls data or one that pushes it to the monitoring system?

Personally, push, mostly because I can set it up and probably forget about it: run it, add egress firewall rules, job done. It probably helps with the network effect too, since it's easy to get started. I can see pull being the preference for bigger enterprises, though, which would only want to allow x, y, z data out to a third party, especially for security reasons. Running a New Relic agent with root access to the host (as the New Relic container agent asks for) is probably never going to fly in that environment.

What New Relic kinda got right with their push agent was the configs. But finding the right settings was a bear, as the docs are a bit of a nightmare.
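A minimal sketch of the push model, assuming a hypothetical JSON-over-HTTP collector. The payload shape, the endpoint, and every function name here (`collect_metrics`, `build_payload`, `http_sender`, `push_once`, `run_agent`) are illustrative only, not New Relic's or any other vendor's API. What it shows is that a push agent only ever opens outbound connections, so the firewall side really is just an egress rule to the collector.

```python
import json
import os
import time
import urllib.request
from typing import Callable, Dict, Optional

Sender = Callable[[bytes], int]
Collector = Callable[[], Dict[str, float]]


def collect_metrics() -> Dict[str, float]:
    """Read load averages with the standard library only (os.getloadavg is Unix-only)."""
    load1, load5, load15 = os.getloadavg()
    return {"load_1m": load1, "load_5m": load5, "load_15m": load15}


def build_payload(host: str, metrics: Dict[str, float], ts: float) -> bytes:
    """Serialise one sample. This JSON shape is hypothetical, not any vendor's format."""
    return json.dumps({"host": host, "timestamp": ts, "metrics": metrics}).encode("utf-8")


def http_sender(url: str, timeout: float = 5.0) -> Sender:
    """Return a sender that POSTs to `url`: an outbound-only (egress) connection."""
    def send(body: bytes) -> int:
        req = urllib.request.Request(
            url, data=body, headers={"Content-Type": "application/json"}, method="POST"
        )
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    return send


def push_once(send: Sender, host: str, collect: Collector = collect_metrics,
              clock: Callable[[], float] = time.time) -> int:
    """Collect one sample and push it; returns whatever status the sender reports."""
    return send(build_payload(host, collect(), clock()))


def run_agent(send: Sender, host: str, interval: float = 60.0,
              iterations: Optional[int] = None,
              collect: Collector = collect_metrics) -> None:
    """Push a sample every `interval` seconds; loops forever if `iterations` is None."""
    done = 0
    while iterations is None or done < iterations:
        try:
            push_once(send, host, collect)
        except OSError as exc:  # urllib's URLError and HTTPError both subclass OSError
            print(f"push failed, will retry next interval: {exc}")
        done += 1
        time.sleep(interval)
```

Wiring it up would look something like `run_agent(http_sender("https://metrics.example.com/ingest"), "web-1")`, where the URL is a placeholder. In a pull setup the arrangement flips: the agent would expose an HTTP endpoint and the monitoring server would scrape it, which needs an inbound rule on the host instead of just an egress one.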