
It has always been the Cadillac of search, and more so with unstructured indexing (e.g. key collisions across different data types: foo = string vs. foo = integer vs. foo = array).
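For anyone unsure what that key-collision case looks like, here is a minimal sketch. It assumes events arrive as JSON-like objects already parsed into Python dicts; `find_type_conflicts` is a hypothetical helper for illustration, not part of any Splunk API, and it says nothing about how Splunk itself handles the conflict.

```python
from collections import defaultdict


def find_type_conflicts(events):
    """Return keys that appear with more than one value type.

    `events` is an iterable of dicts (an assumption for this sketch).
    The result maps each conflicting key to its sorted type names.
    """
    seen = defaultdict(set)
    for event in events:
        for key, value in event.items():
            seen[key].add(type(value).__name__)
    # Keep only keys that were seen with two or more distinct types.
    return {key: sorted(types) for key, types in seen.items() if len(types) > 1}


events = [
    {"foo": "bar", "id": 1},   # foo is a string
    {"foo": 42, "id": 2},      # foo is an integer
    {"foo": [1, 2], "id": 3},  # foo is an array
]
conflicts = find_type_conflicts(events)
print(conflicts)
```

A schema-on-write index has to pick one type for `foo` up front, which is where this kind of data tends to cause trouble.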

Your queries or infrastructure were not optimized. It’s very fast when optimized.

Interesting,

It was Splunk managed and configured, so I would have thought it optimized, but I guess they made more money from it not being optimized.

If I remember right, we were throwing about 200+ GB at it a day.

Splunk worked with us to optimize our configs but we always managed it ourselves.

Nope, sports betting. I was on the operations side of things and Splunk was something the corporate side organised and championed, but it just couldn't be used to troubleshoot issues in the time frames we needed.

_8j50 1350 days ago

Very, very fast. I can do an all-time search on terabytes of data in seconds.

But you have to learn to use it. If you don't give it an index and a sourcetype, that will slow it down, and, as with ES, leading wildcards slow things down. The fastest searches are simple terms like a word or an IP.
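To make that advice concrete, here is a rough sketch of a helper that always scopes a search to an index and a sourcetype and refuses leading wildcards. `build_search` is a hypothetical function, and the index name `web` and sourcetype `access_combined` in the usage lines are placeholders chosen for illustration; the only thing taken from Splunk is the basic `index=... sourcetype=... term` search syntax.

```python
def build_search(index, sourcetype, term):
    """Build a search string scoped to one index and one sourcetype.

    Raises ValueError for a leading wildcard, since that is the
    pattern described above as slow.
    """
    if not index or not sourcetype:
        raise ValueError("index and sourcetype are both required")
    if term.startswith("*"):
        raise ValueError("leading wildcard: use a literal term or a trailing wildcard")
    return f"index={index} sourcetype={sourcetype} {term}"


# A simple term such as an IP address is the fast case.
query = build_search("web", "access_combined", "10.1.2.3")
print(query)
```

The point is only that the scoping happens on every search by default, rather than relying on each person to remember it.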

Interesting,

From the general responses it sounds like we got unlucky with a dud implementation.

mattnewton 1350 days ago

40 minutes sounds exceptionally bad, but 5-10 minutes with Splunk was totally common when I worked at Apple almost a decade ago, and I could never figure out why, because I only ever used it for O(grep on a log file on disk) level operations. I was probably holding it wrong, or maybe the infra team had misconfigured it, idk.

I personally brought Splunk to Apple in 2010 alongside a small handful of people (Hi, Sean and Ariel!). There is a massive difference between real-time searches, where latency beyond a few seconds is unacceptable, and historical searches, which can take a bit longer. I can assure you that it did its job spectacularly, though, much to my chagrin, it has few competitors to this day.

mattnewton 1350 days ago

Cool! That makes sense. I only really used it as part of the (now defunct I think) "orchard" internal hosting platform that was very much beta when I was using it, for tiny internal apps running on like 4 instances at most, and missed being able to just grep log files; my wild, uneducated guess from what you said is that there was some kind of pooling of our meager logs with other people's from orchard, or we were otherwise off the happy path.

I was part of Orchard and miss it dearly. It had a lot of potential but it was launched as a proof of concept built on pooled resources freed up from optimizing legacy workloads (namely, moving Siri from VMware to Mesos).

It never got the love it deserved and I could absolutely believe that its Splunk cluster suffered as a result. RIP

mattnewton 1350 days ago

100% agree. The idea of an internal Heroku was a great one; it just didn't seem to work with how Apple was designed organizationally or something, and it seemed under-resourced.

jitl 1350 days ago

We recently transitioned to it at Notion and it’s been very fast, outperforming the previous log vendor substantially while offering better search and UX. If you used the on-prem version, the cloud version is quite a different experience.

I don't know which version we used, just that it was managed and configured by Splunk.

We were only sending a small subset of our logs to it, about 200+ GB a day. Our Linux box with spinning disks could grep the full set of logs much faster than querying Splunk, so I don't think anyone really used it.

throwbigdata 1350 days ago

Yeah, but you had a clue. Those who don't, use Splunk.

greggsy 1350 days ago

No offence, but that sounds like lazy query design, poor architecture, or both.

Maybe we had a bad consultant, I don't know. Splunk were the ones managing and configuring it.

sandy_coyote 1350 days ago

Yes BUT it needs tuning. Splunk is complicated and takes continuous maintenance to optimize speed.

I work as a Splunk integrator and here's what I often see:

1. Customer installs Splunk with a qualified Splunk or third-party architect team. The deployment works well.

2. Customer adds infrastructure to the deployment. Splunk slows down. License costs go up.

3. Customer chooses between outside help or DIY. DIY rarely works.

4. Customer now needs outside help. Now Splunk is very slow and expensive, and now it will cost a lot to tune it.

Splunk, the company, is in a tough spot for several reasons: a rotating C-level cast, unpopular changes to the license model, bad acquisitions. The product is still best in class but tough to keep optimized.

kennend3 1350 days ago

So basically what you are saying is this.

A firm with a competent IT team is unable to get Splunk to work and only "outside help" can make the product work?

Given Splunk's license costs are tied to data ingested, how do you integrate new infrastructure into the deployment and not have license costs go up?

Way to sell us on Splunk?

bak3y 1350 days ago

Anecdotal - I took over a small, ill-maintained Splunk installation at $JOB-2 and reworked it following Splunk's current best practices, and it ran like a top as of when I left that place. Having done that process, I'm fully convinced that if you're going to run Splunk on-prem you need a dedicated sysadmin for it who knows Splunk's stack. And that kind of person isn't cheap to hire or keep in that role.

kennend3 1350 days ago

We had an on-prem Splunk implementation and it was SOO SLOW. It was built and managed by Splunk and its consultants.

We finally got rid of it a few years later, but for the entire time we had it, it was a constant "square peg, round hole" problem. Each time the consultants assured us Splunk could do what we needed, and each time it could not.

chillfox 1349 days ago

I wonder if Splunk has a QA problem with their consultants or if there are certain edge cases they simply don't do well with.

It just looks like most people here had a good experience and we had a bad one for some reason.

bak3y 1336 days ago

Just coming back around to this, we also used Splunk consultants for their SIEM solution and the first one we got wasn't very good, but the second was amazing (I wish we could have hired her directly).

The guy we had help us tune our clusters after I rebuilt them all was also very good. Fortunately I'd done most everything by the book and we overkilled the nodes with hardware (we had some older hypervisor nodes lying around that I stole for Splunk).

HyperSane 1350 days ago

On modern NVMe storage it is INCREDIBLY fast.