Hacker News new | ask | show | jobs
by zahlman 90 days ago
> Accidental O(n²) with Streams Inside Loops

Man that code looks awful. Really reminds me of why I drifted away from Java over time. Not just the algorithm, of course; the repetitiveness, the hoops that you have to jump through in order to do pretty "stream processing"... and then it's not even an FP algorithm in the end, either way!

Honestly the only time I can imagine the "process the whole [collection] per iteration" thing coming up is where either you really do need to compare (or at least really are intentionally comparing) each element to each other element, or else this exact problem of building a histogram. And for the latter I honestly haven't seen people fully fall into this trap very often. More commonly people will try to iterate over the possible buckets (here, hour values), sometimes with a first pass to figure out what those might be. That's still extra work, but at least it's O(kn) instead of O(n^2).

You can do this sort of thing in an elegant, "functional" looking way if you sort the data first and then group it by the same key. That first pass is O(n lg n) if you use a classical sort; making a histogram like this in the first place is basically equivalent to radix sort, but it's nice to not have to write that yourself. I just want to show off what it can look like e.g. in Python:

  def local_hour(order):
      return datetime.datetime.fromtimestamp(order.timestamp).hour

  groups = itertools.groupby(sorted(orders, key=local_hour), key=local_hour)
  orders_by_hour = {hour: len(list(orders)) for (hour, orders) in groups}
Anyway, overall I feel like these kinds of things are mostly done by people who don't need to have the problem explained, who have simply been lazy or careless and simply need to be made to look in the right place to see the problem. Cf. Dan Luu's anecdotes https://danluu.com/algorithms-interviews/ , and I can't seem to find it right now but the story about saving a company millions of dollars finding Java code that was IIRC resizing an array one element at a time.

(Another edit: originally I missed that the code was only trying to count the number of orders in each hour, rather than collecting them. I fixed the code above, but the discussion makes less sense for the simplified problem. In Python we can do this with `collections.Counter`, but it wouldn't be unreasonable to tally things up in a pre-allocated `counts_by_hour = [0] * 24` either.)

----

Edit:

> String.format() came in last in every category. It has to... StringBuilder was consistently the fastest. The fix: [code not using StringBuilder]... Use String.format() for the numeric formatting where you need it, and let the compiler optimize the rest. Or just use a StringBuilder if you need full control.

Yeah, this is confused in a way that I find fairly typical of LLM output. The attitude towards `String.format` is just plain inconsistent. And there's no acknowledgment of how multiple `+`s in a line get optimized behind the scenes. And the "fix" still uses `String.format` to format the floating-point value, and there's no investigation of what that does to performance or whether it can be avoided.