by fauigerzigerk 3694 days ago
OK I found the cause of that weirdness. I had slightly changed my test code since I posted it here weeks ago. After undoing that change I'm seeing the same thing you do.

But this raises more questions, because what I changed is the test data generation. I'm now generating a million different strings instead of adding the same string a million times (note the \(i) at the end of the string):

    func generateTestData() -> [String] {
        var a = [String]()
        for i in 0..<N {
            a.append(",,abc, 123  ,x, , more more more,\u{A0}and yet more, \(i)")
        }
        return a
    }

The running time of generateTestData() isn't what we measure, but apparently the performance improvement you found only works if the same string is used every time. Otherwise performance drops.
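
For comparison, here is a rough sketch of the two generators side by side. The function names and the `count` parameter are made up for illustration (the test code itself uses a global N), and the first function only approximates what the earlier version did. One plausible difference, though only a guess, is that a plain literal can refer to constant data while an interpolated string has to be built at run time.

```swift
// Hypothetical names and `count` parameter; the original test uses a global N.

// Earlier variant: the same literal is appended on every iteration.
func generateIdenticalTestData(count: Int) -> [String] {
    var a = [String]()
    a.reserveCapacity(count)
    for _ in 0..<count {
        a.append(",,abc, 123  ,x, , more more more,\u{A0}and yet more, ")
    }
    return a
}

// Current variant: interpolating \(i) makes every element a distinct string
// that is constructed at run time.
func generateDistinctTestData(count: Int) -> [String] {
    var a = [String]()
    a.reserveCapacity(count)
    for i in 0..<count {
        a.append(",,abc, 123  ,x, , more more more,\u{A0}and yet more, \(i)")
    }
    return a
}
```

Running the same benchmark against both arrays should show whether the slowdown follows the input construction rather than the splitting code.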
1 comments

That's bizarre.

One thing I've noticed is that performing the string scanning operation is relatively cheap. (If the splitAndTrim code is modified to not use Strings and to return a [String.UTF16View], the runtime is around 1.2 seconds.) It's the process of building Strings out of those UTF16 views that is destroying performance.
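
To make the distinction concrete, here is a minimal sketch that keeps the two steps apart. It is not the splitAndTrim code from this thread: `splitFields` and `buildStrings` are hypothetical names, trimming is left out, and it assumes current Swift, where slicing a `String.UTF16View` yields a `Substring.UTF16View` rather than another `String.UTF16View`.

```swift
// Hypothetical helper: scan only, splitting on commas over the UTF-16 view.
// The resulting slices refer back to the original string's storage.
func splitFields(_ s: String) -> [Substring.UTF16View] {
    let comma: UInt16 = 0x2C  // UTF-16 code unit for ","
    return s.utf16.split(separator: comma, omittingEmptySubsequences: false)
}

// Hypothetical helper: the separate step that turns each slice into its own
// String. This is where new string storage may have to be created.
func buildStrings(_ fields: [Substring.UTF16View]) -> [String] {
    return fields.map { String(decoding: $0, as: UTF16.self) }
}

let fields = splitFields(",,abc, 123  ,x")
let strings = buildStrings(fields)
```

Timing `splitFields` alone and then `splitFields` followed by `buildStrings` over the same input is one way to see how much of the cost comes from String construction.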

I still don't know why changing the way the input data are constructed would have that effect, except to guess that the underlying representation is different somehow. I'll file a ticket.

This looks to me like memory allocation / reference counting is at least part of the problem. Slicing a UTF16View to get another UTF16View most likely doesn't involve any dynamic memory allocation at all.