by fauigerzigerk 3694 days ago

OK I found the cause of that weirdness. I had slightly changed my test code since I posted it here weeks ago. After undoing that change I'm seeing the same thing you do.

But this raises more questions, because what I changed is the test data generation. I'm now generating a million different strings instead of adding the same string a million times (note the \(i) at the end of the string):

    func generateTestData() -> [String] {
        var a = [String]()
        for i in 0..<N {
            a.append(",,abc, 123  ,x, , more more more,\u{A0}and yet more, \(i)")
        }
        return a
    }

The running time of generateTestData() isn't part of what we measure, but apparently the performance improvement you found only works if the same string is used every time. With distinct strings, performance drops.
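To make the two setups easier to compare, here is a minimal sketch of both generators side by side, with the data built outside the timed region. It assumes a recent Swift toolchain with Foundation; `sampleCount`, `generateRepeatedData`, `generateDistinctData`, and `secondsTaken` are hypothetical names for illustration, and the timed workload (summing UTF-16 lengths) is only a stand-in, not the splitAndTrim code from the thread.

```swift
import Foundation

// Hypothetical element count; N in the post above is defined elsewhere.
let sampleCount = 10_000

// Variant A: the same string value for every element.
func generateRepeatedData(count: Int) -> [String] {
    let s = ",,abc, 123  ,x, , more more more,\u{A0}and yet more"
    return [String](repeating: s, count: count)
}

// Variant B: a distinct string per element, built by interpolation.
func generateDistinctData(count: Int) -> [String] {
    var a = [String]()
    a.reserveCapacity(count)
    for i in 0..<count {
        a.append(",,abc, 123  ,x, , more more more,\u{A0}and yet more, \(i)")
    }
    return a
}

// Times only the closure body, so data generation stays outside the measurement.
func secondsTaken(_ body: () -> Void) -> TimeInterval {
    let start = Date()
    body()
    return Date().timeIntervalSince(start)
}

let repeated = generateRepeatedData(count: sampleCount)
let distinct = generateDistinctData(count: sampleCount)

// Stand-in workload: total number of UTF-16 code units.
var totalUnits = 0
let tRepeated = secondsTaken { for s in repeated { totalUnits += s.utf16.count } }
let tDistinct = secondsTaken { for s in distinct { totalUnits += s.utf16.count } }
print("repeated: \(tRepeated) s, distinct: \(tDistinct) s, units: \(totalUnits)")
```

Swapping the stand-in workload for the real function under test would show whether the two inputs behave differently on a given compiler version.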


austinz 3694 days ago

That's bizarre.

One thing I've noticed is that performing the string scanning operation is relatively cheap. (If the splitAndTrim code is modified to not use Strings and to return a [String.UTF16View], the runtime is around 1.2 seconds.) It's the process of building Strings out of those UTF16 views that is destroying performance.

I still don't know why changing the way the input data are constructed would have that effect, except to guess that the underlying representation is different somehow. I'll file a ticket.
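The difference between scanning and building Strings can be sketched roughly as follows. This is an illustration in current Swift, not the splitAndTrim code from the thread: `splitToViews` and `splitToStrings` are hypothetical names, trimming is left out, and in current Swift slicing a `String.UTF16View` yields a `Substring.UTF16View` rather than the `String.UTF16View` mentioned above.

```swift
// UTF-16 code unit for ",".
let comma: UInt16 = 0x2C

// Scanning only: returns slices that refer back to the original string's storage.
func splitToViews(_ s: String) -> [Substring.UTF16View] {
    return s.utf16.split(separator: comma, omittingEmptySubsequences: false)
}

// Scanning plus materialisation: each slice is copied into a new String.
func splitToStrings(_ s: String) -> [String] {
    return splitToViews(s).map { String(decoding: $0, as: UTF16.self) }
}

let input = ",,abc, 123  ,x, , more"
let views = splitToViews(input)
let strings = splitToStrings(input)
print(views.count, strings)
```

Timing the two functions separately on the same input would isolate how much of the cost comes from creating the Strings rather than from the scan itself.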


fauigerzigerk 3694 days ago

This looks to me like memory allocation / reference counting is at least part of the problem. Slicing a UTF16View to get another UTF16View most likely doesn't involve any dynamic memory allocation at all.
