by austinz 3699 days ago
Regarding strings...

I briefly ran the code sample you graciously provided me with a month ago through the profiler (https://gist.github.com/austinzheng/d6c674780a58cb63832c4df3...).

Long story short, it looks like the reflection machinery in the standard library is improperly being used to construct String instances. Doing so, while probably not sufficient to account for the entirety of the awful performance, is probably quite expensive. This looks like a bug and I'll try to dig deeper into it this week.

Swift also badly needs a native version of NSCharacterSet, even if only for programmer ergonomics.

The developer in charge of the standard library has mentioned that the team intends to redesign the String API in the near future; this should provide an opportunity to reexamine the performance implications of the current implementation.


That's excellent news. Thanks a lot!
After a bit more investigation, I found that if you replace the following code:

  result.append(begin == eos ? "" : String(cs[begin..<end.successor()]))

with this:

  if begin == eos {
    result.append("")
  } else if let str = String(cs[begin..<end.successor()]) {
    result.append(str)
  }

runtime goes down from ~3 seconds to ~2.2 seconds.

This is due to a rather insidious API design decision:

  init?(_ view: String.UTF16View)

constructs a string out of a UTF16 view, but it can fail, so its result is a String?. If it is used in a context where the result type is inferred to be non-optional, the following generic reflection-related init is chosen instead:

  init<T>(_ instance: T)

I'm going to bring this up on the list and see if there are better ways of doing things.
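
To make the overload pitfall concrete, here is a minimal sketch, assuming a current Swift toolchain rather than the Swift 2.2 discussed in this thread. `Token` is a hypothetical stand-in for String (it is not a library type), given one failable initializer and one generic catch-all so the two cases can be told apart. The surrogate check exists only to give the failable initializer a way to fail.

```swift
// Hypothetical type standing in for String; not part of any library.
struct Token {
    let text: String
    let viaGenericInit: Bool

    // Failable initializer, playing the role of init?(_ view: String.UTF16View).
    init?(_ units: [UInt16]) {
        // Reject surrogate code units purely so this sketch has a failure case.
        let surrogates: ClosedRange<UInt16> = 0xD800...0xDFFF
        guard !units.contains(where: { surrogates.contains($0) }) else { return nil }
        text = String(decoding: units, as: UTF16.self)
        viaGenericInit = false
    }

    // Generic catch-all, playing the role of init<T>(_ instance: T).
    init<T>(_ instance: T) {
        text = String(describing: instance)
        viaGenericInit = true
    }
}

let units: [UInt16] = [0x61, 0x62, 0x63] // UTF-16 code units for "abc"

// Non-optional context: the failable initializer cannot produce a plain Token,
// so only the generic initializer type-checks here.
let direct: Token = Token(units)

// Optional binding: the failable initializer matches directly.
var bound: Token? = nil
if let unwrapped = Token(units) {
    bound = unwrapped
}
```

In this sketch `direct.text` is the array's description rather than "abc", which is the same kind of silent detour through the reflection path described above.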

As far as I can tell most of the rest of the time is spent in the Swift native Unicode --> UTF16 decoding machinery, and NSCharacterSet.

I'm seeing a 23% increase in running time after making that same change. Strange.

That's with Xcode 7.3.1 (7D1014)

Weird. I'm using the same version of Xcode, running on a trashcan Mac Pro running OS X 10.11.4.
OK, I found the cause of that weirdness. I had slightly changed my test code since I posted it here weeks ago. After undoing that change, I'm seeing the same thing you do.

But this raises more questions, because what I changed is the test code generation. I'm now generating a million different strings instead of adding the same string a million times (note the \(i) at the end of the string):

    func generateTestData() -> [String] {
        var a = [String]()
        for i in 0..<N {
            a.append(",,abc, 123  ,x, , more more more,\u{A0}and yet more, \(i)")
        }
        return a
    }

The running time of generateTestData() isn't what we measure, but apparently the performance improvement you found only works if the same string is used every time. With distinct strings, performance drops instead.
That's bizarre.

One thing I've noticed is that performing the string scanning operation is relatively cheap. (If the splitAndTrim code is modified to not use Strings and to return a [String.UTF16View], the runtime is around 1.2 seconds.) It's the process of building Strings out of those UTF16 views that is destroying performance.
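
As a rough illustration of separating the scan from String construction, here is a sketch in current Swift. `splitFields` is a hypothetical helper, not the splitAndTrim code from this thread, and it only splits on commas without trimming. In current Swift the slices of a UTF-16 view have type `Substring.UTF16View`, where the Swift 2.2 code above would have used `String.UTF16View`.

```swift
// Hypothetical helper: scans the UTF-16 view and returns slices of it,
// without building a String for each field.
func splitFields(_ line: String) -> [Substring.UTF16View] {
    let comma: UInt16 = 0x2C // ","
    return line.utf16.split(separator: comma, omittingEmptySubsequences: false)
}

// Scanning step only: no per-field String is constructed here.
let slices = splitFields("a,bc,,d")

// Separate, explicit step that pays the cost of building Strings.
let strings = slices.map { String(decoding: $0, as: UTF16.self) }
```

Keeping the two steps apart like this makes it possible to time them separately, which is how the scan and the String construction costs can be distinguished.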

I still don't know why changing the way the input data are constructed would have that effect, except to guess that the underlying representation is different somehow. I'll file a ticket.