by lifthrasiir 1752 days ago

> [...] when I'm using a language I don't keep the implementation details of its data structures in my head, I just use the provided API. I think the representation of data structures is a different discussion.

The exact details of the data structures used do not matter, but their implications should still be in your head. Depending on the implementation, you may need a separate type for a string builder, or be able to append efficiently only at the end, or at both ends but not in the middle, or be able to append or insert an arbitrary string at any position, with every operation taking O(log n) time by default.
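To make two of those cases concrete, here is a rough Python sketch. The function names are made up for illustration, and `io.StringIO` and `collections.deque` are only stand-ins for "a separate builder type" and "cheap at both ends"; other languages draw these lines differently.

```python
import io
from collections import deque


def build_with_builder(parts):
    """Append to a separate builder object, then materialize the string once."""
    buf = io.StringIO()
    for p in parts:
        buf.write(p)
    return buf.getvalue()


def build_both_ends(front_parts, back_parts):
    """A deque appends cheaply at either end, but not in the middle."""
    d = deque()
    for p in back_parts:
        d.append(p)        # grow at the right end
    for p in front_parts:
        d.appendleft(p)    # grow at the left end
    return "".join(d)


result_a = build_with_builder(["ab", "cd", "ef"])
result_b = build_both_ends(["2", "1"], ["3", "4"])
```

Neither helper needs to know how `str` is laid out in memory, but which one you reach for depends entirely on what the representation makes cheap.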

> Do I need to be aware of 6 different potential ways to sequentially navigate the string? Is there a way to do it using a loop, iterator protocol, destructuring, pattern matching, coroutines, special string indexing syntax, etc? Or can I just use a simple, uniform consistent interface and build the library on top of that?

There is no such thing as a "simple, uniform consistent" interface for strings. Strings are conceptually a free monoid^W^W an array of string units with the following tendencies:

- The "string units" can be anything from bytes to UCS-2/UTF-16 units to code points (or Unicode scalar values if you don't like surrogate pairs) to grapheme clusters (whatever they are) to words to lines. Even worse, a single string may have to be accessible in multiple such units.

- Many commonly desired operations can be efficiently described as a linear scan across string units. There is a reason that regular expressions exist for strings but not for general arrays. (Regex-like tools for arrays would still be useful, but less so than for strings.)

- Slicing is a very common operation, and the resulting slices generally do not have to be mutated (even though the original string itself can be mutable), suggesting an effective optimization: a slice can share the original's storage instead of copying it.
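A small Python sketch of the first and last points, using only the standard library. The sample string is my own choice, and `memoryview` over a `bytearray` is just one way to get a non-copying slice, not how any particular language implements string slices.

```python
# 'e' followed by a combining acute accent, then an emoji outside the BMP.
s = "e\u0301\U0001F642"

utf8_units = len(s.encode("utf-8"))            # length in bytes
utf16_units = len(s.encode("utf-16-le")) // 2  # length in 16-bit code units
code_points = len(s)                           # Python's str counts code points
# Counting grapheme clusters needs a segmenter, which the standard library
# does not provide; under the default Unicode rules this string has 2.

# A slice that shares storage with a mutable buffer instead of copying it.
data = bytearray(s.encode("utf-8"))
view = memoryview(data)
tail = view[3:]                     # the emoji's bytes, no copy made
emoji = bytes(tail).decode("utf-8")

head = view[:1]
data[0] = ord("E")                  # mutate the original buffer...
head_after = bytes(head)            # ...and the slice sees the change
```

The same string has four different "lengths" depending on the unit, and the slice stays cheap only because it is treated as a read-only window onto the original.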

As such, there are multiple trade-offs in string interfaces across languages, and there is hardly a single best answer.