by adrianN 3297 days ago
But iterating over the characters is a very useful feature. It's used in all string algorithms that I can remember in the time it takes to write this comment.
4 comments

But iteration is a property of the Foldable type class, not of lists as such. A list is just one implementation of something that's Foldable/iterable. You can easily implement Foldable for custom text types.
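A rough sketch of the same idea outside Haskell: Python has no type classes, so the iteration protocol (`__iter__`) stands in for Foldable here. The `Text` class and `count_vowels` function are hypothetical names made up for illustration, not part of any library.

```python
from functools import reduce


class Text:
    """Hypothetical atomic text type: the storage is opaque, iteration is opt-in."""

    def __init__(self, value: str) -> None:
        self._value = value  # private storage; callers never see a list

    def __iter__(self):
        # Yield code points one at a time, each as a length-one str.
        return iter(self._value)


def count_vowels(text) -> int:
    # A left fold that only requires its argument to be iterable.
    return reduce(lambda n, ch: n + (ch in "aeiou"), text, 0)
```

The fold never asks whether `text` is a list; any type that implements the protocol works, which is the point being made about Foldable.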
> But iterating over the characters is a very useful feature. It's used in all string algorithms that I can remember in the time it takes to write this comment.

And you still do that with atomic strings -- you just either add a helper method like charAt(i) that gives you access to each character (or, rather, each rune), or you have some way to turn a string of length N into a list of N strings of length one.
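Both options might look roughly like this in Python, where a `str` is already treated as a single value and indexing yields length-one strings. The helper names `char_at` and `to_chars` are assumptions chosen to mirror the comment, and "character" here means a code point.

```python
from typing import List


def char_at(s: str, i: int) -> str:
    # Return the i-th code point as a string of length one.
    return s[i]


def to_chars(s: str) -> List[str]:
    # Turn a string of length N into a list of N strings of length one.
    return list(s)
```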

And those string algorithms likely break in subtle ways when they handle characters that span multiple codepoints.
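A small illustration of that kind of breakage, assuming Python, whose `str` is a sequence of code points. Reversing code point by code point detaches a combining accent from its base letter. The function name `naive_reverse` is made up for this sketch.

```python
import unicodedata


def naive_reverse(s: str) -> str:
    # Reverses code points, not user-perceived characters.
    return s[::-1]


# "café" written as 'e' followed by U+0301 COMBINING ACUTE ACCENT.
decomposed = "cafe\u0301"

# The accent now comes first and the 'e' follows it, so the two are split apart.
broken = naive_reverse(decomposed)

# NFC normalization merges the pair into U+00E9 first, so this case survives.
composed = unicodedata.normalize("NFC", decomposed)
fixed = naive_reverse(composed)
```

Normalizing first only helps where a precomposed code point exists; sequences such as emoji joined with zero-width joiners have none, so it is not a general fix.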
> break in subtle ways when they handle characters that span multiple codepoints

Or equivalently: there is more than one way to turn a string into a list. It can be, e.g., a sequence of bytes, Unicode code points, or grapheme clusters. Being explicit about the conversion is therefore a good idea.
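To make the three views concrete, here is a hedged Python sketch with one explicit conversion function per view. The function names are hypothetical, and `as_clusters_approx` is only an approximation: it attaches combining marks to the preceding code point and does not implement the full Unicode segmentation rules (it will get emoji sequences and Hangul jamo wrong).

```python
import unicodedata
from typing import List


def as_bytes(s: str) -> List[int]:
    # View 1: the UTF-8 encoded bytes.
    return list(s.encode("utf-8"))


def as_code_points(s: str) -> List[str]:
    # View 2: one length-one string per code point.
    return list(s)


def as_clusters_approx(s: str) -> List[str]:
    # View 3 (approximate): glue combining marks onto the previous code point.
    clusters: List[str] = []
    for ch in s:
        if clusters and unicodedata.combining(ch) != 0:
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


text = "e\u0301a"  # 'e' + combining acute accent, then 'a'
```

For `text`, the three functions return lists of length 4, 3 and 2 respectively, so "the length of the string" depends entirely on which conversion was chosen.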

Don't forget splitting on word boundaries and/or whitespace - going from a string of text to an iterable collection of words (strings).
Or for the case of (e.g.) domain names, splitting on dots. Generally, given a collection of split chars, breaking the string into a collection of substrings.
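A minimal Python sketch of these splitting operations. `str.split` is standard; `split_on_any` is a hypothetical helper that assumes every separator is a single character. Splitting on whitespace is only a crude stand-in for word boundaries and will not work for languages written without spaces.

```python
import re
from typing import Iterable, List


def split_on_any(s: str, separators: Iterable[str]) -> List[str]:
    # Break s into substrings at any of the given single-character separators.
    seps = list(separators)
    if not seps:
        return [s]
    pattern = "[" + "".join(re.escape(c) for c in seps) + "]"
    return re.split(pattern, s)


words = "the quick  brown fox".split()   # split on runs of whitespace
labels = "www.example.com".split(".")    # split a domain name on dots
fields = split_on_any("a,b;c", ",;")     # split on a collection of characters
```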
Not if the "iterating over character" function iterates over actual characters and not codepoints.
You mean grapheme clusters? Swift is the only language I know that uses that by default, and you still wouldn't want to store strings as a list of grapheme clusters.
I believe Perl 6 does so as well, see e.g. https://perl6advent.wordpress.com/2015/12/07/day-7-unicode-p...
The Apple developer documentation has a nice overview of some of the concerns that need to be taken into account for this to work:

https://developer.apple.com/library/content/documentation/Co...

It's probably one of the better approaches, but it's still not clear whether it alone allows a developer who speaks only English to build a text indexing or editing system that works well across, for example, English, Japanese, Arabic, Hangul and Dutch.

Elixir as well.
The problem is that “actual characters” is an ill-defined term; it could mean either code points or graphemes. See, e.g., http://unicode.org/faq/char_combmark.html
Sure, it is useful. Maybe have a "string to [Char]" function in the stdlib?