


	by adrianN 3297 days ago
	But iterating over the characters is a very useful feature. It's used in all string algorithms that I can remember in the time it takes to write this comment.

4 comments

runeks 3297 days ago

But iteration is a property of the Foldable type class, not of lists. A list is just one implementation of something that's Foldable/iterable. You can easily implement Foldable for custom text types.
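A rough analogue in Python rather than Haskell, offered only as a sketch: iteration comes from the iterable protocol (`__iter__`), so a text type that is not stored as a list can still be iterated. The `Text` class and its UTF-8 storage are hypothetical names made up for illustration.

```python
class Text:
    """Hypothetical text type that stores UTF-8 bytes, not a list of characters."""

    def __init__(self, s: str) -> None:
        self._data = s.encode("utf-8")

    def __iter__(self):
        # Iteration comes from the protocol, not from the storage format:
        # decode on demand and hand out one code point at a time.
        return iter(self._data.decode("utf-8"))


t = Text("h\u00e9llo")       # "héllo" with a precomposed é
code_points = list(t)        # anything that consumes an iterable works
largest = max(t)             # so do generic functions such as max()
```

Generic code only sees the protocol, which is the same point made above about Foldable.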


coldtea 3297 days ago

>But iterating over the characters is a very useful feature. It's used in all string algorithms that I can remember in the time it takes to write this comment.

And you still do that with atomic strings -- you just either add a helper method like charAt(i) that gives you access to each character (or, rather, each rune), or you have some way to turn a string of length N into a list of N strings of length one.
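Python happens to work this way: it has no separate character type, so indexing a string yields a string of length one. A minimal sketch, where `char_at` is a hypothetical helper named after the `charAt(i)` idea above:

```python
def char_at(text: str, i: int) -> str:
    """Hypothetical helper: return the i-th code point as a length-one string."""
    return text[i]


s = "na\u00efve"             # "naïve" with a precomposed ï
third = char_at(s, 2)        # access by position
pieces = list(s)             # a string of length N becomes N strings of length one
```

Note that both operations work on code points, not on user-perceived characters.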


johncolanduoni 3297 days ago

And those string algorithms likely break in subtle ways when they handle characters that span multiple codepoints.
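One concrete case of such breakage, sketched in Python (the example string is chosen for illustration): reversing a string code point by code point moves a combining accent onto the wrong letter.

```python
# "café" written as 'e' followed by U+0301 COMBINING ACUTE ACCENT
decomposed = "cafe\u0301"

# A reader sees 4 characters, but the string holds 5 code points.
length = len(decomposed)

# Naive reversal works code point by code point, so the accent now comes
# first and no longer follows the 'e' it belonged to.
naive_reversed = decomposed[::-1]
```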


SideburnsOfDoom 3297 days ago

> break in subtle ways when they handle characters that span multiple codepoints

Or equivalently: there is more than one way to turn a string into a list. It can be, e.g., a sequence of bytes, Unicode code points, or grapheme clusters. Being explicit about the conversion is therefore a good idea.
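The three views can be sketched in Python as follows. The helper `rough_clusters` is hypothetical and only an approximation: it attaches combining marks to the preceding code point and does not implement the full Unicode segmentation rules (it ignores emoji sequences, Hangul jamo, and so on).

```python
import unicodedata


def rough_clusters(text):
    """Approximate grapheme clusters by gluing combining marks to their base."""
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch      # combining mark: extend the previous cluster
        else:
            clusters.append(ch)     # anything else starts a new cluster
    return clusters


s = "e\u0301a"                      # 'e' + combining acute accent, then 'a'

as_bytes = list(s.encode("utf-8"))  # explicit: UTF-8 bytes
as_code_points = list(s)            # explicit: code points
as_clusters = rough_clusters(s)     # explicit: (approximate) grapheme clusters
```

The same string has length 4, 3, or 2 depending on which conversion is chosen, which is why naming the conversion matters.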


e12e 3296 days ago

Don't forget splitting on word boundaries and/or whitespace - going from a string of text to an iterable collection of words (strings).


SideburnsOfDoom 3296 days ago

Or for the case of (e.g.) domain names, splitting on dots. Generally, given a collection of split chars, breaking the string into a collection of substrings.
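Both kinds of split can be sketched in Python. `split_on` is a hypothetical helper, and it assumes `split_chars` is non-empty; whitespace splitting is only a stand-in for real word-boundary detection, which is much harder for languages that do not separate words with spaces.

```python
import re


def split_on(text, split_chars):
    """Hypothetical helper: break text at any character found in split_chars."""
    # re.escape keeps characters such as '.' or '-' literal inside the class.
    pattern = "[" + re.escape(split_chars) + "]"
    return re.split(pattern, text)


words = "a string  of text".split()             # runs of whitespace
labels = split_on("news.ycombinator.com", ".")  # domain name, split on dots
fields = split_on("a,b;c", ",;")                # several split chars at once
```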


adrianN 3297 days ago

Not if the "iterating over characters" function iterates over actual characters and not codepoints.


johncolanduoni 3297 days ago

You mean grapheme clusters? Swift is the only language I know that uses that by default, and you still wouldn't want to store strings as a list of grapheme clusters.


cannam 3297 days ago

I believe Perl 6 does so as well, see e.g. https://perl6advent.wordpress.com/2015/12/07/day-7-unicode-p...


e12e 3296 days ago

The Apple developer documentation has a nice overview of some of the concerns that need to be taken into account for this to work:

https://developer.apple.com/library/content/documentation/Co...

It's probably one of the better approaches, but it's still not clear whether it (alone) allows a developer who speaks only English to build a text indexing or editing system that works well across English, Japanese, Arabic, Korean (Hangul), and Dutch, for example.


jswny 3296 days ago

Elixir as well.


dragonwriter 3296 days ago

The problem is that “actual characters” is an ill-defined term; it could mean either code points or graphemes. See, e.g., http://unicode.org/faq/char_combmark.html


pvdebbe 3297 days ago

Sure, it is useful. Maybe have a "string to [Char]" function in the stdlib?
