HN Mirror

by jstimpfle 3214 days ago

Well, I could roll my own iterator which encapsulates a string and some position information, but then I'd have to wrap a lot of different operations, like advance, advance by n, compare two iterators by position, test for end position, extract character, extract slice, etc.
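To make that list of operations concrete, here is a minimal Python sketch of such a hand-rolled iterator over a UTF-8 byte buffer. The class name `Utf8Cursor` and its method names are hypothetical. It assumes well-formed UTF-8 input and validates only lead bytes.

```python
from __future__ import annotations
from dataclasses import dataclass


def _seq_len(lead: int) -> int:
    """Length in bytes of the UTF-8 sequence starting with this lead byte."""
    if lead < 0x80:
        return 1
    if lead >> 5 == 0b110:
        return 2
    if lead >> 4 == 0b1110:
        return 3
    if lead >> 3 == 0b11110:
        return 4
    raise ValueError(f"invalid UTF-8 lead byte 0x{lead:02x}")


@dataclass(frozen=True)
class Utf8Cursor:
    """Hypothetical iterator: a UTF-8 buffer plus a byte offset into it.

    Assumes well-formed UTF-8; only lead bytes are checked.
    """
    data: bytes
    pos: int = 0

    def at_end(self) -> bool:
        """Test for end position."""
        return self.pos >= len(self.data)

    def char(self) -> str:
        """Extract the code point at the current position."""
        n = _seq_len(self.data[self.pos])
        return self.data[self.pos:self.pos + n].decode("utf-8")

    def advance(self, n: int = 1) -> Utf8Cursor:
        """Return a new cursor moved forward by n code points (not bytes)."""
        pos = self.pos
        for _ in range(n):
            if pos >= len(self.data):
                raise IndexError("advance past end")
            pos += _seq_len(self.data[pos])
        return Utf8Cursor(self.data, pos)

    def slice_to(self, other: Utf8Cursor) -> str:
        """Extract the text between this cursor and a later one."""
        return self.data[self.pos:other.pos].decode("utf-8")

    def __lt__(self, other: Utf8Cursor) -> bool:
        """Compare two cursors by position."""
        return self.pos < other.pos
```

Even this stripped-down version needs a wrapper method for every operation the parser uses, which is the "line noise" being objected to.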

And the code would get a lot noisier, while the only advantage I see is grapheme support, which I have never needed so far. (I hope graphemes are designed with the same sensitivity to technical concerns as UTF-8, where I can simply parse with byte-level indexes, looking only for ASCII characters, without headaches and with maximum performance.)
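A small sketch of the index-based style described above: splitting a UTF-8 encoded line on an ASCII comma using nothing but byte indexes. The function name `split_fields` is illustrative. It works without decoding because every byte of a multi-byte UTF-8 sequence has its high bit set, so an ASCII byte always stands for itself and never occurs inside another character.

```python
def split_fields(line: bytes, sep: int = ord(",")) -> list[bytes]:
    """Split UTF-8 encoded bytes on an ASCII separator using plain byte indexes.

    Safe without decoding: bytes inside multi-byte UTF-8 sequences are all
    >= 0x80, so they can never be mistaken for an ASCII separator.
    """
    fields = []
    start = 0
    for i, b in enumerate(line):
        if b == sep:
            fields.append(line[start:i])
            start = i + 1
    fields.append(line[start:])
    return fields
```

Every slice boundary falls on an ASCII byte, so each field is itself valid UTF-8 and can be decoded independently.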

As for getting the line/character position from a byte or codepoint offset, that's no problem if I do the calculation only when an error occurs. The alternative would be to track it on each advance, which again means ADT wrapping, and thus line noise and slower performance.
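The on-error approach can be sketched like this: the parser carries only a byte offset, and that offset is converted to a line number and column when a diagnostic is actually reported. `line_col` is a hypothetical helper. Here the column counts code points (not bytes, and not graphemes), found by skipping UTF-8 continuation bytes.

```python
def line_col(data: bytes, offset: int) -> tuple[int, int]:
    """Map a byte offset in UTF-8 data to a 1-based (line, column) pair.

    Intended to run only when reporting an error, so a linear rescan of the
    input up to the offset is acceptable.
    """
    # rfind returns -1 when there is no earlier newline, giving line_start == 0.
    line_start = data.rfind(b"\n", 0, offset) + 1
    line = data.count(b"\n", 0, offset) + 1
    # Count code points by skipping continuation bytes (0b10xxxxxx).
    col = sum(1 for b in data[line_start:offset] if b & 0xC0 != 0x80) + 1
    return line, col
```

The hot path pays nothing for position tracking; the cost of the rescan is paid once per reported error.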

Avernar 3212 days ago

I'm not advocating that the programmer needs to implement the iterators, but that the language/runtime should have built-in support for them.

As for searching for ASCII, which is prevalent in parsing, the iterator function that finds the next specified character can do a low-level, fast byte search. That's one of the benefits of UTF-8: searching for ASCII characters is super fast.
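Here is a sketch of what such a "find next character" operation could reduce to underneath, assuming the iterator is backed by a byte buffer and a byte offset. `find_ascii` is a hypothetical name. In Python, `bytes.find` accepts a single integer byte value and performs a plain byte search, with no decoding involved.

```python
def find_ascii(data: bytes, pos: int, ch: str) -> int:
    """Return the byte offset of the next ASCII character `ch` at or after `pos`, or -1.

    Any match is guaranteed to be a code point boundary, because ASCII bytes
    never appear inside multi-byte UTF-8 sequences.
    """
    if len(ch) != 1 or ord(ch) >= 0x80:
        raise ValueError("find_ascii only searches for a single ASCII character")
    # Raw byte search for one byte value; no UTF-8 decoding is needed.
    return data.find(ord(ch), pos)
```

Restricting the search target to ASCII is what makes the raw byte scan correct; a non-ASCII target would need a multi-byte substring search instead.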

You wouldn't have to compute the character position on each advance. Just keep a beginning-of-line iterator that's updated every time you see a newline character; on error, you call a function that tells you how many characters lie between the current-position iterator and the start-of-line iterator.
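A sketch of that suggestion, with byte offsets standing in for the two iterators: the scanner records where the current line begins only when it passes a newline, and the code-point column is counted between that marker and the current position only when it is requested. `LineTracker` and its methods are hypothetical names, and the code assumes well-formed UTF-8.

```python
class LineTracker:
    """Hypothetical scanner that remembers where the current line starts."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0         # current position (byte offset)
        self.line = 1        # 1-based line number
        self.line_start = 0  # byte offset of the first byte of the current line

    def advance(self) -> None:
        """Move past one code point, updating the line-start marker on newlines."""
        if self.data[self.pos] == 0x0A:  # b"\n"
            self.line += 1
            self.line_start = self.pos + 1
        self.pos += 1
        # Skip continuation bytes so pos always lands on a code point boundary.
        while self.pos < len(self.data) and self.data[self.pos] & 0xC0 == 0x80:
            self.pos += 1

    def column(self) -> int:
        """1-based column in code points, computed only when needed (e.g. on error)."""
        segment = self.data[self.line_start:self.pos]
        return sum(1 for b in segment if b & 0xC0 != 0x80) + 1
```

The per-advance cost is a single byte comparison; the column count is deferred until an error actually needs it.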

Working with iterators is no more complex than working with indexes. But it's the language that needs to provide them.