| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by xudongz 4778 days ago
	Not true for Go http://play.golang.org/p/ls96RxJxpz

2 comments

nknighthb 4778 days ago

I would be reluctant to rely on this until the Go documentation is clearer on the intended behavior. Right now it's very poorly specified. The regex doc[1] talks about "same general syntax" as Perl, but points to [2], which doesn't seem to understand what it's saying, describing '\d' in terms of its "Perl" meaning, but then saying that it's [0-9].

[1] http://golang.org/pkg/regexp/

[2] https://code.google.com/p/re2/wiki/Syntax

link

laumars 4777 days ago

As a Perl developer that's been making the switch to Go, I've been caught out a few times with Go's no-so-Perl-like regular expression syntax. In fact I wish I knew about your 2nd link before now, because that could have saved me a few hours over recent months.

link

jamesmiller5 4777 days ago

Considering go's vocal support for UTF8 I'm surprised at this behavior and curious to the reason for excluding it.

link

masklinn 4777 days ago

Supporting UTF8 and correctly handling unicode are very, very different beasts. The former is absolutely trivial, the latter is extremely difficult.

Go is vocal about the former, but seems to not give a shit about the latter.

link

snogglethorpe 4777 days ago

I'm not particularly fond of go, but "correctly handling unicode" can be subjective and case-dependent... I think making only minimal guarantees and punting to the application is often the only sane course.

link

masklinn 4777 days ago

> "correctly handling unicode" can be subjective and case-dependent...

So is correctly handling integers.

> I think making only minimal guarantees and punting to the application is often the only sane course.

That is completely and utterly crazy, the average developer has neither the knowledge nor the resources to make anything but a mess out of it without proper tools and APIs. Even with these (including a complete implementation of the unicode standard and its technical reports) unicode is already complex enough to deal with.

link

snogglethorpe 4777 days ago

Of course it's not "completely and utterly crazy."

Not every app needs to deal with the enormous complexities implied by "full unicode support", and given the huge cost of that, there's a real place for a minimalist approach. If all I do with unicode is input strings from the user, store them in a database, and then later spit them out, I don't need to be able to do Turkish case-conversion, and I may not want to pay the cost of making it possible.

Certainly tools and APIs help for those cases where an app needs to do the sort of complicated text-processing that warrants "full" unicode support, but it's not at all clear that the proper place for such support is in the base language libraries. It's quite reasonable for the language implementors to say "if you want to do X, we'll support that, but if you want to do Y and Z, please use external library L."

link

masklinn 4777 days ago

> Not every app needs to deal with the enormous complexities implied by "full unicode support", and given the huge cost of that, there's a real place for a minimalist approach.

Not sure what point you're trying to make, I never said all applications had to make full use of all possible Unicode APIs, I said the language must expose them. Because if it doesn't, those who should use them will never become aware of them let alone use them.

> If all I do with unicode is input strings from the user, store them in a database, and then later spit them out, I don't need to be able to do Turkish case-conversion, and I may not want to pay the cost of making it possible.

So?

> It's quite reasonable for the language implementors to say "if you want to add numbers, we'll support that, but if you want to subtract or divide them, please use external library L."

Really?

Then again, considering Go's embedded contempt for non-US locales (see: datetime patterns) I'm not even sure why we're having this discussion, and since it's obvious they don't care for a non-US world it make sense that they wouldn't care for processing text.

And at the end of the day, you agree that Go has no provision for unicode handling, you just think it's all fine and dandy.

link