HN Mirror



	by nswango 99 days ago
	So you think that the letters in the Greek and Cyrillic alphabets which are printed identically to the Latin A should not exist? And, for example, Greek words containing this letter should be encoded with a mix of Latin and Greek characters?
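For reference, the three letters under discussion can be inspected with Python's standard `unicodedata` module. This is only a minimal sketch; the variable names are illustrative and not from the thread.

```python
import unicodedata

# Latin, Greek, and Cyrillic capital letters that render identically in most fonts.
lookalikes = ["\u0041", "\u0391", "\u0410"]

# Each one is a separate code point with its own official name.
names = [unicodedata.name(ch) for ch in lookalikes]

for ch, name in zip(lookalikes, names):
    print(f"U+{ord(ch):04X} {name}")
```

The strings compare unequal even though a reader may not be able to tell the glyphs apart on screen or on paper.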

2 comments

WalterBright 99 days ago

> So you think that the letters in the Greek and Cyrillic alphabets which are printed identically to the Latin A should not exist?

Yes. Unicode should not be about semantic meaning, it should be about the visual. Like text in a book.

> And, for example, Greek words containing this letter should be encoded with a mix of Latin and Greek characters?

Yup. Consider a printed book. How can you tell if a letter is a Greek letter or a Latin letter?

Those Unicode homonyms are a solution looking for a problem.


bawolff 99 days ago

> Yes. Unicode should not be about semantic meaning, it should be about the visual. Like text in a book.

Do you think 1, l and I should be encoded as the same character, or does this logic only extend to characters pesky foreigners use?


WalterBright 99 days ago

They are visually distinct to the reader.


debazel 99 days ago

That is entirely dependent on the font.


Yokohiii 99 days ago

Unicode is about semantics not appearance. If you don't need semantics then use something different.


WalterBright 99 days ago

> Unicode is about semantics not appearance.

And that's where it went off the rails into lala land. 'a' can have all kinds of distinct meanings. How are you going to make that work? It's hopeless.


Yokohiii 99 days ago

It already works.

Tell me what the problem is and what your proposed solution would be.


WalterBright 99 days ago

Infer the meaning from the context.

    a) it's a bullet point
    b) a+b means a is a variable
    c) apple means a means the sound "aaaah"
    d) ape means a means the sound "aye"
    e) 0xa means a means "10"
    f) "a" on my test paper means I did well on it
    g) grade "a" means I bought the good bolts
    h) "achtung" means it's a German "a"

I didn't need 8 different Unicode characters. And so on.


Yokohiii 99 days ago

Your trolling is really rock bottom. All this already works fine. Millions of times, each day. Just once a week it fails because someone messed up. Not an issue.


Muromec 99 days ago

>Yup. Consider a printed book. How can you tell if a letter is a Greek letter or a Latin letter?

I can absolutely tell Cyrillic k from the Latin к and Latin u from the Cyrillic и.

>should not be about semantic meaning,

It's always better to be able to preserve more information in a text and not less.
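One concrete piece of information carried by the separate code points is case mapping. A minimal sketch, assuming Python's built-in `str.lower()` (which follows the Unicode case mappings); the variable names are illustrative:

```python
# Three capital letters that look the same but belong to different scripts.
latin_a, greek_alpha, cyrillic_a = "\u0041", "\u0391", "\u0410"
capitals = (latin_a, greek_alpha, cyrillic_a)

# Each letter lowercases within its own script.
lowered = [ch.lower() for ch in capitals]

for upper, lower in zip(capitals, lowered):
    print(f"U+{ord(upper):04X} -> U+{ord(lower):04X} {lower}")
```

With a single shared code point for the capital, software would have to guess from context whether the lowercase form should be a Latin a or a Greek α.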


WalterBright 99 days ago

> I can absolutely tell Cyrillic k from the Latin к and Latin u from the Cyrillic и.

They look visually distinct to me. I don't get your point.

> It's always better to be able to preserve more information in a text and not less.

Text should not lose information by printing it and then OCR'ing it.


ted_dunning 99 days ago

But these characters only look identical in some fonts. Are you saying that if you change font, some characters in a string should change appearance and others should not?

And what about the round-trip rule?

And ligatures? Aren't those a semantic distinction?


WalterBright 99 days ago

> But these characters only look identical in some fonts.

That's a problem with the fonts.

> And what about the round-trip rule?

Print Unicode on paper, then OCR it, and you'll get different Unicode. Oh, and normalization.

> ligatures

Generally an issue with rendering.

> semantic distinction

Unicode isn't about semantics (or shouldn't be). Consider 'a'. It's used for all kinds of meanings.
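For readers who want to check what normalization does and does not merge, here is a minimal sketch using `unicodedata.normalize` from the Python standard library. The sample strings are illustrative choices, not examples taken from the thread.

```python
import unicodedata

ligature = "\ufb01"      # LATIN SMALL LIGATURE FI
decomposed = "e\u0301"   # 'e' followed by COMBINING ACUTE ACCENT
greek_alpha = "\u0391"   # GREEK CAPITAL LETTER ALPHA

# NFC composes base + combining mark, but leaves the ligature alone.
nfc_ligature = unicodedata.normalize("NFC", ligature)
nfc_accent = unicodedata.normalize("NFC", decomposed)

# NFKC also applies compatibility mappings, so the ligature becomes "fi".
nfkc_ligature = unicodedata.normalize("NFKC", ligature)

# Neither form folds cross-script lookalikes together.
nfkc_alpha = unicodedata.normalize("NFKC", greek_alpha)

print(repr(nfc_ligature), repr(nfkc_ligature), repr(nfc_accent), repr(nfkc_alpha))
```

So normalization already treats the fi ligature as a compatibility variant of two letters, while Greek Alpha and Latin A stay distinct under every standard normalization form.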