Hacker News
by pmelendez 4704 days ago
> "having non ASCII characters"

Accented vowels are ASCII characters, but in the extended set, which people sometimes don't take into account.

4 comments

Generally not. On the Internet, ASCII usually means ANSI_X3.4-1968, a 7-bit standard with 128 code points. (Run "man ascii" on a Unix system to see this.) There aren't any accented characters.
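A quick way to see the 7-bit limit from Python. This is a minimal sketch; `fits_in_ascii` is a made-up helper for illustration, and the second half just leans on Python's built-in strict `ascii` codec:

```python
def fits_in_ascii(text: str) -> bool:
    """Return True if every character is one of the 128 ASCII code points (0-127)."""
    return all(ord(ch) < 128 for ch in text)


print(fits_in_ascii("Mauricio"))  # True
print(fits_in_ascii("Maurício"))  # False: 'í' is U+00ED (237), outside 0-127

# Python's strict ascii codec agrees: it refuses to encode accented letters.
try:
    "é".encode("ascii")
except UnicodeEncodeError as err:
    print("not ASCII:", err.reason)  # not ASCII: ordinal not in range(128)
```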

By contrast, there were national variants of ISO/IEC 646 (also a 7-bit character set, and essentially the internationalized version of ASCII) that included accented characters within those 128 code points. These generally replaced characters such as the at-sign (@), the curly braces, and the vertical bar with accented vowels.
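To illustrate how a national ISO/IEC 646 variant reuses the same 7-bit codes, here is a sketch built on a hand-written table for the German variant (DIN 66003), which puts umlauts where ASCII has @, brackets, braces and the vertical bar. The table is written from memory and should be checked against the standard; `ISO646_DE` and `decode_iso646_de` are hypothetical names, not a library codec:

```python
# Hand-written table for the German national variant of ISO/IEC 646 (DIN 66003).
# Illustrative assumption only; not a codec shipped with Python.
ISO646_DE = {
    0x40: "§",  # ASCII '@'
    0x5B: "Ä",  # ASCII '['
    0x5C: "Ö",  # ASCII backslash
    0x5D: "Ü",  # ASCII ']'
    0x7B: "ä",  # ASCII '{'
    0x7C: "ö",  # ASCII '|'
    0x7D: "ü",  # ASCII '}'
    0x7E: "ß",  # ASCII '~'
}


def decode_iso646_de(data: bytes) -> str:
    """Decode 7-bit bytes, substituting the German national characters."""
    if any(b > 0x7F for b in data):
        raise ValueError("ISO/IEC 646 is a 7-bit set")
    return "".join(ISO646_DE.get(b, chr(b)) for b in data)


raw = b"{x|y}"
print(raw.decode("ascii"))    # {x|y}
print(decode_iso646_de(raw))  # äxöyü
```

The same bytes read as braces and a pipe in ASCII but as umlauts in the German variant, which is why C source code looked so odd on such terminals.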

There were also many 8-bit character sets in the ISO/IEC 8859 series (e.g. Latin-1, i.e. ISO/IEC 8859-1) that included accented characters within the "extended" set of code points 128-255.
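A small sketch of where Latin-1 puts accented letters, using Python's built-in `latin-1` codec (the helper name `latin1_byte` is made up for illustration):

```python
def latin1_byte(ch: str) -> int:
    """Return the single Latin-1 (ISO/IEC 8859-1) byte value for a character."""
    return ch.encode("latin-1")[0]


# ASCII letters keep their 7-bit values; accented letters land in 128-255.
for ch in "Aéíñü":
    print(ch, hex(latin1_byte(ch)))
# A 0x41, é 0xe9, í 0xed, ñ 0xf1, ü 0xfc
```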

There are a number of different "extended sets" (IBM code pages and ISO/IEC 8859 parts, for instance), and they're "extended" because they're not ASCII but supersets of it (as is UTF-8).
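The "superset" point can be checked with a few of Python's standard codecs. This sketch assumes the codec list below is representative: the lower 128 byte values decode identically everywhere, while a single upper byte such as 0x80 means something different in each set:

```python
# Encodings assumed for the comparison; all ship with Python's standard codecs.
CODECS = ["ascii", "latin-1", "cp1252", "cp437", "utf-8"]

# Superset check: every 7-bit byte decodes to the same character in all of them.
low_half_agrees = all(
    len({bytes([b]).decode(c) for c in CODECS}) == 1 for b in range(128)
)
print(low_half_agrees)  # True

# Above 127 they part ways: the same byte, three different characters.
for codec in ["latin-1", "cp1252", "cp437"]:
    print(f"{codec:8} {bytes([0x80]).decode(codec)!r}")
# latin-1 gives the C1 control '\x80', cp1252 gives '€', cp437 gives 'Ç'
```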

ASCII is the 7-bit encoding ANSI_X3.4-1968, composed of 95 printable and 33 control characters.
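The 95/33 split can be counted directly. A minimal sketch: printable means 0x20 (space) through 0x7E (~), and control means 0x00-0x1F plus 0x7F (DEL); Python's `str.isprintable` is used only as a cross-check:

```python
# Split the 128 ASCII code points into printable and control characters.
printable = [c for c in range(128) if 0x20 <= c <= 0x7E]
control = [c for c in range(128) if c < 0x20 or c == 0x7F]
print(len(printable), len(control))  # 95 33

# Cross-check: Python treats space as printable and all ASCII controls as not.
print(sum(chr(c).isprintable() for c in range(128)))  # 95
```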

Or they account for it wrongly ;) (i.e., from the wrong set)

I love how sometimes, even within the same company, each place handles ASCII differently.

I remember registering for an IM service: on one info screen my name was Maur&cio, on the site info screen it was Maur€cio, on the search screen it was Maur£cio, and so on...
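A sketch of how this kind of mojibake happens: one name stored as UTF-8 bytes, then read back by screens that each assume a different encoding. The codec per screen is a hypothetical stand-in; this does not reproduce the exact characters from the anecdote, only the mechanism:

```python
name = "Maurício"
stored = name.encode("utf-8")  # b'Maur\xc3\xadcio': 'í' becomes two bytes

# Each "screen" decodes the same stored bytes with its own assumed encoding.
for screen_codec in ["utf-8", "latin-1", "cp437"]:
    print(f"{screen_codec:8} {stored.decode(screen_codec)!r}")
```

Only the screen that guesses UTF-8 shows the name correctly; the others turn the two-byte sequence into two unrelated characters.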

ASCII is obviously meant to use CP437 for character codes 128-255, duh...

Seriously though, most other code pages are pretty transparent to/from Unicode... IBM PC-DOS extended ASCII (classic ANSI-BBS) isn't so transparent.
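One plausible illustration of the non-transparency, assuming the standard Unicode mapping that Python's `cp437` codec follows: on the IBM PC, bytes such as 0x01 were drawn as glyphs (☺ and friends) that ANSI-BBS art relied on, but the mapping treats them as control characters, so those glyphs do not survive a round trip. The upper half, by contrast, maps cleanly:

```python
# Byte 0x01 was drawn as a smiley on the IBM PC screen, but Python's cp437
# codec maps it to the control character U+0001, not to U+263A (☺).
print(repr(bytes([0x01]).decode("cp437")))  # '\x01'

# Going the other way, the smiley has no cp437 byte in this mapping at all.
try:
    "\u263a".encode("cp437")
except UnicodeEncodeError:
    print("U+263A has no cp437 mapping here")

# Bytes 0x80-0xFF, by contrast, round-trip cleanly through Unicode.
upper = bytes(range(0x80, 0x100))
print(upper.decode("cp437").encode("cp437") == upper)  # True
```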

Extended ASCII is not ASCII. There is ASCII, which has no accented characters, and there are other character encodings based on ASCII, which often do. Those other character encodings are not ASCII.