Hacker News new | ask | show | jobs
by notduncansmith 2439 days ago
> For example, it turns out to be really convenient that strings are simply lists of characters. It means all the list manipulation functions just work on them.

Why is it preferable to couple the internal implementation of strings to the interface “a list of characters”? Also, since “character” can be an imprecise term, what is a character in Bel?

1 comments

> Also, since “character” can be an imprecise term, what is a character in Bel?

Exactly. How is unicode and UTF-8 treated by this language?

While we're down here in the weeds[0]...

A good way to do it IMO, is for each character to be unicode grapheme cluster[2].

0. "This is not a language you can use to program computers, just as the Lisp in the 1960 paper wasn't."[1]

1. https://sep.yimg.com/ty/cdn/paulgraham/bellanguage.txt?t=157...

2. http://www.unicode.org/reports/tr29/#Grapheme_Cluster_Bounda...

> A good way to do it IMO, is for each character to be unicode grapheme cluster[2].

I would agree. However, for the sake of argument,

The width of a grapheme cluster depends on whether something is an emoji or not, which for flags, depends on the current state of the world. The combining characters (u)(k) are valid as one single grapheme cluster (a depiction of the UK's flag), which is not valid if the UK splits up, for example. This is only one single example, there are many others, that demonstrate that there is basically no 'good' representation of 'a character' in a post-UTF8 world.

At this point Bel doesn't even have numbers. A character is axiomatically a character, not some other thing (like a number in a character set encoding). Isn't it premature to talk about character set encoding or even representation yet? A character is a character because it's a character. But I haven't nearly read through these documents so perhaps I'm not understanding something.
As far as I know a "character" is a number in a character set. "A character is a character because it's a character" doesn't make any sense, at least not in the context of a programming language. A character is a character because it maps to a letter or glyph, and that requires a specific encoding.

The Bel spec contains the following:

    2. chars

    A list of all characters. Its elements are of the form (c . b), where
    c is a character and b is its binary representation in the form of a 
    string of \1 and \0 characters. 
In other words, even in Bel a character is still a "number" in a character set encoding. The "binary representation" isn't mentioned as being a specific encoding, which means it's assumed to be UTF08 or ASCII.

Also, Bel does have numbers, apparently implemented as pairs. Go figure.

You're right that I need to review more of the Bel spec (which as you point out does in fact implement numbers).

But I am trying to read this in the spirit in which it's intended, as irreducible axioms. And so I don't think it's true that a character needs a representation from the perspective of Bel.

Consider if pg had approached this more like I would have, and started with integers and not characters as a built-in. Does an integer need a "representation"? At least, in a language that is not designed to execute programs in, like Bel? An implementation (with the additional constraints like width, BCD vs two's complement vs whatever, radix, etc) would need one, but the formal description would not.

Thanks for exploring the subject.

That's actually an interesting point for a Unicode mailing-list, but as long as I can farm grapheme-splitting out to I'm happy.