Hacker News new | ask | show | jobs
by yarvin9 3591 days ago
A similar design is Proquint, which IPFS uses:

https://www.npmjs.com/package/proquint

Proquint (5 letters per 16 bits) is tighter than Urbit's `@p` (6 letters per 16 bits). The Urbit form was designed for synthetic names and restricts itself to phonemes that sound comfortable and natural to English speakers. (Not to say that English should be the universal language, it's actually a terrible language to make everyone learn, just that it is.)

Word lists work reasonably well, but they're quite bulky and they don't take advantage of the human hardware accelerator for learning new words. When you have a GPU, use it. These kinds of synthetic strings also make great passwords, BTW.

(Disclaimer: Urbit guy here.)

2 comments

If you want to make a more universal phoneme-generator, the basic contours of a nearly-universal [1] phonotactics is as follows:

* Strict CV syllable scheme.

* Atonal

* Consonants distinguished only by voiced/voiceless (Chinese, e.g., doesn't do a voicing distinction, but switching to an aspiration distinction would suffice for them)

* 5 vowels: a, e, i, o, u (actual vowel quality may vary; every language that has at least 5 vowels has these 5 vowels) (some languages, particularly indigenous languages in North America, have 3 or 4 vowels, but the intersection yields too few vowels).

* Consonants are harder to inventory. /p/, /t/, /k/, /m/, /n/ are nearly universal, and /b/, /g/, /d/, /s/, /z/ are also quite common. The IPA /j/ (that's the 'y' in 'ya' for English speakers), /w/ (pronounced as you'd think in English) are pretty common semi-vowels. Maybe /l/, /ʃ/, /ʒ/ as well, should you need more consonants.

That gives you 25-75 plausible syllables, depending on how many consonants you go with.

[1] If you go by least common denominator, you end up with maybe 1 vowel and no consonants (there's no consonant phoneme present in every language IIRC).

Doesn't Lojban try to have pretty easy phonotactics or something? They do have consonant clusters, but I thought they did some kind of study and chose their phonemes and some rules on the basis of things that most languages wouldn't find too difficult.

Edit: not suggesting that Lojban's solution is somehow preferable to your advice, just trying to remember what they did about this issue.

Lojban uses the consonants I gave (sans /j/ and /w/, although these are counted as dipthongs instead), plus /f/, /v/, /x/, /ʔ/, /h/, and /r/, as well as /ə/ for a sixth vowel. The syllable scheme seems to be largely C(C)VC(C), with largely only mixed voiced/unvoiced and geminate consonant clusters prohibited. That said, they do allow for "buffer" vowels in pronunciation to aid speakers who have trouble with consonants (and yet they have a /ə/?).

From what I can tell, CV(C) (with the second consonant usually having some restrictions) is fairly widespread. However, in my personal (purely anecdotal) experience, pronouncing foreign consonant clusters or unfamiliar final consonants is much harder than pronouncing unfamiliar initial consonants or vowels, so I'd be slightly wary of letting the final consonant go too unrestricted.

Proquint example for comparison: "pokak-fijus-zavaz-posuf-bizar-luhuf-kulor-marak".