you ask what i mean about programmer productivity. consider this python code from https://norvig.com/spell-correct.html:

    def edits1(word):
        "All edits that are one edit away from `word`."
        letters = 'abcdefghijklmnopqrstuvwxyz'
        splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
        deletes = [L + R[1:] for L, R in splits if R]
        transposes = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R)>1]
        replaces = [L + c + R[1:] for L, R in splits if R for c in letters]
        inserts = [L + c + R for L, R in splits for c in letters]
        return set(deletes + transposes + replaces + inserts)

in seven lines of code, it computes all the potentially incorrect words that can be made from a given correct word with a single edit. so, for example, for 'antidisestableshmentarianism', it returns a set of 1482 words such as 'antidisestauleshmentarianism', 'antidisestableshmentarianlism', 'antidiseitableshmentarianism', 'antidisestablesjhmentarianism', and 'antidiseptableshmentarianism', totaling 42194 bytes. how would you do this in uxntal?

here's another part of norvig's program. this part tabulates the case-smashed frequency of every word in its 6-megabyte training set (which presumably consists only of correctly spelled words):

    import re
    from collections import Counter

    def words(text): return re.findall(r'\w+', text.lower())

    WORDS = Counter(words(open('big.txt').read()))

this takes about 340 milliseconds on one core of my laptop here, which runs at about 6000 MIPS, so it would take about 34 seconds on a machine running at 60 MIPS, maybe a little longer on the apollo3. there are 32198 distinct words in the training set, totaling 244015 characters; the most common word ('the') occurs 79809 times, and the longest word ('disproportionately') is 18 characters. so plausibly you could represent this hash table without any compression in about 500k, though cpython requires about 70 megabytes. ram compression could plausibly get those 500k down to the 384k the apollo3 has without trying to swap to offchip flash

finding the best correction for a word requiring two corrections like 'slowlyyy' takes 70ms, so plausibly it would take 10 seconds or so on the apollo3. (you could maybe do this in the background in a text editor.) (if it were compiled to efficient code, it would probably be closer to 300 milliseconds on the apollo3, because cpython's interpretive overhead is a factor of about 40.) 'disproportionatelyyy' takes 370ms. here's the rest of the correction code:

    def P(word, N=sum(WORDS.values())):
        "Probability of `word`."
        return WORDS[word] / N

    def correction(word):
        "Most probable spelling correction for word."
        return max(candidates(word), key=P)

    def candidates(word):
        "Generate possible spelling corrections for word."
        return (known([word]) or known(edits1(word)) or known(edits2(word)) or [word])

    def known(words):
        "The subset of `words` that appear in the dictionary of WORDS."
        return set(w for w in words if w in WORDS)

    def edits2(word):
        "All edits that are two edits away from `word`."
        return (e2 for e1 in edits1(word) for e2 in edits1(e1))

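the back-of-envelope numbers quoted before the correction code can be reproduced with a few lines of integer arithmetic. this is only a sketch: the measured timings and word counts are the ones given above, but the 8 bytes of per-word overhead (`BYTES_PER_ENTRY`) is a hypothetical figure chosen to show how a total near 500k could arise, not a measurement.

```python
# measured figures quoted in the text above
LAPTOP_MIPS = 6000
TARGET_MIPS = 60          # the apollo3 is expected to be a little slower than this
TRAIN_MS = 340            # building the word-count table on the laptop
FIX_MS = 70               # correcting 'slowlyyy' on the laptop
CPYTHON_OVERHEAD = 40     # rough interpretive overhead factor quoted above

# scale laptop timings to the slower machine (integer milliseconds)
train_ms_target = TRAIN_MS * LAPTOP_MIPS // TARGET_MIPS
fix_ms_target = FIX_MS * LAPTOP_MIPS // TARGET_MIPS

# the text rounds the correction time up to about 10 seconds for the apollo3;
# dividing that by the interpretive overhead gives the compiled-code estimate
compiled_ms = 10_000 // CPYTHON_OVERHEAD

# uncompressed table size: the characters themselves plus per-word overhead
DISTINCT_WORDS = 32198
TOTAL_CHARS = 244015
BYTES_PER_ENTRY = 8       # assumption: e.g. a 4-byte count plus a 4-byte hash slot
table_bytes = TOTAL_CHARS + DISTINCT_WORDS * BYTES_PER_ENTRY

print(train_ms_target, fix_ms_target, compiled_ms, table_bytes)
```

with those assumptions the table comes to roughly 500k, which is still over the 384k of ram, hence the remark about compression.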
note that `edits2` requires you to have two such `edits1` sets in memory at once, though you could plausibly avoid that problem by tolerating more duplicates (double letters provoke duplicates in deletes, transposes, and replaces)

norvig doesn't tell us exactly how long it took him to write the code, but he did it in a single airplane flight, except for some minor bugs which took years to find. more importantly, though, it's very easy code to read, so you can easily understand how it works in order to modify it. and that's the most important factor for programming productivity

here are some things in this code that are more difficult to write and much more difficult to read in uxntal:

- managing more than 64k of data (uxn's memory addresses are 16 bits)
- dynamically allocating lists of things such as the (left, right) tuples in splits
- dynamic memory allocation in general
- string concatenation
- eliminating duplicates from a set of strings
- iterating over the words in a text file
- generating a sequence of outputs from a sequence of inputs with a filtering predicate and a transformation function ([f(x, y) for x, y in xys if p(x, y)])
- generating a lazy flat sequence of outputs from a nested loop (return (z for y in f(x) for z in f(y)))
- hash tables
- incrementally eliminating duplicates from a sequence of candidates that turn out to be valid words (set(w for w in words if w in WORDS))
- counting the number of occurrences of each string in a lazy sequence of strings
- floating-point arithmetic (which would be fairly easy to eliminate in this case, but not in many other cases; this deficiency in uxn is especially galling since the apollo3 has fast hardware floating point)
- finding the highest-rated item of a lazy sequence of candidates according to some scoring function

and all of that is on top of the general readability tax imposed by postfix syntax, where even figuring out which arguments are being passed to which subroutine is a mental challenge and a frequent source of bugs

note that these are mostly not deficiencies you can really patch with a library. i didn't mention that the program uses regular expressions, for example, because you can certainly implement regular expressions in uxntal. the deficiencies i listed are things you probably need to address at the level of language semantics, or virtual machine semantics in the case of the address space problem. and they're not tightly tied to cpython being implemented grossly inefficiently; pypy implements all the necessary semantics, and common lisp and c++ have similar facilities in most cases, though their handling of lazy sequences is a weakness that is particularly important on hardware with limited memory like the apollo3

so that's what i mean when i say that uxn is designed to make easy things hard, rather than making hard things easy

you say:

> the pain point might be intentional nudges away from making things the designer doesn't like

the thing is, i don't really care whether rek and devine think that autocorrecting misspellings is a bad thing to do; i want the computer to be a means of expression for my ideas, indeed for everyone's ideas, not for the ideas of a singular designer. that's the apple walled-garden mindset, and it's anathema to me. and, though i could be wrong about this, i think rek and devine would probably agree
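as a footnote to the list of difficulties above, here is a rough sketch of the two workarounds mentioned in passing: generating the edits lazily while tolerating duplicates, so that no `edits1` set is ever held in memory, and ranking candidates by raw integer count instead of by probability (dividing every count by the same N doesn't change which one is largest). the names `edits1_lazy`, `edits2_lazy`, `best_known`, and `correction_int` are made up for this sketch and are not in norvig's program, and the `TOY` counts are invented so the example runs without big.txt. ties between equally frequent candidates may be broken differently than in the original.

```python
from collections import Counter

LETTERS = 'abcdefghijklmnopqrstuvwxyz'

def edits1_lazy(word):
    "Yield every string one edit away from `word`, duplicates included."
    for i in range(len(word) + 1):
        left, right = word[:i], word[i:]
        if right:
            yield left + right[1:]                        # delete
        if len(right) > 1:
            yield left + right[1] + right[0] + right[2:]  # transpose
        for c in LETTERS:
            if right:
                yield left + c + right[1:]                # replace
            yield left + c + right                        # insert

def edits2_lazy(word):
    "Yield every string two edits away from `word`, one at a time."
    for e1 in edits1_lazy(word):
        yield from edits1_lazy(e1)

def best_known(candidates, counts):
    "Most frequent candidate found in `counts`, or None; integer compares only."
    best, best_count = None, 0
    for w in candidates:
        n = counts.get(w, 0)
        if n > best_count:       # duplicates are harmless: they never beat themselves
            best, best_count = w, n
    return best

def correction_int(word, counts):
    "Same search order as `correction`, without building sets or dividing."
    if word in counts:
        return word
    return (best_known(edits1_lazy(word), counts)
            or best_known(edits2_lazy(word), counts)
            or word)

# toy counts, invented for illustration; the real table comes from big.txt
TOY = Counter({'slowly': 5, 'slow': 9, 'lowly': 1})
print(correction_int('slowlyyy', TOY))   # two deletions away from 'slowly'
print(correction_int('slwo', TOY))       # one transposition away from 'slow'
```

the cost of skipping the sets is repeated dictionary lookups for duplicate candidates, which trades time for memory. it does nothing about the other items on the list, such as the 16-bit address space.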