If you were working with code that was proprietary, you probably shouldn't of been using cloud-hosted LLMs anyways, but this would seem to seal the deal.
The funniest part is that in the contraction “shouldn’t’ve” the first apostrophe does denote the elision of a vowel, but the second one doesn’t: the vowel is still there! So you end up with something like [nʔəv], much as if you had (hold the rotten vegetables, please) “shouldn’t of” followed by a vowel.
Really, it’s funny watching from the outside, waiting for English to finally stop holding it in and get itself some sort of spelling reform that meaningfully moves it in a phonetic direction. My amateur impression, though, is that mandatory secondary education has made “correct” spelling such a strong social marker that everybody (not just English-speaking countries) is essentially stuck with whatever they have at the moment. In which case, my condolences to English speakers: your history really did work out in an unfortunate way.
Here's an example of a phonemic orthography, which is somewhat readable (to me) but illustrates how many diacritics you'd need. And it still spells the vowel in "ask" or "lot" with the same ä! https://www.reddit.com/media?url=https%3A%2F%2Fpreview.redd....
> A phonetic respelling would destroy the languages, because there are too many dialects without matching pronunciations.
Not only that, but since pronunciation tends to diverge over time, it will create a never-ending spelling-pronunciation drift where the same words won't be pronounced the same way in, e.g., 100 to 200 years. The spelling would have to keep being revised, and future generations would effectively lose easy access to prior knowledge.
> since pronunciation tends to diverge over time, it will create a never-ending spelling-pronunciation drift
Once you switch to a phonetic respelling, this is no longer a frequent problem. It does not happen, or at least happens very rarely, in existing phonetic languages such as Turkish.
In the rare event that the pronunciation of a sound changes in time, the spelling doesn't have to change. You just pronounce the same letter differently.
If it's more than one sound, well, then you have a problem. But it happens in today's non-phonetic English as well (such as "gost" -> "ghost", or more recently "popped corn" -> "popcorn").
English also shows remarkable variation in the pronunciation of words, even within a single speaker. I don't know of any other language where, even in careful formal speech, words can just change pronunciation drastically based on emphasis. For example, the indefinite article "a" can be pronounced as either [ə] (schwa, for the weak form) or "ay" (strong form). "the" can be "thə" or "thee". Similar things happen with "an", "can", "and", "than", "that" and many, many other such words.
We've had a spelling reform or two already, and unfortunately they were stupid: e.g. "doubt" has never had its b pronounced in English; it was inserted to make the word look more like Latin dubitare.
https://en.m.wiktionary.org/wiki/doubt
That said, phonetic spelling reform would of course privilege the phonemes as spoken by whoever happens to be most powerful or prestigious at the time (after all, the only way it could possibly stick is if it's pushed by the sufficiently powerful), and would itself fall out of date eventually anyway.
Even though the vowel "a" is dropped from the spelling, if you actually say it out loud, you do pronounce a vowel sound when you get to that spot in the word, something like "shouldn'tuv", whereas the "o" in "not" is dropped from both the spelling and the pronunciation.
For example, in Year 1 that useless letter "c" would be dropped to be "replased" (replaced) either by "k" or "s", and likewise "x" would no longer be part of the alphabet.
It becomes quite useful in the later sentences as more and more reformations are applied.
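For anyone who hasn't seen the piece, here's a toy Python take on the "Year 1" rule as described above. The exact mapping is my assumption of what "replaced either by k or s" means (a soft c before e, i or y becomes s, every other c becomes k, and x becomes ks), the function name `year_one_respelling` is made up, and it only handles lowercase text.

```python
import re


def year_one_respelling(text: str) -> str:
    """Hypothetical 'Year 1' rule: drop 'c' and 'x' (lowercase input only)."""
    # Assumed rule: a soft c (before e, i or y) becomes "s": "replace" -> "replase"
    text = re.sub(r"c(?=[eiy])", "s", text)
    # Every remaining c is treated as hard and becomes "k": "cat" -> "kat"
    text = text.replace("c", "k")
    # x becomes "ks": "box" -> "boks"
    text = text.replace("x", "ks")
    return text


print(year_one_respelling("replace the cat in the box"))  # replase the kat in the boks
```

It's deliberately naive: "church" comes out as "khurkh", since the rule as quoted says nothing about "ch".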
English is rather complex phonologically. Lots of vowels, for starters, and if we're talking about American English, these include the rather rare r-colored vowels; but even without them things are pretty crowded. E.g. /æ/ vs /ɑ/ vs /ʌ/ ("cat" vs "cart" vs "cut") is just one big WTF to anyone whose language has a single "a-like" phoneme, which is most of them. Consonants have some weirdness as well: e.g. a retroflex approximant as the primary rhotic is fairly rare, and pervasive non-sibilant coronal fricatives ("th") are also somewhat unusual.
There are certainly languages with even more spoken complexity (e.g. the 4+ consonant clusters like "vzdr" typical of Slavic languages), but even so, spoken English is not that easy to learn to understand, and it's very hard to learn to speak without a noticeable accent.
You never realize how many weird rules, weird exceptions, ambiguities, and complete redundancies there are in this language until you try to teach English, which will probably also teach you a bunch of terms and concepts you've never heard of. Know what a gerund is? Then there are things we don't even think about that challenge even advanced foreign learners, like when to use which article: "the" or "a".
English's popularity was driven solely by its use as a lingua franca. As times change, so too will the language we speak.
The thing is that English takes in words from other languages and keeps doing so, which means that there are several phonetic systems in use already. It's just that they use the same alphabet so you can't tell which one applies to which word.
There are occasional mixed horrors like "ptarmigan", which comes from Gaelic (tàrmachan) but was respelled as if it were Greek, so it has the same silent p as "pterodactyl".
There's no academy of the English language anyway, so there's nobody to make such a change. And as others have said, the accent variation is pretty huge.
IIUC, one reason is that prompts and other data sent to third-party LLM hosts have a chance of being funneled to fourth-party RLHF platforms, e.g. SageMaker, Mechanical Turk, etc. So a random gig worker could be reading a .env file the intern uploaded.
What do you mean by "chance"? It's clear that if users have not opted out of model training, their data will be used. If they have opted out, it won't be used. And most users are in the first bucket.
Just because training on data is opt-out doesn't mean businesses can't trust it. Not the best for users' privacy, though.
I think it's fair to question how proprietary your data is.
Like, there's the algorithm a hedge fund uses for algorithmic trading; they'd be insane to take the risk. Then there's the code for a video game: it's proprietary, but competitors don't benefit substantially from an illicit copy. You ship the compiled artifacts to everyone, so the logic isn't that secret. Copies of similar source code have leaked before with no significant effects.
Most (all?) hedge funds that use AI models explicitly run them in-house. People do use commercial LLMs, but where the LLMs are not run in-house, it's against company policy to upload any proprietary information (and this is generally logged and policed).
A lot of the use is fairly mundane and basically replaces junior analysts. E.g. it's digesting and summarizing the insane amounts of research that is produced. I could ask an intern to summarize the analysis on platinum prices over the last week, and it'll take them a day. Alternatively, I can feed all the analysis that banks produce into an LLM and have it done immediately. The data fed in isn't really a trade secret, and neither is the output. What I do with the results is where the interesting things happen.
AFAIK, the actual trading algorithms themselves usually aren’t that far from what you can find in a textbook; their efficacy is mostly dictated by market conditions and the performance characteristics of the implementation/system as a whole.
Many algo strategies are indeed programmatically simple (e.g. they use some sort of moving average), but the parametrization and how it's used are the secret sauce, and you don't want that information to leak. They might be tuned to exploit a certain market behavior, and you want to keep this secret, since other people targeting the same behavior will make your edge go away. The edge can be something purely statistical, or it can be a specific timing window that you found, etc.
It's a bit like saying that a Formula 1 engine is not that far from what you'd find in a textbook. While it's true that it shares a lot of properties with a generic internal combustion engine, the edge comes from a lot of proprietary research that teams treat as secret and definitely don't want competitors to find out.
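To make the "simple code, secret parameters" point concrete, here's a rough Python sketch of a textbook moving-average crossover rule. Everything in it (the function names `sma` and `positions`, the window sizes, the price series) is made up for illustration and is not anyone's actual strategy. The code stays identical; only the two window parameters change, and that alone shifts when the rule flips from long to short.

```python
def sma(prices: list[float], t: int, window: int) -> float:
    """Simple moving average of the `window` prices ending at index t (inclusive)."""
    return sum(prices[t - window + 1 : t + 1]) / window


def positions(prices: list[float], short_window: int, long_window: int) -> list[int]:
    """Toy crossover rule (hypothetical): +1 (long) while the short SMA is above
    the long SMA, -1 (short) while it is below, 0 on a tie. One entry per index t
    from long_window - 1 onward, i.e. once both averages are defined."""
    result = []
    for t in range(long_window - 1, len(prices)):
        diff = sma(prices, t, short_window) - sma(prices, t, long_window)
        result.append((diff > 0) - (diff < 0))  # sign of diff as -1/0/+1
    return result


prices = [10, 11, 12, 13, 12, 11, 10, 9]  # made-up numbers
print(positions(prices, short_window=2, long_window=4))  # [1, 1, -1, -1, -1]
print(positions(prices, short_window=1, long_window=3))  # [1, 1, -1, -1, -1, -1]
```

With windows (2, 4) the rule goes short at t=5; with (1, 3) it goes short one step earlier, at t=4. Scale that up to real data, costs, and timing, and it's easy to see why the parameters, not the loop, are what you'd keep off a third-party server.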