The fact that people keep trying to draw this comparison proves that making good products is harder than it seems...

The default goal everyone is assuming is spitting out the longest correct sequence possible.

But in reality the mental cost of a wildly wrong prediction is much higher than the mental cost of a slightly wrong one, so what you'd actually train the model for is short sequences, a few words at most, predicted with higher confidence.
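To make the "short but confident" idea concrete, here's a rough sketch of a confidence-gated suggester. Everything in it is an assumption for illustration: the `NextWordModel` interface, the `suggest` function, the `min_confidence` and `max_words` knobs, and the toy probabilities. It is not how any shipping keyboard does it; it just shows the suggestion being cut off once the joint probability of the whole phrase drops too low.

```python
from typing import Callable, Dict, List

# Hypothetical interface: maps the words typed so far to a
# probability distribution over the next word.
NextWordModel = Callable[[List[str]], Dict[str, float]]


def suggest(model: NextWordModel, context: List[str],
            min_confidence: float = 0.6, max_words: int = 3) -> List[str]:
    """Greedily extend a suggestion while the joint probability of the
    whole suggestion stays at or above min_confidence."""
    suggestion: List[str] = []
    joint = 1.0
    for _ in range(max_words):
        dist = model(context + suggestion)
        if not dist:
            break  # the model has nothing to offer here
        word, p = max(dist.items(), key=lambda kv: kv[1])
        if joint * p < min_confidence:
            break  # one more word would make the phrase too risky
        joint *= p
        suggestion.append(word)
    return suggestion


def toy_model(words: List[str]) -> Dict[str, float]:
    """Made-up bigram table keyed on the last word (illustration only)."""
    table = {
        "thank": {"you": 0.95, "goodness": 0.05},
        "you": {"so": 0.7, "for": 0.3},
        "so": {"much": 0.8, "far": 0.2},
    }
    return table.get(words[-1], {}) if words else {}


print(suggest(toy_model, ["thank"]))                      # ['you', 'so']
print(suggest(toy_model, ["thank"], min_confidence=0.5))  # ['you', 'so', 'much']
```

With the default threshold the third word is dropped, because 0.95 × 0.7 × 0.8 ≈ 0.53 falls below 0.6. Lowering the threshold buys a longer suggestion at the price of being wrong more often, which is exactly the trade-off in question.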

Most people can and will tune out slightly wrong words, especially as they get a feel for what autocorrect is good and bad at.

If you unleash the full range of tokens GPT-2 can normally output, you'll constantly be blasting users with words they didn't expect.
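One way to picture "not unleashing the full range" is to filter the model's next-word distribution down to an allow-list and a small top-k before anything reaches the user. This is only a sketch under my own assumptions: `restrict_candidates`, the allow-list, and the numbers below are all hypothetical, and I have no idea whether any real keyboard does it this way.

```python
from typing import Dict, Set


def restrict_candidates(dist: Dict[str, float], allowed: Set[str],
                        top_k: int = 3) -> Dict[str, float]:
    """Keep only allow-listed words, take the top_k most likely,
    and renormalize so the kept probabilities sum to 1."""
    kept = {w: p for w, p in dist.items() if w in allowed}
    top = sorted(kept.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    total = sum(p for _, p in top)
    if total == 0:
        return {}  # nothing acceptable to suggest
    return {w: p / total for w, p in top}


# Made-up distribution: the model puts real mass on a surprising word.
raw = {"the": 0.4, "xylophone": 0.3, "them": 0.2, "thee": 0.1}
common_words = {"the", "them", "thee"}

filtered = restrict_candidates(raw, common_words, top_k=2)
print(list(filtered))  # ['the', 'them']
```

The surprising word never shows up, even though the raw model rated it second most likely. The cost is that the suggester gets "dumber", which is the point: predictable beats clever in this UI.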

—

The fact that your long-sequence prediction got better doesn't matter, because the UI is autocomplete, not "auto-write": users still expect to drive, and a smart but noisy copilot is worse than a dumb but lazy one in that case.

I wouldn't be surprised if they trained the model to an effective context window of just a few hundred tokens with that in mind.