by gcr 529 days ago
Decoding random gibberish into semantically meaningful sentences is fascinating.

It's really fun to see what happens when you feed the model keysmash! Each part of the input space seems highly semantically meaningful.

Here are a few decompressions of short strings (in base64):

    $ ./ts_sms.exe d -F base64 sAbC
    Functional improvements of the wva
    $ ./ts_sms.exe d -F base64 aBcDefGh
    In the Case of Detained Van Vliet {#
    $ ./ts_sms.exe d -F base64 yolo9000
    Give the best tendering
    $ ./ts_sms.exe d -F base64 elonMuskSuckss=
    As a result, there are safety mandates on radium-based medical devices
    $ ./ts_sms.exe d -F base64 trump4Prezident=
    Order Fostering Actions Supported in May

    In our yellow
    $ ./ts_sms.exe d -F base64 harris4Prezident=
    Colleges Beto O'Rourke voted with Cher ¡La
    $ ./ts_sms.exe d -F base64 obama4Prezident=
    2018 AFC Champions League activity televised live on Telegram:

    $ ./ts_sms.exe d -F base64 hunter2=
    All contact and birthday parties

    $ ./ts_sms.exe d -F base64 'correctHorseBatteryStaples='
    ---
    author:
    - Stefano Vezzalini
    - Paolo Di Rio
    - Petros Maev
    - Chris Copi
    - Andreas Smit
    bibliography:

    $ ./ts_sms.exe d -F base64 'https//news/ycombinator/com/item/id/42517035'

    Allergen-specific Tregs or Treg used in cancer immunotherapy.
    Tregs are a critical feature of immunotherapies for cancer. Our previous 
    studies indicated a role of Tregs in multiple
    cancers such as breast, liver, prostate, lung, renal and pancreatitis. Ten years ago, most clinical studies were positive,
    and zero percent response rates

    $ ./ts_sms.exe d -F base64 'helloWorld='
    US Internal Revenue Service (IRS) seized $1.6 billion worth of bitcoin and

In terms of compressions, set phrases are pretty short:

    $ ./ts_sms.exe c -F base64 'I love you'
    G5eY
    $ ./ts_sms.exe c -F base64 'Happy Birthday'
    6C+g

Common mutations lead to much shorter output than uncommon mutations / typos, as expected:

    $ ./ts_sms.exe c -F base64 'one in the hand is worth two in the bush'
    Y+ox+lmtc++G
    $ ./ts_sms.exe c -F base64 'One in the hand is worth two in the bush'
    kC4Y5cUJgL3s
    $ ./ts_sms.exe c -F base64 'One in the hand is worth two in the bush.'
    kC4Y5cUJgL3b
    $ ./ts_sms.exe c -F base64 'One in the hand .is worth two in the bush.'
    kC4Y5c+urSDmrod4

Note that the correct version of this idiom is a couple of bytes shorter:

    $ ./ts_sms.exe c -F base64 'A bird in the hand is worth two in the bush.'
    ERdNZC0WYw==
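
To compare these sizes directly, the base64 strings can be decoded and counted in bytes. This is only a rough sketch using Python's standard `base64` module on the outputs pasted above; `compressed_size` is a name made up for illustration, and nothing here calls or depends on `ts_sms` itself.

```python
import base64


def compressed_size(b64: str) -> int:
    """Number of raw bytes encoded by a base64 string (hypothetical helper)."""
    return len(base64.b64decode(b64))


# Plain text -> compressed base64, copied from the transcripts above.
samples = {
    "one in the hand is worth two in the bush": "Y+ox+lmtc++G",
    "One in the hand is worth two in the bush": "kC4Y5cUJgL3s",
    "One in the hand is worth two in the bush.": "kC4Y5cUJgL3b",
    "One in the hand .is worth two in the bush.": "kC4Y5c+urSDmrod4",
    "A bird in the hand is worth two in the bush.": "ERdNZC0WYw==",
}

for text, b64 in samples.items():
    size = compressed_size(b64)
    bits_per_char = 8 * size / len(text)
    print(f"{size:2d} bytes, {bits_per_char:.2f} bits/char: {text!r}")
```

Byte counts are only an upper bound on the real code length, since the compressor's output gets rounded up to whole bytes before base64 encoding.
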

Slight corruptions at different points lead to wildly different (but meaningful) output:

    $ ./ts_sms.exe d -F base64 FRdNZC0WYw==
    Dionis Ellison

    Dionis Ellison is an American film director,
    $ ./ts_sms.exe d -F base64 ERcNZC0WYw==
    A preliminary assessment of an endodontic periapical fluor
    $ ./ts_sms.exe d -F base64 ERdNYC0WYw==
    A bird in the hand and love of the divine
    $ ./ts_sms.exe d -F base64 ERdNZC1WYw==
    A bird in the hand is worth thinking about
    $ ./ts_sms.exe d -F base64 ERdNZD0WYw==
    A bird in the hand is nearly as big as the human body
    $ ./ts_sms.exe d -F base64 ERdNZC0wYw==
    A bird in the hand is worth something!
    
    Friday
    $ ./ts_sms.exe d -F base64 ERdNZC0XYw==
    A bird in the hand is worth two studies
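
Each corrupted string above differs from `ERdNZC0WYw==` in a single base64 character. To see where in the compressed bitstream that lands, here is a small sketch that decodes both strings and lists the bit positions that differ. It assumes both strings decode to the same number of bytes, and `flipped_bits` is a hypothetical helper, not part of `ts_sms`.

```python
import base64


def flipped_bits(original: str, corrupted: str) -> list[int]:
    """Bit offsets (0 = first bit of the stream) where two base64 payloads differ."""
    a = base64.b64decode(original)
    b = base64.b64decode(corrupted)
    if len(a) != len(b):
        raise ValueError("payloads must decode to the same number of bytes")
    positions = []
    for byte_index, (x, y) in enumerate(zip(a, b)):
        diff = x ^ y
        for bit in range(8):
            # Test bits most-significant first, matching base64's bit order.
            if diff & (0x80 >> bit):
                positions.append(byte_index * 8 + bit)
    return positions


original = "ERdNZC0WYw=="
corrupted = [
    "FRdNZC0WYw==", "ERcNZC0WYw==", "ERdNYC0WYw==", "ERdNZC1WYw==",
    "ERdNZD0WYw==", "ERdNZC0wYw==", "ERdNZC0XYw==",
]
for c in corrupted:
    print(c, flipped_bits(original, c))
```

In the examples above, changes near the start of the string replace the whole sentence, while changes further in leave "A bird in the hand is worth" intact and only alter the ending.
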

1 comment

    $ ./ts_sms.exe d -F base64 elonMuskSuckss=
    As a result, there are safety mandates on radium-based medical devices
LOL!

That "decompression" is reminiscent of Terry Davis of TempleOS fame, who wrote a random sentence generator whose output he interpreted as "speaking to God".