by davidzweig 1018 days ago
Oh, snap, we've been working on importing Gutenberg to LR:

https://www.languagereactor.com/m/t_en_-

We're ranking them using the download count, and also this prompt to chatGPT (it's primarily for language learners):

"Is this text engaging and interesting for a modern reader, someone not into fine literature? Rate the text excellent, good, ok or poor. I don't want crusty, flowery, contorted language, talking about buttons and mannerisms and the hue of the sky etc."
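
A minimal sketch of how the two signals above (download count and the four-level rating from the prompt) could be combined into one ordering. The numeric scores, the `rank_books` name, the field names and the sample data are all assumptions for illustration, not the actual LR code:

```python
# Hypothetical mapping from the four labels in the prompt to numeric scores.
RATING_SCORE = {"excellent": 3, "good": 2, "ok": 1, "poor": 0}

def rank_books(books):
    """Sort books best-first: rating first, download count as tie-breaker."""
    def key(book):
        # Unknown or malformed labels are treated like "poor" (an assumption).
        rating = RATING_SCORE.get(book["rating"].strip().lower(), 0)
        return (rating, book["downloads"])
    return sorted(books, key=key, reverse=True)

# Made-up sample data, only to show the shape of the input.
books = [
    {"title": "A", "downloads": 50000, "rating": "ok"},
    {"title": "B", "downloads": 1200, "rating": "Excellent"},
    {"title": "C", "downloads": 9000, "rating": "excellent"},
    {"title": "D", "downloads": 300, "rating": "poor"},
]

for book in rank_books(books):
    print(book["title"], book["rating"], book["downloads"])
```

Sorting on a tuple keeps the rating dominant; a weighted sum of the two would be another reasonable choice if popularity should be able to outweigh a lower rating.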

Then, we're rewriting the ~1000 most popular books using chatGPT to modernise/simplify the text.

We're using a Markdown-based internal format, built from the Gutenberg plain text and HTML formats; this will go to a GitHub repo shortly.

There are translations too, and then we need to look at the current best TTS voices.

5 comments

Standard Ebooks has the ability to filter books by reading level.

That seems much better for people trying to learn English.

https://standardebooks.org/

They carefully curate and copy-edit their books, and go for quality over quantity. I think that is probably the right choice. We already have free access to an effectively infinite amount of mediocre content on the internet.

I did check that out. The thing is, the criteria are a little different. This is for learners who need to practice reading English. The scripts surface a lot of material in Gutenberg that otherwise would be time-consuming to find: 1950s science fiction that still reads well, or story books for children. Someone could certainly call these books 'mediocre', but I think that's a bit adjacent to what we're trying to do here. Whether this idea is worth the effort relative to other sources of material is still undecided.

Just for fun, here's what happens to Pride and Prejudice:

User: Rewrite and simplify the following text for a modern audience: "It is a truth universally acknowledged, that a single man in possession of a good fortune must be in want of a wife. However little known the feelings or views of such a man may be on his first entering a neighbourhood, this truth is so well fixed in the minds of the surrounding families, that he is considered as the rightful property of some one or other of their daughters."

ChatGPT: "People generally believe that a rich single guy must be looking for a wife. Even if we don't really know what he's thinking when he moves to a new area, everyone assumes he's up for grabs by one of the local girls."

This is a cool project. I'm learning French and am always on the lookout for sources of books -- and especially audio -- at the right level.

To others questioning the need for rewriting: language changes over the course of a century! It's no problem for me, as a native English speaker, to read English from around the turn of the 20th Century. However, it increases the degree of difficulty for me to try to read French or Spanish from the same time period. Also, to get through an old book I have to learn a bunch of useless words that have fallen into disuse; it's much more effective for me to learn the version of the language that is relevant in today's world.

It looks like the French TTS isn't working properly yet? At least on the couple of titles I tried. The selection of titles you have listed looks great.

If you've already determined the text is interesting to the modern reader, why rewrite?

Wouldn't it make more sense to look for texts that are thematically relevant, but with inaccessible language - and rewrite those?

I still shudder to think how this system will handle something like Shakespeare's sonnets...

It's for students learning English. The top 1000 will be rewritten and be available in addition to the original version, and about another 10,000 available in original only (selected using vocabulary frequency analysis and the chatGPT query).
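
A rough sketch of what a vocabulary frequency check could look like: the share of a text's words that fall inside a list of common words. The function name, the tokenizer and the tiny word list are assumptions for illustration; a real run would presumably use a proper frequency list with thousands of entries:

```python
import re

def common_word_coverage(text, common_words):
    """Return the fraction of word tokens in `text` found in `common_words`."""
    # Very simple tokenizer: lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    known = sum(1 for token in tokens if token in common_words)
    return known / len(tokens)

# Hypothetical miniature word list, only for demonstration.
COMMON = {"the", "cat", "sat", "on", "mat"}

print(common_word_coverage("The cat sat on the mat.", COMMON))
print(common_word_coverage("The dowager lingers", COMMON))
```

Texts with higher coverage would be treated as easier; where to set the cutoff is a separate decision that this sketch does not address.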

Midsummer Night's Dream:

THESEUS Now, fair Hippolyta, our nuptial hour Draws on apace; four happy days bring in Another moon: but, O, methinks, how slow This old moon wanes! she lingers my desires, Like to a step-dame or a dowager Long withering out a young man's revenue.

HIPPOLYTA Four days will quickly steep themselves in night; Four nights will quickly dream away the time; And then the moon, like to a silver bow New-bent in heaven, shall behold the night Of our solemnities.

Rewritten:

THESEUS Now, dear Hippolyta, our wedding day is approaching quickly. In just four more happy days, a new moon will arrive. But oh, it feels like this current moon is taking forever to disappear! It's holding back my excitement, like an old, slow stepmother or a widow who keeps a young man's money for far too long.

HIPPOLYTA Don't worry, Theseus. Four days will pass by in no time, and then four nights will also quickly slip away. After that, the new moon will shine brightly in the sky, like a freshly bent silver bow, and it will witness the night of our special celebration.

Hmm. Shakespeare is the extreme case. It does make it more accessible. The style is lost. shrug

It's amazing that a decade ago I'd have called you an exceptionally demented individual for doing this, but these days there's so much stupid unconscionable shit going on with AI that it hardly stands out.

floren: In epochs bygone, a mere decade hence, I would have proffered thee the epithet of an egregiously unhinged denizen, were thou to embark upon such an endeavor. However, in these current times, wherein the domain of artificial intelligence doth teem with preposterous and morally indefensible machinations, thy conduct emerges as naught but a faint ember in the vast tapestry of perplexities that now envelop us.

davidzweig: Hark! Methinks 'tis the act of rewriting, not the sifting, that doth wound thy pride most deeply, and affront thee grievously. Set thine eyes upon the instance presented hence, to glean a clearer understanding of our noble mission, one that doth strive to render these splendid volumes more reachable:

Original:

The other person was a man named O’Brien, a member of the Inner Party and holder of some post so important and remote that Winston had only a dim idea of its nature. A momentary hush passed over the group of people round the chairs as they saw the black overalls of an Inner Party member approaching.

O’Brien was a large, burly man with a thick neck and a coarse, humorous, brutal face. In spite of his formidable appearance he had a certain charm of manner. He had a trick of resettling his spectacles on his nose which was curiously disarming—in some indefinable way, curiously civilized. It was a gesture which, if anyone had still thought in such terms, might have recalled an eighteenth-century nobleman offering his snuffbox.

Winston had seen O’Brien perhaps a dozen times in almost as many years. He felt deeply drawn to him, and not solely because he was intrigued by the contrast between O’Brien’s urbane manner and his prize-fighter’s physique. Much more it was because of a secretly held belief—or perhaps not even a belief, merely a hope—that O’Brien’s political orthodoxy was not perfect. Something in his face suggested it irresistibly.

Rewritten:

The other person was a man named O'Brien. He belonged to a group called the Inner Party and had a very important job that Winston didn't fully understand. When people saw O'Brien, they became quiet, because he was a high-ranking member of the Inner Party.

O'Brien was a big, strong man with a thick neck and a rough, funny, and tough-looking face. Despite his intimidating appearance, he had a certain charm in the way he acted. He had a habit of adjusting his glasses on his nose, which was oddly disarming, and in some way, it seemed polite and refined, like an old-fashioned nobleman offering his snuffbox.

Winston had seen O'Brien about a dozen times over the past few years. He felt a strong connection to him, not just because he was curious about the difference between O'Brien's polite behavior and his tough appearance. It was more because Winston secretly hoped that O'Brien wasn't completely loyal to the government's beliefs. There was something about his face that made Winston believe this, even though it was just a feeling.

---> Orwell already has a clear style that I very much admire. The somewhat minor changes here would make for easier parsing by students, without changing the message much. I can't imagine Orwell would have objected.

I think your message is misplaced and pretty rude.

The example doesn't change my mind.

It throws away the flavor of the original text. Why reword the great works of literature into YA-level blandness? Might as well just read the CliffsNotes at that point. 1984 isn't a particularly challenging work, either, and I think readers do themselves a disservice if they don't read the original.

Edit: hey I do want to apologize for my tone in the first comment though, it was too much. Sorry.

Our software is used by Koreans, Turks, Brazilians, Hungarians etc. who are trying to learn a foreign language (English). Whether importing (and rewriting some of) Gutenberg is the best use of time for them, well, I'm not sure. If you don't see a difference in difficulty between the two texts, then I don't think you are able to appreciate the perspective of a language learner.

Well, the filtering prompt could have caused offense to someone who enjoys literature. It was the prompt I settled on after some iterations. If you go through Gutenberg (~70,000 books), much of it is pretty indigestible to a modern reader, never mind a learner. Personally I'm finding a learning curve with ChatGPT etc. where you have to relearn to directly ask for what you want.

I don't really understand the purpose here, given that literature is significantly aesthetic by nature. Can't students just read contemporary, accessible books if they want something easier?

Literature: "Anything written in print that's not Twilight." (urban dictionary) :)

Yes, they could, but we can't put Harry Potter in the public library of texts for copyright reasons, and licensing material is difficult and not fun; I prefer to write software. Users can paste it in themselves, but then there are extra steps. The center of the project is still a browser extension that runs on Netflix and YouTube, so we sidestep the copyright issue.