| HN Mirror

by whistle650 758 days ago

https://chatgpt.com/share/13f553a8-5cff-42a1-be95-4a9d33cd10...

May also be easy to correct a lot of it:

“For better safekeeping, Russia’s $24,000,000 collection of crown jewels, probably the finest array of gems ever assembled at one time,”

3 comments

b112 758 days ago

But are you correcting the OCR or miscorrecting the originals?

I want original text, including misspellings, and original regional / historical spellings, including slang (which may look like another word, but is not, and isn't in a dictionary).

You cannot fix OCR text without looking at the original.

brabel 758 days ago

With the spelling having been fixed, even if imperfectly, you could much more easily search for content and find relevant results, and then go on to look at the originals. What you want is still possible, unless you unreasonably make it a requirement that the transcriptions should be perfect.
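A minimal sketch of that workflow, assuming each page keeps both its untouched OCR text and a machine-corrected copy: the corrected copy is used only for searching, and every hit points back to the raw text and the scan. The `Page` fields, the `search` helper, the `scan-00x` identifiers, and the second sample string are all hypothetical, made up for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Page:
    page_id: str    # hypothetical pointer back to the original scan
    raw_ocr: str    # untouched OCR output, never overwritten
    corrected: str  # machine-corrected text, used only as a search index


def search(pages: List[Page], query: str) -> List[Page]:
    """Return pages whose corrected text contains the query, ignoring case."""
    needle = query.casefold()
    return [p for p in pages if needle in p.corrected.casefold()]


# Illustrative sample data; only the "$2¢4,000,000" fragment comes from the thread.
pages = [
    Page(
        page_id="scan-001",
        raw_ocr="Russia's $2¢4,000,000 collection of crown jewels",
        corrected="Russia's $24,000,000 collection of crown jewels",
    ),
    Page(
        page_id="scan-002",
        raw_ocr="the wea ther was fme",
        corrected="the weather was fine",
    ),
]

hits = search(pages, "Crown Jewels")
for page in hits:
    # The reader is sent back to the raw OCR and the scan, not the corrected copy.
    print(page.page_id, "|", page.raw_ocr)
```

Because the raw text is never replaced, a wrong correction only costs a missed or spurious search hit, and the original reading stays available for checking.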

b112 758 days ago

Proper transcription to digital is to do so with accuracy, not "close enough".

DemocracyFTW2 758 days ago

to quote myself, "every interesting data set will have inaccuracies in it"

b112 758 days ago

There is a vast difference between a rare, honest mistake, and an attempt to mitigate them...

vs. willingly and knowingly introducing corrections that are ridiculously wrong.

Advocating and being a champion for inaccuracy really isn't a positive. You should find a new thing to quote about yourself.

DemocracyFTW2 757 days ago

This is not what this phrase is about. I came to it working on the structural data of just under 100k Chinese characters. I'd spend hours, days and weeks proofreading and correcting formulas, so your "advocating and being a champion for inaccuracy" doesn't stick. But absent an automated, complete coverage of all records against a known error-free data set, there will likely be a small percentage of errors and dubious cases.

And thanks, by the way, for the readiness to jump to conclusions and fire a salvo of allegations, viz. "willingly", "knowingly", "introducing", "ridiculous".

notachatbot1234 758 days ago

"$2¢4,000,000" should be "$204,000,000" rather than ChatGPT's "$24,000,000".
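To illustrate why the text alone cannot settle this, here is a small sketch that lists the plausible readings of the garbled figure, assuming the stray "¢" is either noise to be dropped or a single misread digit. The `candidate_readings` helper is hypothetical; it only enumerates options and does not decide between them.

```python
def candidate_readings(garbled: str, bad_char: str = "¢") -> list:
    """List plausible readings of a string containing one garbled character.

    Assumption: the garbled character is either pure noise (drop it)
    or stands in for exactly one digit (try 0 through 9).
    """
    dropped = garbled.replace(bad_char, "")
    as_digit = [garbled.replace(bad_char, d) for d in "0123456789"]
    return [dropped] + as_digit


candidates = candidate_readings("$2¢4,000,000")
for reading in candidates:
    print(reading)
```

Both "$24,000,000" and "$204,000,000" appear among the candidates, so choosing between them takes a look at the scan or some outside knowledge of the collection's value.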

djhn 758 days ago

Are you aware of any models that perform as well as an LLM on this task at lower cost?

bl4ckneon 758 days ago

Self-hosted LLM?
