Hacker News
by bilsbie 811 days ago
I wonder if they’ve considered hiring people to write. A lot of people might do it for cheap just to have their imprint on AI.

Or, another twist: pay people to submit ten years of emails (upload the backup file), or just pay small amounts for works they’ve made: college essays, journals, etc.

7 comments

I have to imagine the valuable training data is domain-specific stuff like sales call recordings for specific industries and technical materials about specific topics owned by companies. Surely there is enough public or copyright-free general-purpose material.
This won't be necessary in future AIs. As AIs start aligning tokens from all the rich modalities of audio, video, and 3D with text so that they can express complex ideas, they will bootstrap proper language generation.

I don't think college essays, etc. would contain anything novel. Future techniques could interpolate more smoothly, creating ever-new wordmud.

I agree with your overall point that an AI which can learn about the world directly won't need eleventy billion documents to learn language generation. Just two comments:

1) Based on how pre-verbal children learn, one nitpick is that I strongly suspect we need to give AI touch and a sense of space in order to truly understand quantity, causality, object permanence, etc.

2) Something that is not a nitpick: even a superhuman multimodal AI wouldn't have direct access to human emotions, sexuality, ideas of natural beauty, etc. I don't think humans have run out of interesting things to say about these ideas.

(In particular, I don't think a superhuman AI is capable of understanding music unless it is directly emulating the biological processes by which humans understand music. The issue is not "logical" - melodies don't actually make sense analytically.)

> I don't think [things created by humans] would contain anything novel.

That's quite a proposition.

Not every essay is created equal. Plus, I don't understand what a new way of combining the same words would achieve, given that LLMs have already seen trillions of tokens. LLMs could inpaint to arrive at similar texts.
Turnitin will have millions of essays written by students. No doubt they will already be looking at these deals (or getting ready to update their license if it currently doesn't permit it).
They're more interested in eliminating jobs than creating them.
This already happens. I have seen recruiters trying to get domain experts in various fields to write articles for AI training.
LinkedIn built a whole platform inside their platform for doing exactly this. I think you get a badge or something on your profile claiming you're an expert on something if you write a couple of paragraphs on a topic using the provided prompt.

They're very clear it's going into an AI-generated article on the topic, but you better believe that is also now core training data.

Most companies are hiring for the role of AI Tutor. Some of that is definitely happening.
People will just use AI to write those essays and emails!