HN Mirror


by ks2048 159 days ago

I’ve wondered why GenAI text has so many emojis, for example in README.md bullet points.

I guess their RLHF data had it? On purpose? And various labs all the same?

Because if they were just learning from web data (from before a few years ago), this didn’t seem to be very prevalent there.

6 comments

rvnx 159 days ago

The emojis and similar stylistic habits are there because models learn from other models, as that is the easiest way to get RLHF data.

Many of the models were trained on top of ChatGPT or its variants (hence the emojis); the attribution then officially disappeared, but it's unprovable.

This process is called distillation.

For example, one day Nano-Banana answered me with a link to a picture generated on... the FAL platform (which did not exist).

DeepSeek:

https://i.redd.it/7nkucg2qelfe1.png

Anthropic Claude:

https://www.reddit.com/r/OpenAI/comments/1e34tkr/why_is_clau...

Grok:

https://cdn.arstechnica.net/wp-content/uploads/2023/12/GA8PG...

Gemini-Flash-Lite, if you squeeze it a bit:

> I must state clearly: I am a large language model, trained by OpenAI. This is the core definition of ChatGPT. If I claimed to be a human, a different company's AI, or a physical entity, that would be a clear falsehood regarding my nature.

but most of this has been fixed since Gemini 1.5-Pro.

Over time this is fading because they now have output from their own trained models, and all these companies actively replace references to OpenAI. The data gets distilled, mixed with other training data and their own, cleaned up, and distilled again, so the source text has disappeared.

We talk about people who did not have any remorse downloading the whole library of pirated books, so their concept of copyright is very loose.


shagie 159 days ago

> We talk about people who did not have any remorse downloading the whole library of pirated books, so their concept of copyright is very loose.

It may be a TOS violation - but it is not a copyright violation.

In the United States (and several other countries), human creativity as part of authorship is required for something to be copyrightable.

https://www.congress.gov/crs-product/LSB10922

https://www.copyright.gov/ai/Copyright-and-Artificial-Intell...


Hamuko 159 days ago

My guess has been that it's been trained on a copious amount of JavaScript projects, which always seem to have emoji up the wazoo everywhere.


GaryBluto 159 days ago

It's to appeal to the lowest common denominator.


anxoo 159 days ago

lots of normal people like emoji. the kind of normal people who have never heard of hacker news


airstrike 159 days ago


lotsofpulp 159 days ago

Sprinkling in some reaction gifs would be helpful.


rfv6723 159 days ago

Emoji and bullet points are easy to read, so they got rewarded in the RLHF process.

You may hate this style at first glance. But if you read lots of text every day, emoji and bullet points lower the cognitive load.


dawnerd 159 days ago

Emojis, when used the way these models use them, make text way harder for me to read. It's distracting and adds nothing to the text.


Larrikin 158 days ago

I find it makes lists easier to read and think it actually looks nice. But it destroys my ability to sort. So I use this style sparingly, because most list information that I look at often enough to want it to look nice, I also want to sort.
