by unaindz 1163 days ago
Even if the original images have a mix of languages, I think the tagging is all done in English (I may be wrong). I would argue that the source material includes the tagging, since the tags are necessary to train the AI, so the content is not really mixed but entirely English.

But anyway, the danbooru tags consist of things like: short hair, blue eyes, portrait. Those are much easier to translate (or "understand") across several languages than the entire phrases GPT works with.

1 comment

Yeah, the danbooru tagging is done in English. However, if the art is sourced from places like Pixiv, those sites do tagging in the site's native language. My point is that the original content was in a mix of languages, but the process of tagging and training normalized it all into English. The result is a situation where even the people who authored the original art will pay more to use the resulting networks if billed per token, unless they learn English. So we're basically taking all this input from various cultures, Englishifying it, and then potentially billing them more if they want to keep using their native tongue. Kind of sad.
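
To make the per-token point a bit more concrete, here is a rough sketch rather than a real measurement. It assumes a byte-level BPE tokenizer, where the UTF-8 byte count is an upper bound on the token count, and text in scripts the tokenizer saw little of often lands closer to that bound. The tag pairs and the helper names (`utf8_bytes`, `byte_report`) are made up for illustration; real token counts depend on the specific tokenizer.

```python
# Rough proxy only: with a byte-level BPE tokenizer, a string never needs
# more tokens than it has UTF-8 bytes. This is an assumption about the
# tokenizer family, not a measurement of any particular model.
def utf8_bytes(text: str) -> int:
    """Return the number of bytes in the UTF-8 encoding of text."""
    return len(text.encode("utf-8"))


# Hypothetical pairs: an English danbooru-style tag and a Japanese equivalent.
TAGS = [("short hair", "ショートヘア"), ("blue eyes", "青い目")]


def byte_report(pairs):
    """Return (english, english_bytes, japanese, japanese_bytes) per pair."""
    return [(en, utf8_bytes(en), ja, utf8_bytes(ja)) for en, ja in pairs]


for en, en_bytes, ja, ja_bytes in byte_report(TAGS):
    print(f"{en!r}: {en_bytes} bytes | {ja!r}: {ja_bytes} bytes")
```

The Japanese tags use fewer characters but three bytes per character, so their worst-case token count is as high or higher. Running the same strings through an actual tokenizer would be the way to get real numbers.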