by lekevicius 8 days ago
> Known generative-AI crawlers are disallowed in robots.txt. This is a research catalogue assembled from primary sources; it is not training data, and a model fine-tuned on these paragraphs would launder out exactly the part — the citations — that gives the prose its value.

This reads like distaste for LLMs - but generally the website reads (and is designed!) very LLMy.


If the About page said who made it, i.e. if someone was putting their reputation on the line, I might be more receptive. But the website has enough LLM design tics to make me suspicious.

It's sad. I come to Hacker News to see cool stuff and when I click on a link and see something obviously put together by an LLM I feel like I've been tricked :(

Fair hit, and I should have done that from the start. There is a person behind this and the About page is now updated (https://storiedcolors.com/about). Short version: I'm a technical architect who painted as a kid, stopped for years, and started this to get back into it. I do use AI to draft the entries and I'm not going to pretend otherwise, but I check every one against named, non-Wikipedia sources and cut what I can't source. You shouldn't take that on faith, so the methodology and the citations are there to check, and there's a corrections address for when I get something wrong. I totally get the "put together by an LLM" reaction and how it felt. I'd rather try to earn the trust back than argue about it.
Right?! It's a bummer when a nice-looking website is now a red flag. Checking the About/Contact page immediately has become part of my workflow when browsing the web; if there's no real person behind the site, how can it be trusted?
Apologies. Was taken with the names and stories... didn't read the about page. Guess my critical thinking was on the fritz. Seriously, learned a lot here and will try to do better.
I actually think “explore Claude’s understanding of colors” is an interesting concept. A lot of fascinating cultural information gets compressed into LLMs.
I think so too. But if that's what it is, that's how it should be presented.
"One color a day, told as it ought to be told: with its provenance, its chemistry, and the people who paid for it in poison." is so Claude it hurts. :'D
Yes. Why the heck do LLMs produce prose like this? It's the de facto standard in the narration for all these slop videos drowning YouTube.

No human writes like this, so what is the training material that has taught LLMs that this is the way to write?

They may have used LLMs to design the site but IMHO the content is fine and well-sourced. Example: https://storiedcolors.com/color/blaze-orange/

Even if LLMs were used to help, someone must have spent a lot of time on making it read well. At least that's how it feels.

Except on that page there's immediately a claim that isn't backed up by any of the citations, e.g.:

"The hunting-safety effect has been substantial. The non-fatal hunting accident rate in the United States fell substantially over the decades following blaze-orange adoption, with state hunter-safety data consistently identifying the orange mandate as a major contributor to that decline."

None of the sources have any national hunting accident data - there's a single link to data from New York, and nothing that would support the claim that state data "consistently" identifies anything...

Do AI crawlers actually follow robots.txt rules? They didn't care about rules when they trained them on copyrighted works so why would they follow the rules here?
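
For context on what "following robots.txt" means mechanically, here is a minimal sketch of the check a well-behaved crawler would run before fetching a page, using Python's standard `urllib.robotparser`. The robots.txt contents below are an assumption made up for illustration (the site's actual file isn't quoted in this thread), and `is_allowed` is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents for illustration only;
# the site's real file is not quoted in the thread.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://storiedcolors.com/color/blaze-orange/"
    for agent in ("GPTBot", "ExampleBrowser"):
        print(agent, is_allowed(agent, page))
```

Note that this check runs on the crawler's side. robots.txt is a voluntary convention, not an access control, so it only restricts crawlers that choose to consult it; blocking the rest would need server-side enforcement.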