by NiloCK 55 days ago
To clarify, because a number of posts here seem to reflect some confusion:

The article here isn't about the LLM recognizing works that were in its training data, e.g., The Old Man and the Sea off the shelf. It's about pegging the author of novel texts, say, a letter written by Hemingway that gets discovered next week and was never before digitized.


Yes, that makes sense. However, unless there's a significant corpus of an author's work in the training data, it won't recognize them. One of the authors I tested Claude on was ND Wilson, using a passage from his book Leepike Ridge. Wilson has written quite a bit online and in print, but Claude couldn't guess the author and instead guessed that the passage came from a noir crime novel.

Wilson is a fairly idiosyncratic writer with a distinct style, yet even so, Claude couldn't identify him from a passage in a currently published book.

I suspect that what's going on here (as others in this thread are suggesting) is that Claude's training has in some way biased it toward certain sets of authors.