Hacker News
by jaimie 2896 days ago
This is a fun exercise. Back in 1963, Fred Mosteller and David L. Wallace published a paper in the Journal of the American Statistical Association titled "Inference in an Authorship Problem: A Comparative Study of Discrimination Methods Applied to the Authorship of the Disputed Federalist Papers" [0]. It takes a different approach to the authorship question, using a Bayesian model of word-frequency distributions.
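To make the idea concrete, here is a heavily simplified sketch. It assumes each marker word's count in a paper is an independent Poisson variable whose rate depends on the author. The word list and the numbers in `RATES` are hypothetical placeholders, not figures from the paper, and Mosteller and Wallace's actual models are considerably more elaborate. The function adds up per-word log-likelihood ratios on top of a prior log-odds:

```python
import math

# HYPOTHETICAL rates (occurrences per 1000 words) for a few marker words.
# Placeholders for illustration only, NOT estimates from the Federalist papers.
RATES = {
    "upon":   {"Hamilton": 3.0, "Madison": 0.2},
    "whilst": {"Hamilton": 0.1, "Madison": 0.5},
    "enough": {"Hamilton": 0.6, "Madison": 0.1},
}


def poisson_log_pmf(k, lam):
    """log P(K = k) for a Poisson distribution with mean lam > 0."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def log_odds_hamilton(counts, n_words, prior_log_odds=0.0):
    """Posterior log-odds that Hamilton (rather than Madison) wrote a text.

    counts: mapping word -> occurrences of that word in the disputed text
    n_words: length of the disputed text in words (must be positive)
    Assumes the marker-word counts are independent Poisson variables.
    """
    if n_words <= 0:
        raise ValueError("n_words must be positive")
    total = prior_log_odds
    for word, rates in RATES.items():
        k = counts.get(word, 0)
        # Scale the per-1000-word rate to the length of this text.
        lam_h = rates["Hamilton"] * n_words / 1000
        lam_m = rates["Madison"] * n_words / 1000
        total += poisson_log_pmf(k, lam_h) - poisson_log_pmf(k, lam_m)
    return total
```

A positive result favours Hamilton and a negative one Madison: with these placeholder rates, a 2000-word text that uses "whilst" twice and never uses "upon" comes out negative.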

One interesting point is the claim that authorship is known for all but 12 of the papers. That ground truth means supervised learning could also be used: fit a model to the papers of known authorship, then apply it to the disputed ones.
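A supervised version could be as simple as the sketch below. It swaps in plain nearest-centroid classification on function-word frequencies, which is not the method from the paper. The vocabulary in `FUNCTION_WORDS` and the four "known" snippets are hypothetical toy data chosen only to show the mechanics:

```python
import math
from collections import Counter

# HYPOTHETICAL function-word vocabulary; a real study would choose it carefully.
FUNCTION_WORDS = ["upon", "whilst", "by", "to", "of", "the", "and", "on"]


def profile(text):
    """Relative frequency of each function word among the text's tokens."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    n = max(len(tokens), 1)  # avoid division by zero on empty text
    return [counts[w] / n for w in FUNCTION_WORDS]


def centroid(vectors):
    """Component-wise mean of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def train(labelled):
    """labelled: list of (text, author) pairs. Returns author -> mean profile."""
    by_author = {}
    for text, author in labelled:
        by_author.setdefault(author, []).append(profile(text))
    return {author: centroid(vs) for author, vs in by_author.items()}


def classify(model, text):
    """Assign the author whose centroid is nearest (Euclidean distance)."""
    p = profile(text)
    return min(model, key=lambda author: math.dist(p, model[author]))


# HYPOTHETICAL toy training data standing in for papers of known authorship.
KNOWN = [
    ("upon the whole upon reflection", "Hamilton"),
    ("rely upon the courts", "Hamilton"),
    ("whilst the states deliberate", "Madison"),
    ("whilst we wait", "Madison"),
]
model = train(KNOWN)
```

With this toy model, `classify(model, "whilst the union endures")` returns `"Madison"`. Nearest-centroid is used only because it fits in a few lines; any classifier trained on the known papers would fill the same role.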

For discussion: I often think unsupervised methods are preferable to supervised ones, provided the unsupervised method's error rate is reasonably low, because an unsupervised method should generalize more readily.

[0] https://www.jstor.org/stable/2283270