by chromoblob 1056 days ago
Also, auditing any big dataset that is shipped verbatim is practically impossible for now. Instead of including the dataset verbatim, a model that purports to be open-source in a practically useful sense should include a relatively small procedure for deriving the dataset from some reputable general-purpose dataset, small enough that the procedure itself, and through it the resulting dataset, can practically be audited.
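
To make the idea concrete, here is a rough sketch of what such a derivation procedure might look like. Everything in it is an assumption for illustration: the function names `derive_dataset` and `fingerprint`, the filter rules, and the tiny in-memory `base` list standing in for a reputable general-purpose dataset. The point is only that the procedure is short and deterministic, so a reviewer can read all of it and re-run it to check that they get the same data.

```python
import hashlib


def derive_dataset(base_records, min_words=5, banned_terms=("lorem",)):
    """Hypothetical derivation procedure: a deterministic filter over a base corpus.

    The rules here (whitespace normalization, a minimum word count, a banned-term
    list) are illustrative assumptions, not rules from any real project.
    """
    derived = []
    for text in base_records:
        # Collapse runs of whitespace so equivalent records compare equal.
        normalized = " ".join(text.split())
        if len(normalized.split()) < min_words:
            continue
        lowered = normalized.lower()
        if any(term in lowered for term in banned_terms):
            continue
        derived.append(normalized)
    return derived


def fingerprint(records):
    """SHA-256 over the records in order, so independent re-runs can be compared."""
    digest = hashlib.sha256()
    for record in records:
        digest.update(record.encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


# Stand-in for a reputable general-purpose dataset (assumption for the sketch).
base = [
    "The quick brown fox jumps over the lazy dog.",
    "too short",
    "Lorem ipsum dolor sit amet consectetur",
    "  Open   models need auditable   training data.  ",
]

derived = derive_dataset(base)
print(len(derived), fingerprint(derived))
```

An auditor would then review the few dozen lines of `derive_dataset` rather than the full derived dataset, and compare the published fingerprint against their own run over the same base dataset.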