by romellem 1166 days ago
This is addressed [in the full complaint][1]. The data scientist was told that Frank really did have 4 million users, and that the scientist only needed to generate this "synthetic data" as a way to "anonymize" their "real" data. I.e., the scientist was duped:

  JAVICE told Scientist-1 [...] that she had a database of approximately 4 million
  people and wanted to create a database of anonymized data that mirrored the
  statistical properties of the original database (the “Synthetic Data Set”).
  
  [After JAVICE sends Scientist-1 the data], Scientist-1 understood that the data
  available via the Access Link Email -
  **a data set of approximately 142,000 people** (emphasis added) -
  was a random sample of a larger database which contained data for approximately
  4 million people. In fact, that data represented every Frank user who had at
  least started a FAFSA.
[1]: https://www.justice.gov/usao-sdny/press-release/file/1577861...