by DanBC 5228 days ago
Gently disturbed that you're releasing medical data, albeit anonymised medical data.

What ethics panel did you run it through? What internal policies do you have governing release of such data?


The site has no privacy policy; as far as I'm concerned, that means buyer beware.

HOWEVER... a comment by cmonsen (founder??) on the original post states "User privacy will be critical and we are making that a priority." (1)

So, yeah, a bit of #fail here.

(1) https://news.ycombinator.com/item?id=3603466

Privacy policy posted. It's in the footer of symcat.com. Thanks for your patience and let us know if you have any feedback!
Apologies for the oversight. Will push it out to the site today.
What is so sensitive in this data to you? These are just aggregated results; why do you want them to run it past an "ethics panel" before releasing them?
You're right -- HIPAA et al. only cover data that could be traced back to the individual, and this set clearly cannot.

Interestingly, a similar dataset was presented at PSB (Pacific Symposium on Biocomputing): co-occurrences of symptoms and drugs in Bing queries, used to look for novel drug side effects. They too had no problem releasing the data.

I'm glad you made that point. We are very aware of how important user privacy is for this sensitive information. We want to begin the conversation with users while we're young so that we don't mess it up when we're big (à la Facebook).

As a general principle, we will only reflect back data that has been fully anonymized. In fact, we don't collect personally identifiable information (that's why there are only 3 choices for age right now). We are building HIPAA-compliant software, even though HIPAA doesn't legally apply to us yet. We have a team of advisors, including privacy experts, but honestly, we believe the best ethics panel will come from the users, and we are very interested in feedback in this respect.

> we believe the best ethics panel will come from the users

Not when it comes to HIPAA compliance. This isn't about finding the best ethical code of conduct for privacy (which can be tricky), but simply abiding by existing and well-defined rules; all users agreeing you're a paragon of virtue doesn't matter much if you break said law once it does apply to you.

Please understand I've no wish to rain on your parade; it's just that I know all too well that dealing with HIPAA can cause headaches, but that's part of the game when working on anything connected to healthcare in the US.

That's true: for HIPAA compliance there is no negotiation, and we will meet that standard.

But there inevitably will be some user concerns that fall outside of HIPAA compliance. So, we see HIPAA+HITECH as a minimum requirement. We don't expect it to be sufficient, however, and that's where user feedback, the "user ethics panel" if you will, comes in.

Out of curiosity, what disturbs you? The data isn't even actual medical data, but search data.
This particular data set is fine, but we've seen other people release bigger data sets thinking it was anonymised only to find that it wasn't. (See, for example, the Netflix data dump.)

Most people don't care about their movie rentals, but will be a lot more cautious about some of their medical history.

I'm in the UK. Rules here are pretty strict. Mostly that's a good thing: you run your intended research by a research review panel, and if it needs ethics approval, you get that too. One benefit is that people get help from a real mathematician early in the project design, so they should get the statistics, sample sizes, etc. right.

Like I said, I'm only gently concerned. And I'm sure they'll get this right.