by ldp01 3597 days ago
Yep, that sounds just like what he was describing.

I can see the advantages for legitimate policy development from this new system... But one can only hope they don't let everything get leaked.


My recollection was that https://spacetimeresearch.com/ (ex-ABS guys setting up a plum contract) did all the processing of the raw data, providing a representative but anonymous set to the ABS masses for reporting.

The real data was locked away (no electronic access) and, as I recall, destroyed after a certain point in time. There were no names attached to the real data, even though the real data was not accessible by any mortal.

The representative set had the property that if you queried a statistic at the national level it would be correct, but if you queried that statistic on a narrower area it got more "representative" and less applicable to the precise area queried. The means etc. would, however, all pan out.

The data was accessible at 7 granularities of geographic areas. Some of the areas may have only had 100s of residents, so you can see why the data was fuzzed.
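As a rough illustration of the kind of fuzzing described above (not the actual ABS or Space-Time Research method, whose details are not given here), here is a minimal Python sketch that perturbs small-area counts and then repairs them so the national total is unchanged. The function name `fuzz_counts`, the `max_shift` parameter and the sample figures are all made up for the example.

```python
import random

def fuzz_counts(area_counts, max_shift=3, seed=0):
    """Perturb each small-area count, then repair so the overall total is unchanged.

    Hypothetical illustration only; not a real disclosure-control routine.
    """
    rng = random.Random(seed)
    names = list(area_counts)
    # Step 1: nudge every area by a small random amount, never below zero.
    fuzzed = {a: max(0, area_counts[a] + rng.randint(-max_shift, max_shift))
              for a in names}
    # Step 2: hand the accumulated drift back one unit at a time to random
    # areas, so the sum over all areas matches the original exactly.
    drift = sum(fuzzed.values()) - sum(area_counts.values())
    while drift != 0:
        a = rng.choice(names)
        if drift > 0 and fuzzed[a] > 0:
            fuzzed[a] -= 1
            drift -= 1
        elif drift < 0:
            fuzzed[a] += 1
            drift += 1
    return fuzzed

# Made-up counts for four small areas.
areas = {"area_a": 120, "area_b": 340, "area_c": 95, "area_d": 410}
fuzzed = fuzz_counts(areas)
print(sum(areas.values()), sum(fuzzed.values()))  # the two totals are equal by construction
```

Any single area's figure is now slightly off, while the total over all areas is exact, which is the trade-off the comment describes.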

We were co-located in a building with the ATO, but there was a strict protocol and security personnel denying any access by non-ABS staff. We had no internet access, no floppy drives (no USB back then, from memory), no CD burners, and no easy way of getting data in or out.

The posts above give a general high-level view of most ABS publications and data sets that are released to the public/institutions, with some minor deviation depending on context.

Yes, we separated out and destroyed name and address data after processing was finished for the census. Indeed, most stats, data and calculations are released to the public at various levels of statistical aggregation, not at unit record level. Almost all ABS data sets treat privacy and naive identifiability far more seriously than most lay people understand, and we had a variety of techniques to test for such, and ways to change the underlying data to produce good higher level stats while destroying the underlying unit record level/value.
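One simple check of that kind, shown here as an assumption rather than a description of what the ABS actually ran, is a small-cell test: count how many unit records share each combination of attributes and flag any combination with fewer than `k` records. The field names, the threshold `k` and the function `small_cells` are hypothetical.

```python
from collections import Counter

def small_cells(records, keys, k=5):
    """Return attribute combinations shared by fewer than k records.

    Hypothetical sketch: cells this small are the ones most likely to
    point at an identifiable person or household.
    """
    counts = Counter(tuple(rec[key] for key in keys) for rec in records)
    return {cell: n for cell, n in counts.items() if n < k}

# Made-up unit records: six people in one cell, two in another.
records = (
    [{"area": "A", "age_band": "30-39"}] * 6
    + [{"area": "B", "age_band": "80-89"}] * 2
)
print(small_cells(records, ["area", "age_band"]))  # {('B', '80-89'): 2}
```

Flagged cells would then be candidates for suppression, merging into a coarser category, or perturbation before release.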

Census is a little bit different: there is a pre-processing/pre-consumable data set stage where various techniques and research are applied to the raw data to improve it, extract information from it, produce a finished product and think about its future use. Nothing nefarious when I was doing it: think OCR, error correction and analysis, value distribution, imputation, comparisons to the dress rehearsal, and calibration. Removal of the string "fuck off census collector" and every swear and curse word and smart-arse answer imaginable from every field, as a fun example the public doesn't think much about...

Security at the time I was working on it was OK. Better than a lot of private industry. There was some network access (this shouldn't be surprising: it's an e-census, so it's a tautology that there's some network access to bits of the data somewhere in the entire process), but ideally not to the wider internet outside of specific bits of the internal ABS network once the raw data is in. Only specific people working with such data are given access at this stage and in this way.

Nothing is perfect. I can't stress that enough when it comes to IT security and my particular area of experience in statistical analysis. A mature analyst (so, not tabloid headlines) understands that all security and identifiability is based on risk/reward/work/information trade-offs: how much can we change things while keeping the analysis valuable, and while pushing the work required to get at the information that remains beyond what it is worth to an attacker, given the information and ability that attacker is likely to have.

I can't speak for conditions after I left, or how this census has been run, and I believe I've been honest without giving anything important away or painting them in an inaccurate light.

I won't comment here on the general question of name and address collection. Suffice to say that, on a personal note, possibly because of my profession/ability, I'm closer to the tin foil hat end of the spectrum than most of the population, and even posting this makes me uncomfortable...

> I'm closer to the tin foil hat end of the spectrum than most of the population, ...

You and me both! I only work with data for machines and reliability, so I can only guess at what nefarious stuff might happen with this data.

But I think it's important that society has people worrying about information security. Many (most?) people just don't care.