|
The posts above give a general, high-level view of most ABS publications and data sets released to the public and to institutions, with some minor deviations depending on context. Yes, we separated out and destroyed name and address data once census processing was finished. Indeed, most statistics, data and calculations are released to the public at various levels of statistical aggregation, not at unit record level. Almost all ABS data sets treat privacy and naive identifiability far more seriously than most lay people understand; we had a variety of techniques to test for identifiability, and ways to change the underlying data so that it still produced good higher-level statistics while destroying the underlying unit-record-level values. Census is a little bit different: there is a pre-processing (pre-consumable data set) stage in which various techniques and research are applied to the raw data to improve it, extract information from it, produce a finished product and think about its future use. Nothing nefarious when I was doing it: think OCR, error correction and analysis, value distributions, imputation, comparisons to the dress rehearsal, and calibration. Removal of the string "fuck off census collector" and every swear word, curse and smart-arse answer imaginable from every field is a fun example the public doesn't think much about... Security at the time I was working on it was OK. Better than a lot of private industry. There was some network access (this shouldn't be surprising: it's an e-census, so it's a tautology that there's some network access to bits of the data somewhere in the entire process), but ideally not to the wider internet outside of specific bits of the internal ABS network once the raw data is in. Only specific people working with such data are given access at this stage and in this way. Nothing is perfect; I can't stress that enough in the area of IT security, and in my particular area of experience and statistical analysis.
A mature analyst (so, not tabloid headlines) understands that all security and identifiability is based on risk/reward/work/information trade-offs: how much can we change things while keeping the analysis valuable, and while pushing the work required to get at the information that remains above what that information is worth, given the information and ability an attacker is likely to have. I can't speak for conditions after I left, or for how this census has been run, and I believe I've been honest without giving anything important away or painting them in an inaccurate light. I won't comment here on the general question of name and address collection; suffice to say that, on a personal note, possibly because of my profession/ability, I'm closer to the tin foil hat end of the spectrum than most of the population, and even posting this makes me uncomfortable... |
You and me both! I only work with data for machines and reliability, so I can only guess at what nefarious stuff might happen with this data.
But I think it's important that society has people worrying about information security. Many (most?) people just don't care.