
open data, ABN, transparency
Rosie Williams, BA (Sociology)
16th Apr 2021
The Missing Piece
Keeping the bastards honest means filling in the missing pieces.

Your task is to solve a puzzle with an infinite number of pieces, but you don't know what the puzzle will look like when it's done. No one has ever solved this puzzle on this scale before. And it gets worse: apart from having to figure out what you're actually trying to solve, you don't even have the pieces. You find some pieces online, but some are missing or broken. You spend all your time trying to fix the broken pieces. At times you wonder whether the effort will break you, or whether it is all just too hard.

Past breakthroughs in solving this puzzle have come largely from intentional leaks of specific information from the government to the media. No one has attempted full-scale matching of publicly available information, through large-scale data integration, for the purposes of accountability and transparency.

There is a key that would make it all join together with complete accuracy and certainty, but that key is not included in important datasets. This article is about that key and why it needs to be included in some primary datasets, so that not only I but also journalists, researchers and the public can lock the pieces into place and form a complete picture of who gives money to whom, and why.

This is something I've struggled for years to explain to my very patient Twitter followers and Patreon supporters. I have felt despondent because I couldn't find an easy way to explain what makes the data and programming side of joining administrative datasets at scale so challenging, to give people an accurate picture of the problems that frustrate me daily, or to set out in a palatable way what needs to be done to fix them, for everyone.

But there's a relatively recent story that provides a telling analogy for my goal, and that story revolves around the 2016 census.

When testing the waters for significant policy changes, the government sometimes has to present complex decisions to the public. In 2015, for example, the ABS hired a public relations firm to run focus groups on using future censuses to link government datasets. That opened the door to tracing individuals across datasets from the cradle to the grave, in a way that I, or any researcher, can only dream of when it comes to tracing the influence of the rich and powerful on government spending. This disparity is key:

While the government builds infrastructure to link our personal data from birth to death across federal agencies, with the census forming the missing link, the public is left trying to figure out how our taxes are spent, and who is influencing those outcomes, with the equivalent of a horse and cart.

The scale of investment in constructing that data panopticon, set against the poor-quality data the government publishes on conflicts of interest and donations (which keeps the public blind to improper influence in the policy process), gives a glimpse of the scope of the challenge I face in tracing influence across the system for the public benefit.

This contrast is precisely why I was able to understand what was being done with the 2016 census. Both the government and I are linking datasets that sometimes contain questionable data; the difference is that the government is tracing the behaviour of individuals, while I am tracing the behaviour of politicians and government decision makers.

I figure that if the ABS can use a public relations firm to explain to a focus group of average Australians why it needs to store our names and addresses to create a 'gold' standard linkage 'key' between government datasets (about individuals), then I can use the same story to explain why we need the government to insert a key it already creates for its own purposes, the ABN, into open datasets, so the public can hold a mirror up to the government and end the information asymmetry.

Most people may only remember the huge technical failure of the last census. In privacy circles, however, the 2016 census was the target of a major campaign because at the last minute (November 2015), off the back of a month-long consultation, the government decided to introduce significant changes to censuses going forward. On behalf of higher powers, the ABS wanted to use the census in a radical new way that went against its entire history and enabling legislation.

The Australian census had always been about anonymous statistics. Personal information was used only to make sure those anonymous statistics were correct; the personal information we shared with the ABS was not handed to other agencies for other uses. The 2016 census put a stop to all this... privacy. The government thought: what if it could join all the datasets from all the agencies together in one big... panopticon?

In the past, the government occasionally joined data from one portfolio to another (say, health and welfare datasets), but this was done sparingly and with strong oversight. Most of these efforts attracted little attention until the notorious linkage between ATO and Centrelink data resulted in the robodebt debacle, a very public and harmful display of what happens when you rely on questionable data.

Whatever the government's intentions, data is a lot messier than we imagine. And, wouldn't you know it, when the government tried in 1987 to give each of us a unique number on an 'Australia Card' to police immigrants and catch welfare and tax cheats, it did not get what it wanted. Instead, the proposal produced the biggest nationwide protests ever seen, threatened to become 'the most divisive social issue at least since the Vietnam War', and gave birth to Australia's first Commonwealth privacy legislation, the Privacy Act 1988!

Fast forward a couple of decades and times have changed. Now we all give social media giants enough personal information to make a government green with envy, while the government's own hands are tied when it comes to what it can collect, store and put to use. It was time to reintroduce long-shelved ideas about linking government data. But to do this the government had a problem to solve (and I don't just mean getting it past an unsuspecting public): it needed new ways of joining datasets, because the previous methods, stymied by privacy legislation and secrecy provisions, could match one person across multiple datasets with only rough accuracy.

This problem matters because I, or any researcher, face the same hurdle when matching organisations across datasets. If there is no accurate way to know that 'x' in one dataset refers to 'x' in another dataset, and not to 'y', then you can't reliably trace the actions or outcomes of 'x' through the financial, economic and political system.

Unlike organisations, individuals had no unique code (a result of past public opposition and the privacy principles it gave birth to) that could join datasets accurately enough to draw research-quality conclusions. So the government decided to use the personal information we must provide on our Census form for this purpose, and it has been building the infrastructure to support this panopticon ever since.

The point of this story today is not to go over old ground but to draw the now obvious comparison with efforts to trace organisations (which do not enjoy the same privacy protections as individuals). Unlike individuals, organisations do have a unique number, the ABN, which can be used to match them easily across tenders and grants, for example. In an article published today, I used the ABN to join registered lobbyists and their clients (where the ABN hasn't been hidden from public inspection) with the Commonwealth tenders or grants they have received.
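Part of what makes the ABN such a dependable key is that it carries its own check digits, so a mistyped number can usually be caught before it corrupts a join. The sketch below implements the publicly documented ABN checksum (subtract 1 from the first digit, multiply each digit by a fixed weight, and require the weighted sum to be divisible by 89). The function name `is_valid_abn` is hypothetical, and the example numbers are for illustration only.

```python
# Sketch: validate an Australian Business Number (ABN) with its check-digit
# rule before using it as a join key. The function name is illustrative only.

ABN_WEIGHTS = (10, 1, 3, 5, 7, 9, 11, 13, 15, 17, 19)


def is_valid_abn(raw: str) -> bool:
    """Return True if `raw` (spaces allowed) passes the ABN checksum."""
    digits_str = raw.replace(" ", "")
    # Accept exactly 11 ASCII digits, nothing else.
    if len(digits_str) != 11 or not all(c in "0123456789" for c in digits_str):
        return False
    digits = [int(c) for c in digits_str]
    digits[0] -= 1  # the rule subtracts 1 from the leading digit
    total = sum(d * w for d, w in zip(digits, ABN_WEIGHTS))
    return total % 89 == 0


print(is_valid_abn("51 824 753 556"))  # satisfies the rule
print(is_valid_abn("51 824 753 557"))  # last digit altered
```

Running it prints `True` for the first number and `False` once its last digit is changed, which is the kind of error a free-text company name can never flag.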

I could also identify which of these organisations are registered charities, because charities data also contains the ABN. Matching across these several datasets was relatively simple because tender, grant and lobbyist data all contain the ABN, though only about one in two lobbyist clients had allowed the Attorney-General's Department, which registers lobbyists, to publish their ABN.
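When every dataset carries the ABN, the join itself is almost trivial. Here is a minimal sketch, assuming made-up lobbyist-client, tender and charity records (all names, field names and ABNs are hypothetical), that links each client to the tender money it received and flags registered charities. Clients whose ABN has been withheld are skipped, since they cannot be linked with certainty.

```python
# Sketch: join hypothetical lobbyist-client, tender and charity records on ABN.
# All records, field names and ABNs below are made up for illustration.
from collections import defaultdict

lobbyist_clients = [
    {"client": "Example Consulting Pty Ltd", "abn": "11 111 111 111", "lobbyist": "Firm A"},
    {"client": "Sample Health Foundation", "abn": "22222222222", "lobbyist": "Firm B"},
    {"client": "Unnamed Client", "abn": None, "lobbyist": "Firm B"},  # ABN withheld
]
tenders = [
    {"abn": "11111111111", "value": 250_000},
    {"abn": "11111111111", "value": 90_000},
    {"abn": "33333333333", "value": 40_000},
]
charity_abns = {"22222222222"}


def normalise_abn(abn):
    """Strip spaces so '11 111 111 111' and '11111111111' compare equal."""
    return abn.replace(" ", "") if abn else None


def link_clients(clients, tenders, charity_abns):
    # Index tender values by ABN so each client needs one dictionary lookup.
    tender_totals = defaultdict(int)
    for t in tenders:
        tender_totals[normalise_abn(t["abn"])] += t["value"]

    linked = []
    for c in clients:
        abn = normalise_abn(c["abn"])
        if abn is None:
            continue  # no ABN published: cannot be linked with certainty
        linked.append({
            "client": c["client"],
            "lobbyist": c["lobbyist"],
            "tender_total": tender_totals.get(abn, 0),
            "is_charity": abn in charity_abns,
        })
    return linked


for row in link_clients(lobbyist_clients, tenders, charity_abns):
    print(row)
```

Indexing tender totals by ABN first means each client is matched with a single lookup rather than a scan of every tender, which matters once the datasets run to hundreds of thousands of rows.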

The challenge becomes much harder when cross-referencing this data against other important datasets, such as political donations published by the AEC and conflicts of interest data published by the Parliamentary Departments. Likewise, Parliamentary Expenses Authority data includes only descriptions of the items purchased, rather than the ABN of the entity doing business with the MP.

In another example, I recently devoted an enormous amount of time to linking top-earning corporations to payments or donations they have made to political parties, their associated entities, political candidates, third parties or campaigners. It's possible I have made errors because, while the ATO data includes the ABN, the AEC annual returns filed by these entities do not. (These projects are no longer online due to a lack of funds to support their maintenance.)

In fact, descriptions are often quite open to interpretation. Because no ABN is supplied, an underpaid, impoverished human being has to eyeball thousands of receipts to tell what kind of organisation has paid money to an entity that reports to the AEC, or often even precisely who that organisation is. That alone shows how problematic the lack of a unique identifier in public records is.
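To make the difficulty concrete, here is a minimal sketch of matching donors by name alone, assuming made-up ATO and AEC records. The normalisation rules (lower-casing, stripping common company suffixes and punctuation) are an illustrative assumption, not the method used in the projects described above. Even after cleaning, one donor name matches two distinct entities and a spelling variant matches none; with an ABN on each return, both problems would disappear.

```python
# Sketch: why matching on names alone is fragile. Every name and ABN is hypothetical.
import re

ato_entities = [  # (ABN, registered name): ATO data carries the ABN
    ("44444444444", "Acme Holdings Pty Ltd"),
    ("55555555555", "Acme Holdings Limited"),
    ("66666666666", "Blue River Mining Pty. Ltd."),
]
aec_donors = ["ACME HOLDINGS", "Blue River Mining P/L", "Blueriver Mining"]  # no ABN

SUFFIXES = r"\b(?:pty|ltd|limited|p/l|proprietary)\b"


def normalise(name: str) -> str:
    """Lower-case, drop common company suffixes and punctuation, collapse spaces."""
    name = name.lower()
    name = re.sub(SUFFIXES, " ", name)
    name = re.sub(r"[^a-z0-9 ]", " ", name)
    return " ".join(name.split())


def match_by_name(donor, entities):
    """Return every ABN whose cleaned registered name equals the cleaned donor name."""
    key = normalise(donor)
    return [abn for abn, name in entities if normalise(name) == key]


for donor in aec_donors:
    hits = match_by_name(donor, ato_entities)
    status = "unique" if len(hits) == 1 else ("ambiguous" if hits else "unmatched")
    print(f"{donor!r}: {status} {hits}")
```

Loosening the rules (fuzzy string similarity, for instance) recovers some misses but creates more false matches, which is exactly the judgement call a published ABN removes.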

Another major issue exists with conflict of interest data, which to this day is published solely as PDFs. But even if the data can be extracted, there is no requirement for Members of Parliament to list the ABN of the organisation providing them with a gift. If the ABN were required, and it's hard to see why it should not be, gifts from the same entities could easily be listed or tallied and matched with other datasets, such as grants or tenders awarded to the same organisation.
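If the registers did carry an ABN, tallying gifts and lining them up with grants would take only a few lines. This sketch assumes a hypothetical `abn` field on each gift entry (no such field exists in the current registers) and entirely made-up gift and grant records.

```python
# Sketch: tally MPs' declared gifts by donor ABN and line them up with grants.
# The "abn" field on gift entries is hypothetical: current registers lack it.
from collections import Counter, defaultdict

gifts = [
    {"mp": "Member 1", "abn": "77777777777", "item": "Event tickets"},
    {"mp": "Member 2", "abn": "77777777777", "item": "Hospitality"},
    {"mp": "Member 1", "abn": "88888888888", "item": "Flights"},
]
grants = [
    {"abn": "77777777777", "amount": 500_000},
    {"abn": "99999999999", "amount": 120_000},
]


def gifts_vs_grants(gifts, grants):
    """One row per gift-giving ABN: number of gifts and total grant money received."""
    gift_counts = Counter(g["abn"] for g in gifts)
    grant_totals = defaultdict(int)
    for g in grants:
        grant_totals[g["abn"]] += g["amount"]
    return {
        abn: {"gifts": n, "grants": grant_totals.get(abn, 0)}
        for abn, n in gift_counts.items()
    }


print(gifts_vs_grants(gifts, grants))
```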

I have devoted almost a decade to working with open data for the purposes of transparency and accountability, but there is only so much civil society researchers can do with data that is missing the crucial information needed to overcome data quality issues and deliver the public value the government claims is behind its own open data policies.

No one on a decent income is going to do this monumental amount of tedious work, and funding it is beyond the means of most organisations, at least at the rates normally paid for the multidisciplinary expertise involved. It should not be such a Sisyphean effort of Herculean proportions. This is information the public is entitled to. It is, after all, their money that decision makers are playing with. It's time to put things right.

The 2014 ABS focus group described the improvement of moving data linkage from the current 'bronze' standard to 'silver' or 'gold':

- Gold standard – using name and address information to link records during the Census processing period;
- Silver standard – Name is encrypted using a specific electronic key. This key is then used along with personal characteristics to link with; and
- Bronze standard – linking only on personal characteristics like age and sex and date of birth, but not name and address.

(Colmar Brunton focus group 2014)
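The difference between these standards can be illustrated with a toy example. This sketch is not the ABS's procedure: it assumes an HMAC-SHA256 keyed hash as a stand-in for the 'specific electronic key' in the quote, and all records and the key are made up. Linking on characteristics alone (bronze) matches two different people who share a sex and date of birth, while adding a keyed encoding of the name (silver) separates them; the comparison itself only ever sees the encoded value.

```python
# Sketch contrasting 'bronze' and 'silver' style linkage keys on made-up records.
# HMAC-SHA256 is an assumed stand-in for the 'specific electronic key' in the quote.
import hashlib
import hmac

SECRET_KEY = b"hypothetical-linkage-key"  # would be held by the linking agency only


def bronze_key(rec):
    # Personal characteristics only (age is implied by date of birth):
    # different people can share these.
    return (rec["sex"], rec["dob"])


def silver_key(rec):
    # Name is encoded with a keyed hash, then combined with characteristics.
    name = rec["name"].strip().lower().encode("utf-8")
    encoded = hmac.new(SECRET_KEY, name, hashlib.sha256).hexdigest()
    return (encoded, rec["sex"], rec["dob"])


census = [
    {"name": "Jane Citizen", "sex": "F", "dob": "1980-01-01"},
    {"name": "Mary Sample", "sex": "F", "dob": "1980-01-01"},
]
other_dataset = {"name": "Jane Citizen", "sex": "F", "dob": "1980-01-01"}

for label, key_fn in (("bronze", bronze_key), ("silver", silver_key)):
    matches = [r["name"] for r in census if key_fn(r) == key_fn(other_dataset)]
    print(label, matches)
```

For organisations, no such construction is needed: the ABN already plays the role of a gold-standard key wherever it is published.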