4 Questions to Improve Transparency in AI

5 Aug 2020

If you’re paying for something with your data, then that makes data a currency.

So it seems New Zealand is at it again when it comes to leadership on the world stage. The government has released an ‘algorithm charter’ that aims to tackle the lack of transparency and potential bias in automated systems.

This is a long time coming, and other countries need to pay attention.

Whilst the document is a great start, its language is somewhat vague and aspirational - perhaps intentionally so, given it is early days for this kind of scheme.

As AI and automated systems become ever more present in our daily lives, with the potential to decide your next job or even the next US president, I don’t think it’s unreasonable to expect companies to be more transparent about how their AI works.

Just as we expect companies to file accounts and keep records, we should ask companies to answer a few basic questions about how their algorithms work. This will give consumers confidence that they’re not subject to unfair bias when decisions that affect them are made by AI.

To make AI more transparent, I’d ask companies for:

A high-level description of the type of algorithms in use - most machine learning algorithms are not written from scratch, so just tell us, for example, whether the algorithm is a logistic regression or a transformer network.
The source of the data used to train the model - did the company collect it themselves and, if so, where from? Or did they buy it from a third party and, if so, from whom, and where did that party get it?
The features used to train the model - knowing, for example, that ‘sex’ or ‘age’ are fed into the model would tell us where to look for potential bias.
The methods and datasets used to validate their model’s effectiveness - this is critical - a company without a good validation strategy is unlikely to notice bias in its systems, let alone fix it.
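
To make this concrete, here is a rough sketch of how answers to these four questions might be published in machine-readable form. It is only an illustration: the `AlgorithmDisclosure` class, its field names, the example values and the `sensitive_features` watchlist are all hypothetical, and nothing here comes from the charter itself.

```python
import json
from dataclasses import dataclass, asdict
from typing import List


@dataclass
class AlgorithmDisclosure:
    """Hypothetical public record answering the four questions above."""
    algorithm_type: str             # question 1: e.g. "logistic regression"
    data_sources: List[str]         # question 2: where the training data came from
    features: List[str]             # question 3: inputs fed into the model
    validation_methods: List[str]   # question 4: how the model was validated
    validation_datasets: List[str]  # question 4: which data it was validated on

    def sensitive_features(self, watchlist=("sex", "age")):
        """Return disclosed features that match a (hypothetical) watchlist."""
        watch = {name.lower() for name in watchlist}
        return [f for f in self.features if f.lower() in watch]

    def to_json(self):
        """Serialise the disclosure so it can be published as-is."""
        return json.dumps(asdict(self), indent=2)


# Made-up example values, purely for illustration.
disclosure = AlgorithmDisclosure(
    algorithm_type="logistic regression",
    data_sources=["collected in-house from job applications"],
    features=["years_experience", "age", "postcode"],
    validation_methods=["held-out test set", "error rates compared across age groups"],
    validation_datasets=["applications held out from training"],
)

print(disclosure.sensitive_features())  # → ['age']
```

Note that the record names the kind of algorithm, the data sources and the features, but contains no training data and no model parameters. A journalist could scan such a file for features like ‘age’ without the company giving away anything proprietary.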

I’ve picked the questions carefully, as I think companies have a right to maintain their trade secrets. The data itself can remain secret, as can the various parameters and other specifics of the algorithm.

While I’m not expecting the average consumer to understand the difference between machine learning algorithms or the consequences of different testing strategies, having this information in the public domain would enable journalists and the tech community to better hold big tech companies to account.