Why Audits Are the Way Forward for AI Governance

When organizations use algorithms to make decisions, biases built into the underlying data create not just challenges but also enormous risk. What should companies do to manage such risks? The way forward is to conduct artificial intelligence (AI) audits, according to this opinion piece by Kartik Hosanagar, a Wharton professor of operations, information and decisions who studies technology and the digital economy. This column is based on ideas from his book, A Human’s Guide to Machine Intelligence.

Much has been written about challenges associated with AI-based decisions. Some documented failures include gender and race biases in recruiting and credit approval software; chatbots that turned racist and driverless cars that fail to recognize stop signs due to adversarial attacks; inaccuracies in predictive models for public health surveillance; and diminished trust because of the difficulty we have interpreting certain machine learning models.

While the value of machine intelligence is obvious, it is now becoming clear that machine decisions also represent a new kind of risk for companies. In addition to the social risks of bias and the financial risks of models that confuse correlation with causation, companies also have to account for IT security risk, reputational risk, litigation risk, and even regulatory risk that might arise from AI-based decisions. Just as we saw with information security, it is only a matter of time before boards and CEOs are held accountable for failures of machine decisions. But what is the C-suite supposed to do? Halting the rollout of AI is not an option.

Cybersecurity risk provides more than a warning; it also offers an answer. Cybersecurity audits have become a norm for companies today, and the responsibility and liability for cyber risk audits go all the way up to the board of directors. I believe companies using AI models for socially or financially consequential decisions need similar audits, and I am not alone.

The Algorithmic Accountability Act, proposed by Democratic lawmakers this past spring, would, if passed, require large companies to formally evaluate their “high-risk automated decision systems” for accuracy and fairness. The EU’s GDPR audit process, while mostly focused on regulating how companies process personal data, also covers some aspects of AI, such as a consumer’s right to explanation when companies use algorithms to make automated decisions. While the scope of the right to explanation is relatively narrow, the Information Commissioner’s Office (ICO) in the U.K. has recently invited comments on a proposed AI auditing framework that is much broader in scope.

The framework is meant to support the ICO’s compliance assessments of companies that use AI for automated decisions. It identifies eight AI-specific risk areas, including fairness and transparency, accuracy, and security. It also identifies governance and accountability practices, including leadership engagement, reporting structures and employee training. The ICO’s work is ongoing, and it will take some time before any regulatory consensus on an audit framework emerges. But forward-thinking companies should not wait for regulation. High-profile AI failures will reduce consumer trust and only serve to increase future regulatory burdens. These are best avoided through proactive measures today.

An audit process would begin with an inventory of all machine learning models in use at a company, recording each model’s specific uses, the names of its developers and business owners, and a risk rating. The rating would measure, for example, the social or financial risks that would come into play should the model fail, and in turn might help determine whether an audit is needed. Were a model audit to go forward, it would evaluate the inputs (training data), the model itself, and the model’s outputs. Training data would need to be evaluated for data quality as well as for potential biases hidden in the data.
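To make the inventory step concrete, here is a minimal Python sketch of what one inventory entry and a risk-based audit trigger might look like. Everything in it is an assumption for illustration: the ModelRecord fields mirror the items listed above, but the three-level Risk scale, the rule that the overall rating is the worse of the social and financial ratings, and the needs_audit threshold are hypothetical choices, not part of any standard or regulatory framework.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import IntEnum


class Risk(IntEnum):
    # Hypothetical three-level scale; a company would define its own.
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass
class ModelRecord:
    """One entry in a hypothetical machine learning model inventory."""
    name: str
    use: str
    developers: list[str]
    business_owner: str
    social_risk: Risk      # harm to people if the model fails
    financial_risk: Risk   # monetary exposure if the model fails

    def risk_rating(self) -> Risk:
        # Assumption: the overall rating is the worse of the two dimensions.
        return max(self.social_risk, self.financial_risk)


def needs_audit(record: ModelRecord, threshold: Risk = Risk.MEDIUM) -> bool:
    # Assumption: any model rated at or above the threshold is queued for audit.
    return record.risk_rating() >= threshold


inventory = [
    ModelRecord("resume_screener", "rank job applicants", ["data science team"],
                "HR", social_risk=Risk.HIGH, financial_risk=Risk.MEDIUM),
    ModelRecord("warehouse_forecast", "predict inventory demand", ["ops analytics"],
                "Supply chain", social_risk=Risk.LOW, financial_risk=Risk.LOW),
]

to_audit = [m.name for m in inventory if needs_audit(m)]
print(to_audit)  # ['resume_screener']
```

In practice the scale and the threshold would come from the company’s own risk policy; the value of the inventory is that it makes the decision to audit explicit and repeatable.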

For example, if a resume screening model is trained on past decisions about which applicants got job offers and which employees got promoted, we would want to ensure that the training data is not affected by unconscious biases of past recruiters and managers. Model evaluation would involve benchmarking against alternative models, statistical tests to ensure that the model generalizes from the training data to unseen data, and state-of-the-art techniques for model interpretability.
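One common first screen for the kind of historical bias described in the resume example is to compare selection rates across applicant groups in the training labels. The Python sketch below computes each group’s selection rate relative to the best-treated group and flags ratios under 0.8, the “four-fifths” rule of thumb from U.S. employment selection guidelines. The data and group labels are hypothetical, and passing this check does not by itself show that the data are free of bias; it is only a starting point for the deeper evaluation an audit would require.

```python
from collections import defaultdict


def selection_rates(records):
    """Share of positive outcomes (e.g., job offers) for each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for group, selected in records:
        totals[group] += 1
        positives[group] += int(selected)
    return {g: positives[g] / totals[g] for g in totals}


def impact_ratios(records):
    """Each group's selection rate divided by the highest group's rate."""
    rates = selection_rates(records)
    best = max(rates.values())
    if best == 0:
        # Nobody was selected at all, so there is no disparity to measure.
        return {g: 1.0 for g in rates}
    return {g: r / best for g, r in rates.items()}


# Hypothetical historical decisions: (applicant group, received an offer?)
history = (
    [("A", True)] * 30 + [("A", False)] * 70
    + [("B", True)] * 18 + [("B", False)] * 82
)

ratios = impact_ratios(history)
# Four-fifths rule of thumb: flag groups selected at under 80% of the top rate.
flagged = [g for g, r in ratios.items() if r < 0.8]
print(ratios)
print(flagged)  # ['B']
```

Here group B is selected 18% of the time against 30% for group A, a ratio of about 0.6, so a model trained on these labels would be a candidate for closer scrutiny.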

However effective a model may be, the inability to understand the factors driving its recommendations can be a major deterrent to managerial and consumer trust in machine decisions. Finally, audits should examine not only the typical values of the input data but also how the model responds to inputs outside the range seen in the training data. For example, if a stock trading algorithm is trained on data from a period in which markets were relatively stable, how does it respond during wild market swings?
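A simple way to begin the out-of-range check is to record the range each input feature took in the training data and flag new inputs that fall outside it. The Python sketch below does this for a hypothetical trading model with two assumed features, daily return and a volatility index; the feature names and numbers are invented for illustration. A per-feature range check is deliberately crude: it will not catch unusual combinations of individually ordinary values, so an auditor would still need to test how the model actually behaves on stressed scenarios such as the crash-like day shown here.

```python
from __future__ import annotations

from typing import Sequence


class RangeMonitor:
    """Flags inputs outside the per-feature range seen in the training data."""

    def __init__(self, X_train: Sequence[Sequence[float]]):
        columns = list(zip(*X_train))  # transpose rows into feature columns
        self.lo = [min(c) for c in columns]
        self.hi = [max(c) for c in columns]

    def out_of_range(self, x: Sequence[float]) -> list[int]:
        """Return indices of features in row x that fall outside [lo, hi]."""
        return [i for i, v in enumerate(x) if not self.lo[i] <= v <= self.hi[i]]


# Hypothetical "calm market" training rows: (daily return %, volatility index).
X_train = [(-1.8, 11.0), (0.4, 14.5), (1.9, 19.0), (-0.7, 12.2)]
monitor = RangeMonitor(X_train)

print(monitor.out_of_range((0.5, 15.0)))   # ordinary day -> []
print(monitor.out_of_range((-9.0, 45.0)))  # crash-like day -> [0, 1]
```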

Such an audit process would be harder to carry out for models from vendors, but it is possible to subject vendor models to the same level of scrutiny. In addition, AI audits should be performed by internal or external teams that are independent of the team that built the model. This independence ensures that models are not simply audited the same way the data scientists who developed them originally validated them.

This overall approach has a precedent in Model Risk Management (MRM) processes required of the big banks in the wake of the 2008 financial crisis. MRM is typically focused on validation of traditional statistical models, and over the years has helped some banks detect and correct issues with their models. But the scope of MRM — regression models used by big banks — is relatively narrow. It doesn’t address machine-learning models that are continuously retraining as more data come in. Furthermore, issues such as bias and interpretability are often outside the scope of MRM. Machine learning models require an enhanced approach. I’d argue that auditors of machine learning models need a hacker’s mentality to identify the different ways in which a model could fail.

Some companies are already putting this governance approach into practice. One company that grades the maintenance standards of New York City rental buildings based on 311 calls used an audit to test the fairness of its algorithms. A large U.S. bank uses a third-party AI explainability solution to test and debug its AI models.

It is clear that machine learning models that make socially or financially consequential decisions (e.g., recruiting, credit approval, ad targeting, or approval of financial transactions) need multiple lines of defense. Relying exclusively on the statistical tests conducted by the data scientists who develop the model will not suffice. A robust audit process would be the way forward for governing AI decisions.

Author’s Note: Although machine learning is only one subfield of AI, most of the AI in use in corporations today consists of supervised machine learning models trained on large datasets. Accordingly, this article implicitly focuses on the audit of supervised machine learning models.