There is a management aphorism so deeply embedded in business culture that it rarely gets questioned: What gets measured gets managed. It is a useful heuristic — until it becomes a trap. For the better part of a decade, organizations have been deploying artificial intelligence at scale while measuring it almost exclusively through the lens of efficiency gains, cost reductions, and revenue lift. The instruments are precise. The picture they produce is radically incomplete. And as AI becomes pervasive, that incomplete picture grows more distorted.
Existing dashboards do not capture whether an AI system is fair, whether it is eroding or building trust, whether it is making the people who use it more capable or quietly deskilling them, and whether its environmental footprint is accounted for or simply ignored. The gap between what we measure and what we should care about is not a technical failure. It is a values failure dressed up as a metrics problem.
The Prosocial AI Index proposes a practical answer to that failure. It gives executives, technologists, and governance teams a shared vocabulary and a structured scorecard for AI that is genuinely good — not just profitable in the short term, but durable, trustworthy, and aligned with the values an organization actually claims to hold.
What Is the Prosocial AI Index?
The Prosocial AI Index is built on two intersecting axes with four values on each axis. The “4Ts” describe how an AI system is built and deployed. The “4Ps” describe what it is built and deployed for. Together they form a 16-cell matrix — a heat map of accountability that is far harder to game than a single ESG score or a compliance checkbox.
The 4Ts
- Tailored: Is the AI system designed for the specific context, culture, and constraints of its users — not copy-pasted from a generic template?
- Trained: Is the system built on representative, inclusive data and objectives that encode the values the organization actually wants to promote, not proxy metrics that are merely convenient?
- Tested: Is it rigorously evaluated for bias, robustness, and unintended consequences — before deployment and continuously afterwards?
- Targeted: Is it applied where AI adds genuine value and withheld — deliberately — where human judgment is irreplaceable?
The 4Ps
- Purpose: Does the system advance a mission that all stakeholders can be proud of, beyond the next quarterly cycle?
- People: Does it improve the experience, agency, and well-being of everyone who builds, uses, and is affected by it?
- Profit: Does it generate durable financial value — not by externalizing costs onto society, but by creating genuine worth?
- Planet: Are its energy consumption, materials footprint, and systemic environmental impact accounted for and actively reduced?
The elegance of this structure is that it resists siloing. “Planet” is not the sustainability team’s problem alone — it touches every T. “Tested” is not only a quality-assurance exercise — it has direct implications for every P. Scored honestly, the Index makes it impossible to claim ethical AI while ignoring half the picture.
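To make the 16-cell structure concrete, here is a minimal sketch in Python of one way the scorecard could be represented. The ProsocialScorecard class, the 1-to-5 scale, and the plain-text heat map are illustrative assumptions; the Index itself does not prescribe a scoring scale or an implementation.

```python
from dataclasses import dataclass, field

# The two axes of the Index, as named in the article.
T_AXIS = ("Tailored", "Trained", "Tested", "Targeted")
P_AXIS = ("Purpose", "People", "Profit", "Planet")

@dataclass
class ProsocialScorecard:
    """A 16-cell 4T x 4P matrix. The 1-5 scale (1 = unaddressed,
    5 = exemplary) is an assumed convention, not part of the Index."""
    scores: dict = field(default_factory=dict)

    def set_score(self, t: str, p: str, value: int) -> None:
        # Reject unknown cells and out-of-range scores up front.
        if t not in T_AXIS or p not in P_AXIS:
            raise ValueError(f"unknown cell ({t}, {p})")
        if not 1 <= value <= 5:
            raise ValueError("scores must fall between 1 and 5")
        self.scores[(t, p)] = value

    def heat_map(self) -> str:
        # Render the full matrix; '-' marks cells not yet scored.
        lines = ["".ljust(10) + "".join(p.rjust(9) for p in P_AXIS)]
        for t in T_AXIS:
            row = "".join(str(self.scores.get((t, p), "-")).rjust(9)
                          for p in P_AXIS)
            lines.append(t.ljust(10) + row)
        return "\n".join(lines)

card = ProsocialScorecard()
card.set_score("Tested", "People", 2)  # e.g. bias audits exist but are ad hoc
print(card.heat_map())
```

Keeping sixteen separate scores, rather than collapsing them into one aggregate number, is deliberate: it is the per-cell visibility that makes the matrix harder to game than a single composite rating.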
The Prosocial AI Index opens space to shift from treasuring what we (can) measure, to measuring what we (should) treasure.
Why the Urgency Is Real
McKinsey’s 2025 State of AI report found that while AI adoption has surged to 72% of organizations globally, fewer than a third have implemented systematic governance for the risks their deployments create. Meanwhile, Edelman’s 2024 Trust Barometer documents a widening credibility gap: Worldwide, the public trusts businesses to do the right thing with AI far less than businesses believe they do. That gap is a strategic liability.
The regulatory environment is tightening fast. The EU AI Act, now entering its implementation phase, mandates risk classification and meaningful human oversight for high-risk systems. The NIST AI Risk Management Framework calls for precisely the kind of multi-dimensional assessment the Prosocial AI Index provides. Organizations that treat compliance as the ceiling rather than the floor will remain perpetually reactive — managing scandal rather than building advantage.
There is also an acute talent argument. Research consistently shows that knowledge workers — especially in younger cohorts — choose employers based on perceived ethical integrity. An organization that can articulate and demonstrate how its AI systems perform across all 16 cells of the Index is building a recruitment and retention argument no compensation package can fully replicate.
Four Ways to Put It to Work
The Prosocial AI Index translates aspiration into action. It has four concrete operational uses that any business can act on immediately.
1. Assess new AI systems before deployment
The Index functions as a structured due-diligence instrument, organizing the questions that procurement teams, legal counsel, and ethics officers should be asking vendors and internal developers alike. A system that scores well on “Profit” but poorly on “People” and “Planet” is a liability, not an asset — and the Index makes that visible before a contract is signed.
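As a sketch of how that pre-signature review might be mechanized: the hypothetical deployment_gate function below flags every cell that falls under an assumed minimum score. The floor of 3 on the illustrative 1-to-5 scale and the sample vendor scores are assumptions, not thresholds the Index prescribes.

```python
def deployment_gate(scores: dict, floor: int = 3) -> list:
    """Return a blocker for each cell below the assumed floor.

    Every cell must clear the floor on its own: a strong 'Profit'
    column cannot offset weak 'People' or 'Planet' cells.
    """
    return [f"({t}, {p}) scored {v}, below floor {floor}"
            for (t, p), v in sorted(scores.items()) if v < floor]

# Illustrative vendor system: strong on Profit, weak on People and Planet.
candidate = {
    ("Tested", "Profit"): 5,
    ("Tested", "People"): 2,
    ("Trained", "Planet"): 2,
}
for blocker in deployment_gate(candidate):
    print("BLOCKER:", blocker)
```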
2. Monitor existing systems over time
Models drift. Data distributions shift. Social contexts evolve. A regular prosocial review — quarterly is a reasonable cadence — forces organizations to treat AI as a living system rather than a one-time deployment. In 2019, research demonstrated how a widely deployed health care algorithm systematically underserved Black patients for years — a failure that a “Tested × People” audit would have surfaced far earlier. What you don’t measure, you cannot fix.
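One minimal way to operationalize that cadence is to keep each quarterly scorecard and compare consecutive reviews cell by cell. The drift_flags helper, the one-point tolerance, and the sample quarters below are illustrative assumptions under the same 1-to-5 scale sketched earlier.

```python
def drift_flags(history: list, tolerance: int = 1) -> list:
    """Compare the two most recent quarterly reviews and flag any
    cell whose score dropped by more than `tolerance` points."""
    if len(history) < 2:
        return []
    previous, latest = history[-2], history[-1]
    return [f"({t}, {p}) fell from {previous[(t, p)]} to {score}"
            for (t, p), score in latest.items()
            if (t, p) in previous and previous[(t, p)] - score > tolerance]

# Two illustrative quarterly reviews of the same deployed system.
q1 = {("Tested", "People"): 4, ("Trained", "People"): 4}
q2 = {("Tested", "People"): 2, ("Trained", "People"): 4}
for flag in drift_flags([q1, q2]):
    print("REVIEW:", flag)
```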
3. Sensitize stakeholders, starting internally
Because every cell of the matrix is intuitive — a CFO grasps “Profit” and “Targeted” as readily as an engineer grasps “Trained” and “Tested” — the Index creates a shared language across functions and hierarchies. The OECD’s AI Principles emphasize that organizational culture, not just policy, determines AI outcomes. The Index is a culture-building instrument as much as a governance tool, giving disciplines a common vocabulary for naming challenges and co-creating solutions across organizational silos.
4. Build better from scratch
For teams designing new AI initiatives, the Index works as a design brief. It prompts developers to ask, before a line of code is written: Whose values are encoded in this training set? What problem are we deliberately not solving with this system? What does success look like for the planet, not just the product manager? Embedding the 4T × 4P questions at the inception stage costs virtually nothing. Retrofitting them after deployment costs enormously more.
From Return on Investment to Return on Values
The deepest ambition of the Prosocial AI Index is to reframe the dominant question in AI strategy, and in business strategy generally. “What is the ROI?” is an incomplete question. Return on investment measures flows of capital. It says nothing about flows of trust, flows of human capability, or the environmental debt being silently accumulated — a debt that regulators, litigants, and consumers will one day call in.
Purpose-driven companies with genuine stakeholder orientation outperform their peers over meaningful time horizons. Conversely, the social costs of irresponsible AI — algorithmic discrimination in credit and hiring, labor displacement without retraining, the carbon intensity of large model training — are not externalities in the traditional sense. They are deferred internal costs, repriced when the reckoning arrives.
The Prosocial AI Index does not ask organizations to be less ambitious. It asks them to be ambitious in a more strategic and holistic way. A high-scoring system — tailored to its context, trained on values-aligned data, rigorously tested, deliberately targeted, and evaluated across purpose, people, profit, and planet — is a more resilient one. It compounds trust rather than eroding it, opens regulatory space rather than inviting restriction, and is the kind of AI that employees are proud to build and users are willing to rely on.
Given rising pressure from consumers and governments alike, the question is not whether to measure ethical AI. The question is whether you build the instrument before — or after — the damage is done.