Why AI Governance Is Now a Testing Problem
This shift is explored in Episode 24 of KJR’s Trusted AI podcast, where KJR ACT General Manager Andrew Hammond sits down with Tony Allen, Executive Director of the Age Check Certification Scheme.
This isn’t just a theoretical conversation. KJR has worked directly with ACCS as part of the Australian Government’s Age Assurance Technology Trial, evaluating real-world technologies, including AI-based systems designed to estimate a user’s age. That experience brings a practical lens to the discussion, grounded in testing how these systems perform under real conditions, not just how they’re designed to behave.
The episode takes stock of the current realities of trusted AI adoption, cutting through the hype to examine what has actually worked, what surprised practitioners on the ground, and where risks began to surface. It also looks ahead, highlighting how AI is evolving across regulation, implementation, and everyday use.
Through this conversation, a clear theme emerges: the evolution of AI isn’t just a technical story; it’s operational, regulatory, and deeply human. And for those leading quality engineering and testing across Australia, one message stands out: AI governance is no longer a compliance exercise; it’s a core testing responsibility.
“Just because something is flagged as AI-enabled… doesn’t necessarily mean that we’re ready for it, or it’s ready for us.” – Andrew Hammond, General Manager, KJR ACT
The Illusion of “AI-Enabled”
One of the more telling observations from Tony Allen cuts straight through the noise:
“We are seeing a lot of things described as AI-enabled… when in reality, they’re not.”
This isn’t just a marketing problem; it’s a testing problem.
When products are labelled as AI-driven without actually incorporating adaptive or learning components, it creates confusion in validation strategies. Test teams are left asking:
- Are we testing a model or a rule-based system?
- Should we expect variability in outputs?
- Where does accountability sit when something goes wrong?
For QA leaders, this reinforces the need for clear AI governance definitions within delivery environments. Without that clarity, teams risk applying the wrong testing approaches to the wrong systems.
AI Fails Differently, So Testing Must Evolve
Traditional systems fail predictably. AI systems don’t.
This is illustrated with a simple but powerful example: an AI trained to validate passports may accept an image with a dog’s face, because it was never trained to recognise that as invalid.
This is not a defect in the conventional sense. It’s a limitation of training.
For testing professionals, this shifts the focus away from expected outcomes and toward unexpected behaviour.
What This Means in Practice:
- You’re not just validating correctness; you’re probing boundaries
- Edge cases are no longer optional; they are essential
- Test scenarios must include what the system hasn’t seen before
This is where testing AI systems becomes fundamentally different from traditional software testing.
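To make this concrete, here is a minimal sketch of an out-of-distribution test in the spirit of the passport example. The validate_passport_photo() wrapper, module name, and fixture files are hypothetical stand-ins for whatever interface the system under test actually exposes:

```python
import pytest

# Hypothetical wrapper around the system under test; returns True if the
# image is accepted as a valid passport photo.
from passport_checker import validate_passport_photo

# Inputs the model plausibly never saw in training: these probe the boundary
# between "invalid photo" and "never seen before".
OUT_OF_DISTRIBUTION_CASES = [
    "fixtures/dog_face.jpg",      # the episode's example: a non-human face
    "fixtures/cartoon_face.png",  # synthetic rather than photographic
    "fixtures/blank_page.jpg",    # no face at all
    "fixtures/two_faces.jpg",     # ambiguous subject
]

@pytest.mark.parametrize("image_path", OUT_OF_DISTRIBUTION_CASES)
def test_rejects_out_of_distribution_input(image_path):
    # The expected behaviour is rejection: accepting any of these is a
    # training-coverage gap, not a conventional functional defect.
    assert validate_passport_photo(image_path) is False
```

The point is not these four images; it’s that the suite deliberately includes inputs from outside the system’s training distribution, rather than only the cases it was designed to handle.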
Data Quality Is the Foundation of Trust
In AI systems, data is not just an input; it is the system. Poor data quality doesn’t result in isolated defects. It creates systemic issues:
- Bias in decision-making
- Blind spots in edge cases
- Inconsistent or misleading outputs
The passport example above highlights a deeper truth: if the model isn’t trained to detect something, it won’t, even if it seems obvious to a human.
For Australian organisations, this raises governance questions that testing teams can’t ignore:
- Who is accountable for training data quality?
- How is that data validated and refreshed?
- Are test datasets representative of real-world complexity?
Software quality assurance must now extend upstream into data assurance.
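What can that look like in practice? Below is a minimal sketch of an upstream data-assurance check that flags under-represented groups in a test dataset. The file path, column names, and 5% threshold are illustrative assumptions, not a standard:

```python
import pandas as pd

MIN_SHARE = 0.05  # illustrative floor: every group needs at least 5% coverage

def underrepresented_groups(df: pd.DataFrame, column: str) -> list[str]:
    """Return the groups in `column` whose share of the dataset falls below MIN_SHARE."""
    shares = df[column].value_counts(normalize=True)
    return [str(group) for group, share in shares.items() if share < MIN_SHARE]

# Hypothetical labelled test set for an age-estimation system.
df = pd.read_csv("age_test_set.csv")

for col in ("age_band", "skin_tone", "lighting_condition"):
    gaps = underrepresented_groups(df, col)
    if gaps:
        # A blind spot in the test data becomes a blind spot in the verdict.
        print(f"WARNING: {col} groups under {MIN_SHARE:.0%} coverage: {gaps}")
```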
Automation Bias: The Risk No One Is Testing For
Beyond technical limitations, there’s a human risk that’s harder to detect: automation bias.
AI systems are inherently persuasive. They present outputs with confidence, often reinforcing user assumptions.
“It’s constantly reassuring the user that they’re on the right track.” – Tony Allen, Executive Director of the Age Check Certification Scheme
This creates a feedback loop where users:
- Trust outputs without sufficient scrutiny
- Overestimate the system’s capability
- Fail to challenge incorrect results
In high-stakes environments such as legal, healthcare, or compliance settings, this can have serious consequences.
This introduces a new dimension:
- How do you test not just the system, but how people interact with it?
- How do you design validation strategies that account for over-trust?
This is where AI governance intersects with human behaviour, and where testing must adapt accordingly.
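One way to make over-trust visible is to instrument the human side of the loop. Here is a minimal sketch, assuming a hypothetical review log that captures the model’s decision and confidence alongside the reviewer’s final call:

```python
import pandas as pd

# Hypothetical log with columns: model_decision, confidence, human_decision.
log = pd.read_csv("review_log.csv")

# An override is any case where the reviewer disagreed with the model.
log["overridden"] = log["model_decision"] != log["human_decision"]

# Override rate per model-confidence band.
bands = pd.cut(log["confidence"], bins=[0.0, 0.5, 0.8, 1.0])
override_rate = log.groupby(bands, observed=True)["overridden"].mean()
print(override_rate)

# An override rate near zero in EVERY band is a warning sign: either the
# model is always right, or reviewers have stopped scrutinising it.
```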
Standards Are Coming, and They Will Change Delivery
The emergence of ISO 42001 (AI Management Systems) signals a shift toward structured governance.
Standards typically evolve along a familiar path:
- Early adopters implement them
- Competitors follow
- Procurement begins requiring them
- They become industry baseline
“The thing that really kicks it off is when it starts to be specified in procurement.” - Tony Allen, Executive Director of the Age Check Certification Scheme
For those in DevOps, test automation, and delivery leadership, this has real implications:
- Governance requirements will become part of CI/CD pipelines
- Evidence of compliance will need to be testable and auditable
- Quality gates will extend beyond functionality into accountability
This is not a future concern; it’s already starting to appear in procurement conversations across Australia.
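What might a testable, auditable governance control look like inside a pipeline? Below is a minimal sketch of a release gate, assuming the evaluation stage writes a metrics.json artefact. The metric names and thresholds are illustrative; a real gate would draw them from the organisation’s governance policy:

```python
import json
import sys

# Illustrative thresholds; in practice these come from the governance policy.
GATES = {
    "accuracy": 0.95,              # overall correctness floor
    "worst_group_accuracy": 0.90,  # fairness floor across demographic groups
}

with open("metrics.json") as f:  # artefact written by the evaluation stage
    metrics = json.load(f)

failures = [name for name, floor in GATES.items() if metrics.get(name, 0.0) < floor]

if failures:
    print(f"Governance gate FAILED: {failures} below threshold")
    sys.exit(1)  # a non-zero exit fails the pipeline stage and blocks release

print("Governance gate passed; metrics retained as audit evidence")
```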
DevOps and Test Automation Must Expand
As AI becomes embedded in delivery pipelines, DevOps practices must evolve to support the full AI lifecycle.
This includes:
- Continuous validation of model outputs
- Monitoring for model drift over time (see the sketch after this list)
- Embedding governance checks into automated pipelines
- Testing under real-world variability, not just controlled conditions
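Drift monitoring, for example, can start with something as simple as comparing the score distribution the model produced at validation time against its recent production scores. A minimal sketch using a two-sample Kolmogorov-Smirnov test follows; the artefact paths and alert threshold are illustrative assumptions:

```python
import numpy as np
from scipy.stats import ks_2samp

# Hypothetical artefacts: scores saved during release testing vs. scores
# exported from production monitoring.
baseline_scores = np.load("validation_scores.npy")
production_scores = np.load("last_7_days_scores.npy")

stat, p_value = ks_2samp(baseline_scores, production_scores)

# A small p-value means the two distributions differ more than chance allows:
# the inputs, or the world, have shifted since the model was validated.
if p_value < 0.01:
    print(f"Possible drift (KS statistic={stat:.3f}, p={p_value:.2e}); trigger revalidation")
```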
Traditional performance testing also needs to adapt.
It’s no longer just about response times; it’s about:
- Consistency of outputs under load
- Latency in AI inference (especially for edge and Vision AI systems)
- Behaviour when inputs fall outside expected patterns
Without these capabilities, organisations risk deploying systems they don’t fully understand.
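As one illustration of what “consistency of outputs under load” can mean for an AI service, the sketch below submits the same input concurrently and checks both latency and output stability. The endpoint, payload format, and response shape are hypothetical:

```python
import time
from concurrent.futures import ThreadPoolExecutor

import requests

ENDPOINT = "https://inference.example.internal/classify"  # hypothetical

def call_once(_: int) -> tuple[str, float]:
    start = time.perf_counter()
    with open("fixtures/sample.jpg", "rb") as f:
        resp = requests.post(ENDPOINT, files={"image": f}, timeout=30)
    return resp.json()["label"], time.perf_counter() - start

# 500 identical requests, 50 at a time.
with ThreadPoolExecutor(max_workers=50) as pool:
    results = list(pool.map(call_once, range(500)))

labels = {label for label, _ in results}
latencies = sorted(latency for _, latency in results)

# Same input under the same load window: more than one distinct label means
# outputs are not stable under concurrency.
print(f"Distinct labels: {labels}")
print(f"p95 latency: {latencies[int(len(latencies) * 0.95)]:.3f}s")
```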
Vision AI and the Real World
The rise of Vision AI and assistive technologies is one of the most promising, and most risky, developments.
From smart glasses that describe environments to AI-enabled accessibility tools, these systems interact directly with the physical world. The episode highlights the potential, particularly for people who are visually impaired, for whom these technologies can be transformative in enabling greater independence and real-world exploration.
But with that potential comes responsibility. Testing must now account for:
- Environmental variability (lighting, movement, obstructions)
- Contextual accuracy (is the system interpreting correctly?)
- Safety implications (what happens if it gets it wrong?)
This is where AI governance becomes critical, because the cost of failure is no longer just technical; it’s human.
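Testing for environmental variability can start by systematically perturbing known-good inputs. A minimal sketch follows, where describe_scene() stands in for whatever interface the Vision AI system exposes; the function, fixture, and perturbation values are all hypothetical:

```python
from PIL import Image, ImageEnhance, ImageFilter

from vision_assist import describe_scene  # hypothetical system under test

base = Image.open("fixtures/street_crossing.jpg")

# Approximate conditions the camera will meet in the real world.
variants = {
    "baseline": base,
    "low_light": ImageEnhance.Brightness(base).enhance(0.3),
    "glare": ImageEnhance.Brightness(base).enhance(1.8),
    "blur": base.filter(ImageFilter.GaussianBlur(radius=4)),
}

for condition, image in variants.items():
    # For safety-relevant output, a degraded condition should produce a
    # cautious answer, never a confidently wrong one.
    print(f"{condition}: {describe_scene(image)}")
```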
AI Strategy Without Governance Is a Risk
Many organisations are moving quickly to define their AI strategy, but governance often lags behind.
This gap creates exposure:
- Systems deployed without clear accountability
- Inconsistent testing approaches
- Limited understanding of risk boundaries
AI strategy must be built on a foundation of governance, and governance must be validated through testing.
What This Means for QA and Testing Leaders
For test managers, QA leads, and senior practitioners across Australia, the role is evolving.
1. Redefine Quality
Quality now includes:
- Trustworthiness
- Transparency
- Ethical behaviour
- Data integrity
2. Treat AI Governance as Testable
Governance is not documentation; it’s something that must be:
- Verified
- Measured
- Continuously monitored
3. Expand Testing Practices
- Introduce adversarial testing for AI systems
- Validate training and test datasets
- Test for edge cases and unknown scenarios
4. Align with Emerging Standards
Standards like ISO 42001 will shape:
- Procurement requirements
- Delivery expectations
- Regulatory compliance
The Shift Ahead
AI is no longer a feature sitting on top of systems; it is embedded within them. It influences decisions, behaviours, and outcomes in ways that are not always visible or predictable. And that changes the role of testing. The challenge is no longer just building AI; it’s building AI that can be trusted.
For KJR and the broader quality engineering community, this presents a defining opportunity: to lead the transition from hype to accountability, and to ensure that as AI scales, quality scales with it.
The implications of AI governance are not confined to a single use case; they’re reshaping how quality, risk, and accountability are managed across industries. Explore how these challenges are playing out in different sectors and what it means for organisations navigating AI adoption at scale.
Partner with KJR to ensure your AI is tested, trusted, and ready for the real world.
Frequently Asked Questions (FAQs)
Why is AI governance now a testing problem?
Because governance principles, like fairness, transparency, and accountability, can’t remain theoretical. They must be validated in practice through testing, monitoring, and measurable controls across the AI lifecycle.

How does testing AI systems differ from testing traditional software?
Traditional systems are deterministic and predictable. AI systems are probabilistic and adaptive. This means testing must focus less on fixed expected outputs and more on edge cases, unexpected behaviours, and how systems perform under uncertainty.

Why does it matter if a product is mislabelled as AI-enabled?
Mislabelling creates confusion in testing strategies. Teams may apply AI-specific validation methods to rule-based systems, or miss critical risks in genuine AI systems, leading to gaps in quality, accountability, and governance.

Why is data quality so critical in AI systems?
In AI systems, data effectively is the system. Poor-quality or biased data leads to systemic issues, not isolated defects, impacting decision-making, fairness, and reliability at scale.

What is automation bias, and why does it matter?
Automation bias is the tendency for users to over-trust AI outputs. It matters because even technically “correct” systems can create risk if users don’t question results, making human interaction a key part of what needs to be tested.

How can testing teams uncover an AI system’s blind spots?
By introducing adversarial and exploratory testing approaches, deliberately feeding systems inputs they weren’t trained on to uncover blind spots, limitations, and failure modes.

What role will standards like ISO 42001 play?
They provide structured frameworks for managing AI responsibly. Over time, these standards are likely to become procurement requirements, meaning organisations will need to demonstrate compliance through auditable testing processes.

How do DevOps and CI/CD pipelines need to change for AI?
Pipelines must expand to include model validation, data quality checks, drift monitoring, and governance controls, ensuring AI systems remain reliable and compliant over time, not just at release.

Why are Vision AI systems especially challenging to test?
They operate in unpredictable environments. Testing must account for real-world variability, lighting, movement, context, and safety implications, where failures can directly impact people, not just systems.

What risks do organisations face if AI strategy outpaces governance?
They risk deploying systems with unclear accountability, inconsistent testing, and unmanaged risks. Governance provides the structure needed to ensure AI is safe, trustworthy, and aligned with regulatory expectations.

What did the Age Assurance Technology Trial reveal about AI testing?
It highlights that AI systems often behave differently outside controlled environments. Testing must reflect real-world complexity, diverse users, edge cases, and unpredictable inputs, to ensure systems are truly fit for purpose.
Case Studies

Local Government Authority ArcGIS Platform
Independent testing assured ArcGIS scalability during peak flood events, giving Council confidence when performance mattered most.

Chatbot Widget Testing for Water Authority
KJR assured a secure, reliable omni-channel chatbot experience, validating MFA identity checks, seamless handoffs, and data integrity to provide smarter, connected, self-service digital interactions.

Test Automation Framework for Water Corporation
KJR implemented a test automation framework for a government-owned retail water corporation, delivering faster, more reliable software releases, reduced manual testing, and improved accuracy across complex integrated systems.

State Retail Water Corporation
KJR was engaged by a major retail water corporation to test and validate an AI-enabled IVR. The end result: a flawless deployment with zero defects, proven resilience, and verified transcription accuracy.

Local Government
A large Australian local government organisation engaged KJR to plan, execute, and report on performance testing for its new corporate website, ensuring it could reliably handle peak traffic, particularly during emergencies.

Government owned retail water corporation
A government-owned retail water corporation providing essential services engaged KJR to apply its enterprise software quality engineering expertise to the corporation’s Maximo regression testing challenges.

Large Government Health Department
KJR upgraded a government health department’s LoadRunner testing platform, resolving performance issues and improving system stability. KJR’s efforts ensured the department could deliver resilient, high-performance healthcare services without disruption.

Government body responsible for running state elections
This large state government statutory authority, which runs state, local council, and statutory elections, faced slow performance when processing returned envelope barcodes, resulting in delays to determining election outcomes. KJR was engaged to deliver performance testing on its Election Management System.
