Most CPS 230 programs can survive a steering committee. Far fewer can survive a real outage.
CPS 230 took effect on 1 July 2025.
For pre-existing service-provider contracts, the catch-up deadline is the earlier of renewal or 1 July 2026.
The real question now is no longer “Do we have a framework?” It is “Can engineering prove it under pressure?”
That is the gap many institutions are still carrying.
Most CPS 230 commentary is written for Boards, CROs, resilience teams, and operational risk functions. That makes sense. The standard is governance-heavy by design.
But governance is only half the story.
The quality of a CPS 230 program is now limited by the quality of the engineering evidence underneath it. Risk teams can often describe the framework well. Engineering teams still cannot always show how a critical operation maps to systems, what recovery target really applies, which incidents trigger APRA notification, or what evidence a Board can safely rely on.
That is where the 5-star version of CPS 230 starts to crack.
APRA’s CPS 230 Operational Risk Management requires entities to identify critical operations, set tolerance levels, maintain credible business continuity capability, manage material service providers, and test under severe-but-plausible conditions. APRA’s CPG 230 guidance makes the technical translation even clearer: the maximum period of disruption maps naturally to RTO, and the maximum extent of data loss maps naturally to RPO.
That means CPS 230 is not only a governance uplift.
It is an engineering operating model.
The 5-star CPS 230 trap
A 5-star CPS 230 program, on a scale that runs to 10, is one where nothing obviously bad has happened yet.
There is a policy. There is a committee. There is a Board pack. People feel moderately reassured.
But if a director, regulator, or crisis lead asks four simple questions, the confidence starts to wobble:
- What is the actual RTO for this critical operation?
- What is the actual RPO?
- Which incidents trigger the 72-hour APRA clock?
- Which failures push the business outside tolerance and into the 24-hour clock?
That is why the dangerous CPS 230 program is the one that looks finished in PowerPoint.
A 10-star CPS 230 program feels different. It feels like the hard conversation has already been rehearsed. Risk can ask the question. Engineering can show the map. Operations can explain the fallback. The Board can see the evidence.
That is the standard worth aiming for in a regulated environment.
The engineering translation that actually matters
The most common CPS 230 failure pattern is that tolerance levels stay in business language for too long.
They need to become technical commitments.
Here is the translation that matters:
| CPS 230 concept | Engineering translation | Evidence that should exist |
|---|---|---|
| Critical operation | Service and dependency map | Application inventory, runbooks, owner map |
| Maximum period of disruption | RTO | Failover and restore test results |
| Maximum extent of data loss | RPO | Backup, replication, and replay evidence |
| Minimum service levels | Degraded-service target | Manual fallback and partial-service runbooks |
| Material service provider | Dependency and concentration view | Contract controls, access rights, exit and contingency evidence |
| Operational risk incident | Regulatory decision tree | 72-hour and 24-hour escalation workflow |
This is where many institutions still have a credibility gap.
The Board may have approved a tolerance level. But if engineering cannot show the service map, the recovery path, the restore timing, the manual fallback, and the supplier dependency chain underneath that statement, then the tolerance is still more aspiration than operating fact.
That is why the right question for risk officers is no longer “Do we have a CPS 230 framework?”
It is “Can engineering prove we stay within tolerance, or recover predictably when we do not?”
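One way to make that question concrete is to hold each approved tolerance as a structured record rather than a sentence in a policy. The following is a minimal sketch in Python, not a prescribed schema: the class name, field names, evidence labels, and the "Payments" example values are all hypothetical.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class ToleranceCommitment:
    """One Board-approved tolerance expressed as engineering targets (hypothetical schema)."""
    critical_operation: str
    owner: str
    rto: timedelta                   # maximum period of disruption
    rpo: timedelta                   # maximum extent of data loss
    degraded_service: str            # minimum service level while impaired
    evidence: tuple[str, ...] = ()   # labels of evidence currently on file


def missing_evidence(commitment: ToleranceCommitment, required: set[str]) -> set[str]:
    """Return the evidence types that are required but not yet attached."""
    return required - set(commitment.evidence)


# Illustrative values only.
payments = ToleranceCommitment(
    critical_operation="Payments",
    owner="payments-platform",
    rto=timedelta(hours=2),
    rpo=timedelta(minutes=5),
    degraded_service="manual batch submission",
    evidence=("failover_test", "runbook"),
)

gaps = missing_evidence(payments, {"failover_test", "restore_test", "runbook"})
print(sorted(gaps))  # ['restore_test']
```

A tolerance with an empty or incomplete evidence set is exactly the "aspiration, not operating fact" case described above.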
What 10-star looks like in practice
The move from 5 stars to 10 is not about writing more. It is about making resilience legible under pressure.
| Area | 5-star posture | 10-star posture |
|---|---|---|
| Critical operations | Listed in policy | Mapped to real systems, people, suppliers, and fallback paths |
| Tolerance levels | Approved in principle | Translated into RTO, RPO, and degraded-service targets |
| Incident handling | Severity labels only | Severity plus regulatory trigger logic |
| Scenario testing | DR exercise theatre | Severe-but-plausible failure rehearsal |
| Board oversight | Narrative reassurance | Evidence pack with gaps, proof, and remediation status |
In a 10-star CPS 230 program:
- risk and engineering share one vocabulary
- incident responders know when they are inside or outside tolerance
- service-provider dependency is visible before a crisis, not during one
- Board confidence is earned through current evidence, not presentation quality
That is a much higher bar. It is also a much safer one.
The pathway in one view
If you want the whole article in one view, it is this sequence:
- Identify the critical operation.
- Translate tolerance into RTO, RPO, and degraded-service expectations.
- Decide whether an incident is material, outside tolerance, or both.
- Test the ugly scenarios, including supplier failure.
- Turn the results into evidence a Board can challenge and rely on.
That is the pathway from policy comfort to operational proof.
The 90-day engineering framework
This is the fastest useful way to close the gap without turning the program into another documentation factory.
Days 1-30: Build the critical operations recovery map
The first month is about forcing specificity.
For each critical operation, engineering, architecture, operations, and risk should produce one working map that shows:
- the applications and platforms involved
- the key databases and data stores
- upstream and downstream integrations
- infrastructure and network dependencies
- identity and access dependencies
- manual fallback steps
- internal support teams
- third-party and fourth-party dependencies
This is where hidden weaknesses appear. Many institutions have a business continuity plan. Far fewer have a trustworthy recovery map showing what actually has to recover, in what order, and with what minimum viable capability.
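The "in what order" part of a recovery map can be derived rather than guessed once dependencies are written down. A minimal sketch, assuming a hypothetical dependency map for a single critical operation (all component names are invented for illustration), using the standard-library `graphlib` module:

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each key depends on the components in its value.
depends_on = {
    "payments-api": {"payments-db", "identity-provider", "message-queue"},
    "payments-db": {"storage", "network"},
    "message-queue": {"network"},
    "identity-provider": {"network"},
    "storage": set(),
    "network": set(),
}


def recovery_order(graph):
    """Return components in an order where every dependency recovers first."""
    return list(TopologicalSorter(graph).static_order())


order = recovery_order(depends_on)
print(order[-1])  # payments-api
```

If the map contains a circular dependency, `static_order()` raises `graphlib.CycleError`, which is itself a useful finding: a circular recovery dependency is a weakness worth surfacing before an outage.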
At the same time, translate tolerance levels into technical targets.
If a critical operation can only be unavailable for two hours, engineering should be able to point to the architecture, staffing model, failover pattern, and runbook discipline that make a two-hour RTO credible. If only minimal data loss is acceptable, the backup and replication design should support that RPO in practice, not only in a policy statement.
By day 30, you want:
- a critical operations register connected to real systems and suppliers
- an RTO and RPO view for every critical operation
- defined degraded-service expectations
- a gap list where current capability does not support approved tolerance
If the gap list is uncomfortable, good. That means you finally have the real problem on the table.
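The gap list itself can be a simple comparison of approved RTO against the most recent measured recovery time. A rough sketch, with operation names and figures that are purely illustrative:

```python
from datetime import timedelta

# Illustrative figures only: operation -> (approved RTO, last measured recovery time).
recovery_evidence = {
    "Payments": (timedelta(hours=2), timedelta(hours=3, minutes=10)),
    "Customer onboarding": (timedelta(hours=8), timedelta(hours=5)),
    "Claims lodgement": (timedelta(hours=4), None),  # never tested
}


def rto_gaps(evidence):
    """List operations whose measured recovery exceeds the RTO or has no evidence."""
    gaps = []
    for operation, (target, measured) in evidence.items():
        if measured is None:
            gaps.append((operation, "no test evidence"))
        elif measured > target:
            gaps.append((operation, f"over target by {measured - target}"))
    return gaps


gap_list = rto_gaps(recovery_evidence)
for operation, reason in gap_list:
    print(operation, "->", reason)
```

Treating "never tested" as a gap in its own right is a deliberate choice here: an untested target is no more credible than a missed one.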
Days 31-60: Rebuild incident classification around regulatory triggers
This is where CPS 230 becomes operationally sharp.
Most engineering teams already classify incidents by severity. But those taxonomies are usually built for restoration speed, not prudential reporting.
CPS 230 adds a second lens: regulatory materiality.
Under the standard, an entity must notify APRA as soon as possible, and no later than 72 hours, after becoming aware of an operational risk incident that is likely to have a material financial impact or a material impact on its ability to maintain critical operations. It must notify APRA as soon as possible, and no later than 24 hours, if it suffers a disruption to a critical operation outside tolerance.
That means responders need more than “Sev 1” and “major incident” labels.
They need a decision path that answers:
- Is a critical operation affected?
- Is the operation outside tolerance?
- Is the likely impact material financially or operationally?
- Is this a near miss or control failure that should change our risk view even if APRA notice is not triggered?
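The four questions above can be encoded as a small decision function that sits alongside the existing severity scheme. This is a sketch under stated assumptions, not a statement of how the standard must be applied: the field names and action strings are hypothetical, and the materiality judgements are inputs that each entity has to define for itself.

```python
from dataclasses import dataclass


@dataclass
class IncidentAssessment:
    """Answers to the decision-path questions (hypothetical field names)."""
    critical_operation_affected: bool
    outside_tolerance: bool
    material_financial_impact: bool
    material_impact_on_critical_ops: bool
    near_miss_or_control_failure: bool = False


def regulatory_actions(a: IncidentAssessment) -> list[str]:
    """Map an assessment to the follow-up actions it triggers."""
    actions = []
    if a.critical_operation_affected and a.outside_tolerance:
        actions.append("notify APRA: 24-hour clock")
    if a.material_financial_impact or a.material_impact_on_critical_ops:
        actions.append("notify APRA: 72-hour clock")
    if a.near_miss_or_control_failure:
        actions.append("update operational risk profile")
    return actions or ["no regulatory trigger: handle as standard incident"]


outage = IncidentAssessment(
    critical_operation_affected=True,
    outside_tolerance=True,
    material_financial_impact=False,
    material_impact_on_critical_ops=True,
)
print(regulatory_actions(outage))
```

Note that the two clocks are evaluated independently, so a single incident can start both.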
This is also the right window to test severe-but-plausible scenarios, which APRA expects in practice. Basic disaster recovery theatre is not enough.
The better test set includes:
- cloud control-plane outage
- identity provider failure
- corrupted restore
- managed service provider loss
- cyber containment that forces manual operations
- a dependency failure that leaves the platform “up” while the critical operation is unusable
By day 60, you want:
- incident severity mapped to CPS 230 notification logic
- named owners for 72-hour and 24-hour assessments
- evidence capture built into incident response
- scenario results with remediation owners and deadlines
- near misses feeding the operational risk profile instead of disappearing into folklore
If your incident process cannot separate “major outage” from “outside tolerance”, your program is still too theoretical.
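The separation can be as plain as tracking elapsed disruption against the approved tolerance, independently of the severity label. A minimal sketch: the timestamps and the two-hour tolerance are illustrative, and starting the 24-hour clock at the moment tolerance is breached is an assumption made for this example, to be confirmed against the entity's own reading of the standard.

```python
from datetime import datetime, timedelta, timezone


def outside_tolerance(disruption_start, max_disruption, now):
    """True once the disruption has run longer than the approved tolerance."""
    return now - disruption_start > max_disruption


def notification_deadline(clock_start, hours):
    """Latest notification time, given when the relevant clock started."""
    return clock_start + timedelta(hours=hours)


# Illustrative incident: the same Sev 1 outage at two points in time.
start = datetime(2025, 8, 4, 9, 0, tzinfo=timezone.utc)
tolerance = timedelta(hours=2)

sev1_at_30_min = outside_tolerance(start, tolerance, start + timedelta(minutes=30))
sev1_at_3_hours = outside_tolerance(start, tolerance, start + timedelta(hours=3))

# Assumption for this sketch: the 24-hour clock starts at the breach of tolerance.
breach_time = start + tolerance
deadline_24h = notification_deadline(breach_time, 24)

print(sev1_at_30_min, sev1_at_3_hours)  # False True
print(deadline_24h.isoformat())         # 2025-08-05T11:00:00+00:00
```

The severity label never changes in this example; only the regulatory position does.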
Days 61-90: Build the Board evidence pack
This is where the program either becomes defensible or starts to wobble.
CPS 230 does not create a standalone engineering attestation. But it absolutely creates a Board evidence problem.
Boards are expected to approve tolerance levels, oversee resilience, review failures to remain within tolerance, and understand the risk created by material service providers. In parallel, broader annual declarations under CPS 220 Risk Management still depend on management being able to show the framework is working in reality.
That means engineering evidence is no longer optional support material. It is part of what makes Board reliance reasonable.
By day 90, the evidence pack should include:
- the critical operations register and supporting service maps
- approved tolerance levels translated into RTO, RPO, and degraded-service targets
- current recovery capability versus required tolerance
- severe-but-plausible scenario results
- incidents and near misses relevant to critical operations
- remediation status for missed targets and failed controls
- material service-provider dependencies, concentration risks, and contingency options
- backup, restore, failover, and manual-operation evidence
This is the level where Board challenge gets better.
Instead of asking whether the institution “has” a resilience framework, directors can ask:
- Which critical operations are closest to breaching tolerance?
- Which suppliers create the hardest recovery dependency?
- Where is RTO or RPO least credible against approved tolerance?
- Which repeat incidents suggest control weakness, not bad luck?
- What evidence supports management confidence today, not six months ago?
That is a much healthier conversation.
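Two of those questions, which operations are closest to breaching tolerance and whether the evidence is current, lend themselves to a simple calculation. A sketch with invented figures; the 180-day staleness threshold is an assumption chosen to mirror "not six months ago", not a regulatory number.

```python
from datetime import date, timedelta

# Illustrative records: (operation, approved RTO hours, measured recovery hours, test date).
records = [
    ("Payments", 2.0, 1.8, date(2025, 9, 1)),
    ("Customer onboarding", 8.0, 5.0, date(2025, 2, 10)),
    ("Claims lodgement", 4.0, 3.0, date(2025, 8, 15)),
]


def headroom_ranking(rows):
    """Sort by remaining headroom as a fraction of the RTO, tightest first."""
    return sorted(rows, key=lambda r: (r[1] - r[2]) / r[1])


def stale_evidence(rows, as_of, max_age=timedelta(days=180)):
    """Names of operations whose latest test is older than max_age."""
    return [r[0] for r in rows if as_of - r[3] > max_age]


ranked = [r[0] for r in headroom_ranking(records)]
stale = stale_evidence(records, as_of=date(2025, 10, 1))

print(ranked)  # ['Payments', 'Claims lodgement', 'Customer onboarding']
print(stale)   # ['Customer onboarding']
```

In this example the operation with the most comfortable headroom is also the one with the oldest evidence, which is the kind of pattern a Board pack should make visible.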
The 10-star test for CPS 230
If you want a simple quality gate, use this:
- In under 10 seconds, a risk leader should be able to see the critical operation, the owner, the tolerance, and the current concern.
- In under 10 minutes, engineering should be able to show the dependency map, recovery path, and most recent test evidence.
- In under an hour, the institution should know whether it is dealing with a major incident, a 72-hour APRA notification scenario, or a 24-hour outside-tolerance event.
- By the next Board or executive review, the evidence pack should show what failed, what changed, who owns the gap, and when it will be retested.
That is what a confidence-building CPS 230 operating model looks like.
The tactical checklist engineering teams actually need
If you want the engineering version in one view, start here:
- Map every critical operation to applications, data stores, integrations, infrastructure, people, and third parties.
- Translate each tolerance level into RTO, RPO, and degraded-service expectations.
- Identify single points of failure across identity, data, networking, and service providers.
- Add incident logic for material impact, outside tolerance, and APRA escalation.
- Capture near misses and repeat control failures, not only customer-visible incidents.
- Test cloud, identity, restore, cyber, and provider-loss scenarios that would genuinely hurt.
- Prove backup, restore, and failover timing against the approved target, not the vendor brochure.
- Review material service-provider contracts for service levels, APRA access, exit rights, subcontracting, and contingency value.
- Track remediation to closure with named owners, dates, and retest evidence.
- Package the outputs in a Board-ready format that can support oversight and annual declarations.
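For the single-point-of-failure and concentration items in the checklist, a first pass can come straight from the critical operations map: invert it and look for dependencies that several operations share. A minimal sketch, with hypothetical operation and dependency names:

```python
from collections import defaultdict

# Hypothetical map of critical operations to the dependencies they rely on.
operation_dependencies = {
    "Payments": {"identity-provider", "cloud-region-a", "card-processor"},
    "Customer onboarding": {"identity-provider", "cloud-region-a", "kyc-vendor"},
    "Claims lodgement": {"identity-provider", "document-store"},
}


def shared_dependencies(op_deps, min_operations=2):
    """Return dependencies relied on by at least min_operations critical operations."""
    users = defaultdict(set)
    for operation, deps in op_deps.items():
        for dep in deps:
            users[dep].add(operation)
    return {
        dep: sorted(ops)
        for dep, ops in users.items()
        if len(ops) >= min_operations
    }


concentration = shared_dependencies(operation_dependencies)
for dep in sorted(concentration):
    print(dep, "->", concentration[dep])
```

This only counts shared use. It says nothing about whether a fallback exists, so it is a starting list for the concentration conversation rather than a risk rating.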
What the 30 April 2026 amendments do and do not change
APRA’s targeted amendments of 30 April 2026 matter, but less than some teams hope.
The change created a narrow exemption where strict contractual terms may be impracticable for certain non-traditional service providers. That is useful at the margin.
It does not remove the core engineering burden.
You still need to know which providers matter, how concentration risk behaves, how manual or alternative arrangements work, and what happens when a provider becomes unavailable at exactly the wrong time.
In other words, the amendment may soften one contracting edge case. It does not soften the resilience test.
The real point of CPS 230
The point of CPS 230 is not better language.
It is better resilience.
Not a calmer policy library. Not a prettier dashboard. Not a stronger committee rhythm.
Actual resilience.
The firms that do well here will not be the ones with the prettiest policy suite.
They will be the ones that can show, with evidence, how engineering operationalises resilience inside approved tolerance.
If you are a risk officer, CIO, engineering leader, or operational resilience owner, where is your program still 5-star today: recovery mapping, incident classification, supplier dependency, or Board proof?