Most procurement risk-scoring systems fail at the last mile. They produce a number. Someone reads it. Someone nods. Then nothing moves.

The problem is rarely the risk indicator itself. It is what the system shows the user at the moment they have to decide. A score helps an analyst compare projects. It rarely tells an auditor, an agency head, or an oversight body what to do next. The CoST Data Use Manual v4, published in March 2026 through the CoST guidance hub, made a deliberate design move to address this. It did not abandon scoring. It changed what users see first. Instead of a single risk score, the system shows the project as a coloured band (green, yellow, orange, or red) together with the action that band requires. The score still exists underneath. The band sits on top. After a decade of watching dashboards become wallpaper, I think this was the right call.

Decision rule: Use scores for diagnosis. Use bands for action. Show the underlying risk flags so users can see why each project sits where it sits.

Here is why scoring fails as a tool for action, and why banding works, with what we have learned from rolling it out in Kaduna State, Malawi, and Mozambique.

1. The problem with risk scores

What we built in the 2010s: a single overall risk score per project, calculated by rolling up many indicators into one number. The thinking was that auditors want to know which projects are riskiest. A number ranks projects. A ranked list directs limited audit capacity.

What I observed: these single-number scores are good for reading and bad for acting. They tell an auditor which project to look at first. They do not tell anyone what to do. Without an action attached to a score, the score becomes decoration.

I have watched audit offices receive risk-scored project lists every quarter for years. The list circulates. People note that project 47 is high-risk. Nothing changes. By the next quarter the list updates and project 47 is still there, because the score was never tied to a workflow that moved the project from one bucket to another.

The deeper problem is calibration. A score of 73 out of 100 means nothing without a baseline. Is 73 high or low? High compared to what? Last year? Other agencies? Other countries? If the user has to do that math themselves, they will not. They will glance at the number and move on.

Rolling many indicators into one number also hides what caused the score. A project scoring 85 might be triggering five different risk patterns. Or one heavily weighted indicator. Or simply gaps in the data that the system mistook for risk. The auditor who needs to act has no way to look behind the score without a separate analysis. That separate analysis rarely happens.

2. What the v4 manual changed

What we used to assume: better scoring would solve the action problem. More indicators, better weighting, machine learning, anomaly detection.

What v4 chose instead: the manual went the other direction. It defined 18 specific red-flag categories, each one detected from disclosed data or reported by community monitors. Then it grouped projects into four bands: green for zero risks, yellow for one to five, orange for six to ten, and red for more than ten.

The bands are coarser than a 0 to 100 score. That is the point. A green project means the data shows no current red flags. A red project means more than ten flags fired and the project warrants immediate attention. The bands do not replace investigation. They route it.
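
As a minimal sketch of that routing rule, here is the banding described above expressed as code. The thresholds come from the paragraph above; the function name and the idea of expressing it this way are mine, not the manual's.

```python
def band_for_flag_count(flag_count: int) -> str:
    """Map a project's triggered-flag count to a v4-style band.

    Thresholds as described above: green for zero flags, yellow for one
    to five, orange for six to ten, red for more than ten.
    """
    if flag_count == 0:
        return "green"
    if flag_count <= 5:
        return "yellow"
    if flag_count <= 10:
        return "orange"
    return "red"


# Example: a project with seven triggered flags lands in the orange band.
print(band_for_flag_count(7))  # orange
```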

The 18 categories matter. The published list spans procurement-stage flags, contract-amendment flags, ownership-data flags, and competition flags. Each category is data-detected or community-reported, not algorithmically inferred. An auditor looking at a yellow project can see exactly which two flags fired and decide whether the pattern matters. An auditor looking at a red project can see which 12 flags fired and decide which to investigate first.

The categories I work with most are these nine, listed without ranking. They are representative of the 18-category set, not the complete list; the full taxonomy lives in the v4 manual. A minimal detection sketch in code follows the list.

  1. Single-bid award. Only one bidder responded to the tender.
  2. Late contract amendment. Amendments filed close to delivery.
  3. Repeat winners. The same firm wins disproportionately within a sector or agency.
  4. Missing justifications. Required public reason for an action is absent.
  5. Supplier concentration. A small set of firms holds most of the market.
  6. Project delay beyond schedule. Implementation runs past the contracted timeline.
  7. Contract value increase beyond threshold. Amendment raises the contract value past a defined limit.
  8. Missing beneficial ownership data. The real owners of the contracted firm are not declared.
  9. Conflict of interest in funding approval. Approver and beneficiary share an undeclared interest.
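
To show what "data-detected" means in practice, here is a sketch of how two of the flags above might be detected from a disclosed project record. The field names (`bids_received`, `beneficial_owners`) are assumptions about the disclosure schema for illustration, not fields from any published standard.

```python
from typing import Any


def detect_flags(project: dict[str, Any]) -> list[str]:
    """Return the data-detected flags that fire for one disclosed project.

    Only two of the nine categories above are sketched; the field names
    are assumed, not taken from a published schema.
    """
    flags: list[str] = []

    # Single-bid award: only one bidder responded to the tender.
    if project.get("bids_received") == 1:
        flags.append("single_bid_award")

    # Missing beneficial ownership: the real owners of the firm are not declared.
    if not project.get("beneficial_owners"):
        flags.append("missing_beneficial_ownership")

    return flags


example = {"bids_received": 1, "beneficial_owners": []}
print(detect_flags(example))  # ['single_bid_award', 'missing_beneficial_ownership']
```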

This is not new in monitoring theory. It is new in procurement disclosure systems. Most still produce single combined scores. The v4 shift to traffic lights with itemised flags is a deliberate move from prediction to evidence. The manual is published openly through the CoST Infrastructure Transparency Initiative; practitioners working on member-country implementations can read the source rather than my paraphrase.

3. Why bands drive action when scores do not

What auditors are supposed to do: use the most precise tool available.

What auditors actually do: use the tool that survives a Monday morning. The tool that loads on a slow connection. The tool that does not require explanation to a new staff member. The tool whose output produces a defensible action.

A traffic light produces a defensible action. If a project is red, you investigate. If you do not, you have taken on documentary risk. Your office's name appears next to a red project that nobody investigated. That risk creates pressure. The pressure creates action.

A score does not create the same pressure. If a project scores 73, your defence is straightforward: 73 did not pass our threshold. We set the threshold at 80. Move on. The score becomes the alibi for inaction.

In the implementations I have seen, banded indicators created faster conversations about action than risk scores did, even where the underlying data was similar. I am stating this as practitioner observation, not as measured outcome data. If a working group wanted to test it formally, the experiment would compare audit-action rates between two procurement portfolios with comparable risk profiles, one surfaced through banding and one through scoring, over twelve to eighteen months. I have not seen that study run, but I would bet on the banded portfolio triggering more investigations.

This is not an argument against measurement rigour. It is an argument that what a risk system shows the user has to fit the decision the user has to make. Scores fit research. Bands fit oversight.

4. The Kaduna proof point

Kaduna State Public Procurement Authority runs the strongest implementation I have seen of this banding approach. As reported in the CoST Data Use Manual v4 (March 2026), the KADPPA portal covers approximately 1,484 projects across NGN 684 billion in disclosed value, spanning 10 sectors and 38 procuring agencies. Project counts and disclosed value continue to grow as more agencies onboard, so the figures readers see on the portal today may exceed these. The structural design is what travels.

The portal generates two report types on demand. The first is an Agency Portfolio Report. The second is a Single Project Audit Dossier per project, a PDF that bundles lifecycle stages, contracts, parties, documents, and compliance flags.

Both reports use the banding logic. The portfolio report shows agency-level distribution across green, yellow, orange, and red. The audit dossier shows project-level flags. An auditor can request the portfolio view, identify agencies with disproportionate red bands, and drill into specific projects. The same data is also visible to the procuring entities themselves, which lets them see how their portfolio compares before an external review hits them.

That last point matters more than the audit angle. Procuring agencies do not want to be embarrassed. Showing them their own banding before anyone else sees it converts oversight from an external imposition into internal management. Agencies that see their own red projects address them. Agencies that wait for the audit to surface red projects defend them. The portal's design choice was to put both audiences on the same data, with the same framing, at the same time. That choice is what produces movement.

Market Competition Health and Transparency Performance scores supplement the banding for users who want a numeric summary. The numbers exist. They do not replace the bands. They give the audiences that prefer numbers an additional view, while the bands remain the primary view for anyone making a decision. Both views draw from the same underlying flags. This is the precise sense in which v4 did not abandon scoring; it kept scoring as a supporting view and made banding the primary one.
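
To make "same underlying flags, two views" concrete, here is a sketch. The equal-weight score is an invention purely for illustration; it is not how the portal's published scores are calculated, and the data shape is assumed.

```python
def band(flag_count: int) -> str:
    # v4 thresholds: 0 green, 1-5 yellow, 6-10 orange, more than 10 red
    if flag_count == 0:
        return "green"
    return "yellow" if flag_count <= 5 else "orange" if flag_count <= 10 else "red"


def views_from_flags(flags: list[str], total_categories: int = 18) -> dict:
    """Derive the primary view (band) and a supporting numeric view from
    the same flag list. The equal-weight score is illustrative only.
    """
    score = round(100 * (1 - len(flags) / total_categories))
    return {"band": band(len(flags)), "score": score, "flags": sorted(flags)}


print(views_from_flags(["single_bid_award", "missing_beneficial_ownership"]))
# {'band': 'yellow', 'score': 89, 'flags': ['missing_beneficial_ownership', 'single_bid_award']}
```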

5. The honest limitation

The system only works when the underlying data is current and complete. A project missing its contract value, its bidder list, or its amendment history will not trigger flags. Missing data produces a green band. The system reports calm where there is no information.

This is the most dangerous failure mode in any risk-detection system. False negatives from incomplete data are indistinguishable from real negatives. The Malawi IPPI implementation addresses this partly by combining data-detected flags with community-reported flags. A community monitor on the ground can report a flag the portal data does not capture, like missing site work or wrong specifications, and that report enters the same flag system.

Even with community reporting, the gap between what is disclosed and what is happening on a project remains. The traffic-light system is honest about this. Green means no flags fired in available data, not no risk exists. The Use Manual v4 is explicit about the distinction. Where I have seen implementations get into trouble is when downstream users compress the distinction. A board paper that summarises the portfolio as "85% green" is technically correct and structurally misleading. The 85% green refers to projects with no triggered flags, not projects with no risks.

The fix is in how reports present the bands. Any report that aggregates banding has to also disclose the share of projects whose data is incomplete enough to suppress flags. That share goes in the same row as the banding distribution. Without it, banding becomes the new score: a number nobody can usefully look behind.
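
A sketch of that reporting discipline follows. The completeness test here (checking three required fields) is my assumption about what "incomplete enough to suppress flags" could mean; a real implementation would define it against the disclosure standard it uses, and the per-project `band` field is assumed to be precomputed.

```python
REQUIRED_FIELDS = ("contract_value", "bidders", "amendments")  # illustrative, not a standard


def banding_report(projects: list[dict]) -> dict:
    """Band distribution plus the share of projects whose records are too
    incomplete for flags to fire. Both figures sit in the same report row.
    """
    counts = {"green": 0, "yellow": 0, "orange": 0, "red": 0}
    incomplete = 0
    for p in projects:
        counts[p["band"]] += 1
        if any(p.get(field) in (None, "", []) for field in REQUIRED_FIELDS):
            incomplete += 1
    counts["incomplete_data_share"] = round(incomplete / len(projects), 2)
    return counts


# A portfolio that is "85% green" but 40% incomplete reads very differently.
```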

6. What single flags miss, and what paired patterns catch

Even with the data-completeness limitation acknowledged, banding plus 18 single flags still misses one class of signal, and it is the strongest signal of all: two flags occurring together can mean something neither flag alone means. The v4 manual calls these composite indicators. I will use the term, but the simpler way to think about them is as paired flag patterns.

The v4 manual names four such pairs. The first combines single-bid awards with high-value contract amendments. The second combines late amendments with significant project delays. The third combines repeated amendments with missing justifications. The fourth combines low competition with supplier concentration.

Each pairing is stronger than its individual flags. A single-bid award alone might be defensible in a sector with few qualified suppliers. A high-value amendment alone might reflect legitimate scope change. The two together suggest the contract was awarded to a preferred supplier and then expanded after the fact, which is a recognised collusion pattern.

Paired-pattern detection is where machine assistance starts earning its place. The 18 single flags can be detected from disclosed data with simple rules. The paired patterns need queries that look at the dataset as a whole. As the data volume grows, doing this by hand becomes impossible. Automating it is straightforward. This is the same pattern I argued for elsewhere on this site about anti-corruption platforms generally: AI belongs behind the workflow, not at the public interface. Paired-pattern detection is rule-based searching across a dataset, not a generative chatbot, which is exactly the back-of-house use of automation that earns its keep.
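
As a sketch of what that back-of-house query looks like for the first pair (single-bid award plus high-value amendment): the field names, the 15% threshold, and the function name are assumptions for illustration, not the manual's data model.

```python
def single_bid_plus_high_value_amendment(projects: list[dict],
                                          increase_threshold: float = 0.15) -> list[str]:
    """Return IDs of projects where a single-bid award was later amended
    upward past the threshold. Thresholds and fields are illustrative.
    """
    hits = []
    for p in projects:
        # First half of the pair: only one bidder responded.
        if p.get("bids_received") != 1:
            continue
        # Second half: the contract value grew past the defined limit.
        original = p.get("original_value") or 0
        current = p.get("current_value") or original
        if original and (current - original) / original > increase_threshold:
            hits.append(p["project_id"])
    return hits
```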

The v4 manual treats paired patterns as a layer above single flags, not a replacement for them. Auditors see the bands first. They see paired-pattern triggers when the project is yellow or above. They see individual flag detail when they drill in. This three-layer view, bands then paired patterns then individual flags, is what makes the system usable across audiences with different time budgets. A minister gets the banding distribution. A senior auditor gets the paired-pattern triggers. A junior auditor working a specific project gets the full flag list. Each audience sees the layer that matches their decision authority.

7. A worked example

To make the model concrete, here is how a project moves through it.

An infrastructure project enters the portal at procurement-award stage. Disclosed data shows: single bid received, contract awarded for NGN 480 million, sector road construction, agency identified, contractor named, no beneficial-ownership data filed.

At first publication, two flags fire: single-bid award and missing beneficial ownership. Two flags place the project in yellow. The portfolio report aggregates this with similar projects. The agency sees its own count of yellow projects. No external audit yet.

Three months later, the contract is amended, the value increases to NGN 612 million, and no public justification is posted. Three new flags fire: late contract amendment, contract value increase beyond threshold, and missing justification. The project now carries five flags in total, which keeps it in yellow on flag count alone, but the paired pattern "single-bid plus high-value amendment" also fires, raising its priority within the band. The audit office's quarterly review now picks the project up.

If the project then experiences a six-month delay with no progress reporting, two further flags fire, taking it to seven, and a second paired pattern (late amendments plus significant project delays) triggers. The project crosses into orange. Investigation is no longer optional. The audit office's name is now formally attached to the file.

What the auditor sees on opening the project record: the band, the seven flags listed individually, the two paired patterns flagged separately, and the full document trail underneath. They can investigate or escalate, but they cannot claim they did not see the pattern.
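
The same walkthrough can be replayed as data. A sketch, reusing the banding thresholds from earlier; the flag identifiers are invented shorthand for the events above, not terminology from the manual.

```python
def band(flag_count: int) -> str:
    # v4 thresholds: 0 green, 1-5 yellow, 6-10 orange, more than 10 red
    if flag_count == 0:
        return "green"
    return "yellow" if flag_count <= 5 else "orange" if flag_count <= 10 else "red"


timeline = [
    ("award published", ["single_bid_award", "missing_beneficial_ownership"]),
    ("contract amended", ["late_amendment", "value_increase_beyond_threshold",
                          "missing_justification"]),
    ("six-month delay", ["project_delay_beyond_schedule", "missing_progress_reporting"]),
]

flags: list[str] = []
for event, new_flags in timeline:
    flags.extend(new_flags)
    print(f"{event}: {len(flags)} flags -> {band(len(flags))}")

# Output:
#   award published: 2 flags -> yellow
#   contract amended: 5 flags -> yellow
#   six-month delay: 7 flags -> orange
# The band alone understates the amendment step; the paired-pattern trigger carries that signal.
```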

8. What this means for new implementations

If you are scoping a procurement risk system in 2026, here is the prescription, in order.

Start with the 18 categories. Do not invent new flag types until you have seen which of the 18 fire most in your context. Most countries find that single-bid awards, missing justifications, and contract amendment patterns dominate. Your first six months of data will tell you which flags carry signal and which are noise.

Use the smallest number of bands that maps cleanly to action. In CoST-style oversight workflows, four is usually enough: monitor, review, investigate, prioritise. Some contexts will need five because of legal escalation thresholds or donor-reporting categories; some may compress to three. The principle, not the number, is what to anchor on. What kills banding systems is not three or five bands; it is bands that do not map to a defined action.

Build the paired patterns in the second year, not the first. Paired patterns need a year of single-flag data to calibrate. The four named pairs in v4 emerged from years of cross-country pattern analysis. Copying them into a new context without that experience to back them up is going through the motions. The pairs that matter in your country might not be the four in the manual.

Disclose data completeness in every report. The single hardest discipline in this work is forcing every dashboard to show what it does not know. Without that, banding lies confidently. With it, banding routes attention.

9. A prediction worth holding me to

I will stake one falsifiable claim on this argument. If the banding-based portals coming online over the next eighteen months across CoST member countries do not show measurably higher auditor-action rates than the score-based portals they replace, then the v4 framing is wrong on the central design choice. I will say so on this site, and revise this article to mark the prediction as failed. The metric I would watch is the share of yellow-or-worse projects that receive a documented audit response within ninety days of band-change, against the equivalent share for score-based portals at comparable thresholds.
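
The metric is computable from a portal export plus an audit-response log. A sketch, under assumed field names (`band`, `band_changed_at`, `audit_response_at` are my invention, not a portal's actual export schema):

```python
from datetime import timedelta


def audit_action_rate(projects: list[dict], window_days: int = 90) -> float:
    """Share of yellow-or-worse projects with a documented audit response
    within `window_days` of the band change. Field names are assumed.
    """
    eligible = [p for p in projects if p.get("band") in ("yellow", "orange", "red")]
    if not eligible:
        return 0.0
    window = timedelta(days=window_days)
    acted = sum(
        1 for p in eligible
        if p.get("audit_response_at") is not None
        and timedelta(0) <= p["audit_response_at"] - p["band_changed_at"] <= window
    )
    return acted / len(eligible)
```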

The Use Manual v4 is the strongest published guidance I have seen on this. It is not the final word. It is the current best version of an evolving practice. The countries running these systems now are writing the next version through what works and what does not.

I would watch what comes back from KADPPA, Malawi IPPI, and the Mozambique Road Fund implementations over the next year. They are the proof points. Whatever the next manual change is, it will come from there.

Cengkuru Michael is a data specialist at CoST, the Infrastructure Transparency Initiative. He works with member countries on procurement disclosure design and assurance. The CoST Data Use Manual v4 (March 2026) is the canonical reference for the framework discussed in this piece.

[Comparison infographic: Why 73 means nothing without a baseline]
[Matrix infographic: Four paired flag patterns from CoST Data Use Manual v4]
[Timeline infographic: A project moving through bands as flags accumulate]
