The database query finished at 2:47 AM. I was sitting in the Public Procurement and Disposal of Public Assets Authority office in Kampala, watching 100,000 contracts load into Uganda's Government Procurement Portal. After eighteen months of building, we were live.

What I didn't expect was what the data would teach me over the next three years.

1. Four patterns that surface at scale

When you process 100,000 contracts, you stop seeing individual documents. You start seeing patterns that suggest behaviors worth investigating.

Threshold clustering. Uganda's procurement rules require additional oversight above certain value thresholds. When I analyzed contract values across three years, I found clustering just below those thresholds. Not randomly distributed. Concentrated at specific price points.

That clustering might indicate threshold manipulation. Or it might reflect how officers estimate costs, rounding to familiar numbers. The pattern doesn't prove wrongdoing. It identifies where to look.
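The check itself is simple. Here is a minimal sketch of how clustering just below a threshold can be measured; the contract values and the 50M UGX threshold are hypothetical, chosen for illustration:

```python
def threshold_clustering(values, threshold, band=0.05):
    """Share of contracts falling within `band` (default 5%) just below
    an oversight threshold. A high share is a flag, not a finding."""
    lower = threshold * (1 - band)
    just_below = [v for v in values if lower <= v < threshold]
    return len(just_below) / len(values)

# Hypothetical contract values (UGX) against an illustrative 50M threshold.
values = [48_900_000, 49_500_000, 49_900_000, 12_000_000,
          75_000_000, 49_750_000]
share = threshold_clustering(values, 50_000_000)
```

Comparing that share against what a uniform or historical distribution would predict is what separates "rounding to familiar numbers" from a pattern worth escalating.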

Timing anomalies. Timestamps reveal approval patterns that narrative records hide. Direct awards that cluster on Fridays. Emergency procurements that spike before holidays. Contracts signed in the final week of budget cycles.

The calendar raises questions. Were these rushed decisions dressed as emergencies? Or did legitimate workload pressures push approvals to week's end? The pattern warrants investigation. It doesn't answer its own question.
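Surfacing the calendar pattern takes one pass over award timestamps. A sketch, with hypothetical award dates; a weekday index table avoids locale-dependent day names:

```python
from collections import Counter
from datetime import date

DAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday")

def weekday_profile(award_dates):
    """Count awards per weekday. A Friday spike warrants a closer look;
    it doesn't establish intent."""
    return Counter(DAYS[d.weekday()] for d in award_dates)

# Hypothetical direct-award dates.
dates = [date(2022, 3, 4), date(2022, 3, 11),
         date(2022, 3, 18), date(2022, 3, 9)]
profile = weekday_profile(dates)
```

The same profile, bucketed by month instead of weekday, surfaces the pre-holiday and end-of-budget-cycle spikes.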

Network signatures. Companies that win contracts sometimes share phone numbers, email domains, or registered addresses. I found clusters of "different" suppliers listing the same accountant, the same director, or adjacent suite numbers in the same building.

Cross-referencing bidder registration data surfaces beneficial ownership connections that individual contract reviews miss. But shared service providers aren't illegal. Common addresses might indicate a business district, not a shell company. The pattern flags relationships for verification. Investigators determine whether those relationships violate procurement rules.
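The grouping behind this is a straightforward inverted index over contact fields. A sketch with hypothetical supplier records and field names; exact-match grouping like this catches shared phone numbers but would miss near-matches such as adjacent suite numbers, which need fuzzier comparison:

```python
from collections import defaultdict

def shared_contact_clusters(suppliers):
    """Group nominally distinct suppliers that share a phone number,
    email domain, or registered address. Shared details flag
    relationships for verification; they don't prove collusion."""
    clusters = defaultdict(set)
    for s in suppliers:
        for field in ("phone", "email_domain", "address"):
            clusters[(field, s[field])].add(s["name"])
    return {k: names for k, names in clusters.items() if len(names) > 1}

# Hypothetical registration records.
suppliers = [
    {"name": "Acme Ltd", "phone": "+256-700-111",
     "email_domain": "acme.ug", "address": "Plot 4, Suite 2"},
    {"name": "Beta Co", "phone": "+256-700-111",
     "email_domain": "beta.ug", "address": "Plot 4, Suite 3"},
    {"name": "Gamma Inc", "phone": "+256-700-999",
     "email_domain": "gamma.ug", "address": "Plot 9"},
]
flags = shared_contact_clusters(suppliers)
```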

Amendment gaps. The difference between initial contract values and final payments tells a story. Across 847 contracts in one ministry, amendments averaged 34% above original values. Vague descriptions like "additional works" appeared in 60% of amendment justifications without specifics.

But amendments happen for legitimate reasons: scope changes, material cost increases, unforeseen site conditions. The gap indicates where auditors should request documentation. It doesn't tell you which amendments were justified.
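Both numbers, the average overrun and the share of vague justifications, fall out of one pass over the amendment records. A sketch with hypothetical contract values and a single illustrative vague term:

```python
def amendment_gap(contracts, vague_terms=("additional works",)):
    """Average fractional increase from original to final value, plus the
    share of amendments justified only by vague boilerplate terms."""
    increases = [(c["final"] - c["original"]) / c["original"]
                 for c in contracts]
    avg_increase = sum(increases) / len(increases)
    vague = sum(1 for c in contracts
                if any(t in c["justification"].lower() for t in vague_terms))
    return avg_increase, vague / len(contracts)

# Hypothetical amendments.
contracts = [
    {"original": 100, "final": 140, "justification": "Additional works"},
    {"original": 200, "final": 220,
     "justification": "Steel price increase per published index"},
]
avg, vague_share = amendment_gap(contracts)
```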

2. What the data doesn't show

Three years of running this system taught me its limits.

Published data captures documented procurement. It misses projects that never entered the system: verbal agreements, emergency allocations that bypassed process, contracts signed but never uploaded.

In one ministry, I compared budget disbursements against recorded procurements over fiscal year 2021-22. The gap suggested 30% of spending happened outside formal procurement. But that estimate carried uncertainty. Some disbursements might have been miscategorized. Some procurements might have been recorded in systems I couldn't access. The portal showed what was recorded, not what occurred.

Data shows what happened, not why. A contract awarded to a new company isn't automatically suspicious. It might be the first women-owned business to compete in that sector. A pricing anomaly might reflect market conditions, not manipulation. Context lives outside the dataset.

The most dangerous interpretation is pattern-matching without ground-truthing. I flagged one award because the data fit a corruption pattern: new company, sole-source justification, above-market pricing. An auditor investigated. The company was a legitimate new entrant. The pricing reflected supply chain disruptions. The pattern was real. The conclusion was wrong.

Disclosure creates accountability only when someone acts on it. I tracked portal visits from January 2022 through December 2023. Eighty percent came from Kampala, mostly journalists and civil society organizations. The districts where projects happened, where citizens could verify whether the road was actually built, showed minimal access.

Data availability isn't data use. I'd built infrastructure for accountability that the people closest to the projects couldn't reach.

3. What I'd build differently

Start with investigation workflows, not schema. I spent months mapping OC4IDS fields before talking to oversight bodies. When I finally met the Auditor General's office, they wanted cross-ministry comparisons I hadn't structured for. They wanted to see which entities consistently exceeded amendment thresholds. They wanted supplier performance across years.

The schema describes what data can look like. The investigation use case defines what queries the system must support. Start with the queries. Interview the Auditor General, the Inspector General, the parliamentary oversight committee. Ask: "What question would you answer first if you had all procurement data?" Build for that question.
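"Build for that question" is concrete: the Auditor General's cross-ministry comparison becomes one query, and the schema follows from it. A sketch using SQLite with a hypothetical two-table-free schema and an illustrative 25% amendment threshold:

```python
import sqlite3

# Hypothetical investigation query that should drive the schema:
# "Which procuring entities exceeded a 25% amendment threshold
#  in two or more years?"
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE contracts (entity TEXT, year INT, original REAL, final REAL);
INSERT INTO contracts VALUES
  ('Ministry A', 2021, 100, 140),
  ('Ministry A', 2022, 100, 135),
  ('Ministry B', 2021, 100, 105);
""")
rows = con.execute("""
    SELECT entity, COUNT(DISTINCT year) AS years_over
    FROM contracts
    WHERE (final - original) / original > 0.25
    GROUP BY entity
    HAVING years_over >= 2
""").fetchall()
```

If that query is slow or impossible against your schema, the schema is wrong, however faithfully it maps the disclosure standard.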

Build validation into ingestion. I accepted data as submitted, then tried to clean it. Wrong sequence.

Validation at entry would have caught errors before they became published facts. The rules I'd implement now:

  • Bidder IDs must match registered companies in the national business registry
  • Contract amounts must fall within 20% of budget allocation for that line item
  • Dates must be sequential (tender published before bid deadline before award before signature)
  • Required fields reject null values rather than publishing incomplete records
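The four rules above reduce to a short record-level check. A plain-Python sketch, with hypothetical field names and a toy registry; in practice the registry lookup would hit the national business register, not an in-memory set:

```python
from datetime import date

def validate_record(rec, registry, budget):
    """Return a list of errors; an empty list means the record may be
    published. `registry` is the set of registered bidder IDs and
    `budget` the line-item allocation."""
    errors = []
    required = ("bidder_id", "amount", "published",
                "deadline", "award", "signature")
    for field in required:
        if rec.get(field) is None:
            errors.append(f"missing field: {field}")
    if errors:                      # reject incomplete records outright
        return errors
    if rec["bidder_id"] not in registry:
        errors.append("bidder not in national business registry")
    if not (0.8 * budget <= rec["amount"] <= 1.2 * budget):
        errors.append("amount outside 20% of budget allocation")
    if not (rec["published"] < rec["deadline"]
            < rec["award"] < rec["signature"]):
        errors.append("dates out of sequence")
    return errors

# A hypothetical record that passes all four checks.
rec = {"bidder_id": "UG-123", "amount": 95,
       "published": date(2022, 1, 1), "deadline": date(2022, 2, 1),
       "award": date(2022, 3, 1), "signature": date(2022, 3, 15)}
errs = validate_record(rec, registry={"UG-123"}, budget=100)
```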

I use Great Expectations for data validation pipelines now. Define expectations as code, run them on every ingestion batch, block records that fail critical checks, flag records that fail warning checks for manual review.

Cleaning after publication means correcting the public record. Validating before publication means protecting it.

Design for mobile and offline from day one. The portal looked good on desktop. But district officials checking contract compliance used phones with intermittent connectivity.

I retrofitted a mobile view eighteen months in. By then, I'd trained users on a desktop interface they couldn't access in the field.

What I'd do now: Progressive Web App architecture with offline-first design. Cache critical lookup data (supplier registry, contract summaries) locally. Queue verification reports for sync when connectivity returns. Use responsive frameworks like Tailwind that make mobile the default viewport, not an adaptation.

4. The distinction that matters

After 100,000 contracts: disclosure is not transparency, and patterns are not proof.

Disclosure is publishing data. Transparency happens when someone verifies what the patterns suggest. When a procurement officer checks the portal before approving a suspicious vendor. When a citizen matches a contract to a construction site and finds the quantities don't match. When an auditor investigates an anomaly the system surfaced and confirms whether it reflects error, coincidence, or misconduct.

The portal was infrastructure. Transparency was what people built on top of it.

I've seen disclosure systems with 50,000 contracts generate zero accountability outcomes because nobody investigated the anomalies. I've seen a single published dataset trigger an investigation that recovered millions in misallocated funds, because someone did the ground-truthing.

The difference wasn't the data. It was whether anyone verified it.

Build for investigation, not just publication.

Found this useful?

I write about open data systems, transparency, and implementation.
