Five implementations across Uganda, Mozambique, and Malawi. Every one taught me something the documentation missed.
The schema is thorough. The codelists are comprehensive. What you won't find: which decisions will haunt you eighteen months in.
This is the checklist I wish someone had handed me before my first implementation.
1. Before You Touch the Data
☐ Map the system owners
You need data from procurement, finance, project management, sometimes land registries. Each system has a different owner, different access rules, different politics.
I lost three months on the Uganda GPP integration because we didn't realize the finance system required ministerial sign-off for API access. That conversation should have happened in week one.
Action: List every source system. Name the person who controls access. Schedule meetings before you scope the technical work.
☐ Commit to an update frequency
Real-time sounds impressive. It's expensive and fragile. Daily batches are simpler and sufficient for oversight use cases.
One implementation I inherited in East Africa promised real-time updates but built batch infrastructure. The team spent a year retrofitting something that should have been designed from the start.
Action: Write down your frequency commitment. Get sign-off from stakeholders. Design your architecture to match.
☐ Identify your maintainer
If the answer is "a contractor we haven't hired yet," stop. Find that person now. Train them during implementation, not after.
The Malawi system I built was maintained by a junior developer we brought on in month two. By launch, she knew it better than I did. That's the model.
Action: Hire or assign your maintainer before month three. Pair them with your implementation lead for the duration.
☐ Define minimum viable disclosure
You cannot publish everything in version one. Pick the fields that matter to your users.
For oversight bodies, that's usually: contracting parties, contract values, key dates, and amendments. Ship that. Add complexity later.
Action: Interview three target users. Ask what data they need to do their job. Scope version one to those fields only.
2. Data Mapping
This is where teams lose months. Source data arrives in one format; OC4IDS expects another. The gap looks small until you start working.
☐ Solve the identifier problem
OC4IDS wants persistent identifiers for organizations, projects, and contracts. Your source systems use different IDs, or no IDs at all.
I've seen project databases where the "unique identifier" was the project name, with twelve different spellings of the same ministry.
Action: Audit your source identifiers before writing transformation code. Build a reconciliation table that maps source IDs to canonical OC4IDS identifiers. Budget two weeks for this.
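A reconciliation table can be as simple as a lookup keyed on (source system, source ID). The sketch below is illustrative, not a production design: the system names, IDs, and the `canonical_id` helper are assumptions, and unmapped IDs return `None` so they can be flagged for human review rather than silently dropped.

```python
from typing import Optional

# (source_system, source_id) -> canonical OC4IDS project identifier.
# All values here are hypothetical examples.
RECONCILIATION = {
    ("finance", "PRJ-0042"): "oc4ids-mw-0001",
    ("procurement", "Lilongwe Bypass"): "oc4ids-mw-0001",  # project name used as an "ID" upstream
}

def canonical_id(source_system: str, source_id: str) -> Optional[str]:
    """Return the canonical OC4IDS id, or None if unmapped (flag for review)."""
    return RECONCILIATION.get((source_system, source_id.strip()))
```

In practice the table lives in a database or version-controlled CSV, not in code; the point is that every source ID resolves through one place.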
☐ Document every codelist mapping
OC4IDS has standard codelists for procurement methods, contract types, project sectors. Your national systems have their own.
Building the Mozambique disclosure platform, I found that the same concept was "selective tendering" in one ministry's system, "restricted competition" in another's, and "limited bidding" in a third's. All meant the same thing.
Action: Create a mapping spreadsheet with three columns: source value, OC4IDS value, justification. Review with legal/procurement staff. You'll need to defend these decisions to auditors.
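The spreadsheet translates directly into code. A minimal sketch, with the justification column carried alongside each entry so the mapping stays auditable (the specific source strings and justification text are hypothetical; "selective" is the kind of standard code these values typically map to):

```python
# source value -> (OC4IDS/OCDS code, justification for auditors)
CODELIST_MAP = {
    "selective tendering":    ("selective", "Ministry A terminology, mapping review 2023"),
    "restricted competition": ("selective", "Ministry B terminology, same legal basis"),
    "limited bidding":        ("selective", "Ministry C terminology, same legal basis"),
}

def map_procurement_method(source_value: str) -> str:
    """Collapse divergent source spellings to one standard code."""
    code, _justification = CODELIST_MAP[source_value.lower().strip()]
    return code
```

A `KeyError` here is a feature: an unmapped source value should halt or flag, never pass through unmapped.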
☐ Clarify every date field
OC4IDS tracks events with dates: tender publication, bid submission deadline, contract signature, project completion. Source systems often have one date field that means different things in different contexts.
I've seen "contract date" mean signature date, approval date, start date, or the date someone entered it into the system.
Action: For each source date field, document what it actually represents. Interview the people who enter the data, not just the system administrators.
☐ Reconcile your amounts
Budget amounts, contract values, and payment totals come from different systems. They won't match. OC4IDS wants them in a common currency with clear conversion dates.
Action: Define your currency conversion rules. Document which exchange rate source you use and when conversions happen. Flag records where source amounts conflict by more than 5%.
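The 5% conflict check is a one-liner once amounts are in a common currency. This sketch assumes conversion has already happened upstream and measures the difference relative to the larger amount; the function name and tolerance default are mine, not part of the standard:

```python
def amounts_conflict(a: float, b: float, tolerance: float = 0.05) -> bool:
    """Flag when two source amounts differ by more than `tolerance`
    relative to the larger of the two."""
    if a == b:
        return False
    larger = max(abs(a), abs(b))
    return abs(a - b) / larger > tolerance
```

For example, a budget system reporting 1,000,000 against a contract system reporting 1,080,000 differs by about 7.4% and gets flagged.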
3. Technical Architecture
☐ Choose your transformation layer
You can transform data on extraction (in the source system), in transit (middleware), or on load (disclosure database).
I've settled on transformation in transit: a dedicated pipeline that reads from sources and writes OC4IDS-formatted data to the disclosure layer. It keeps source systems clean and gives you one place to debug.
Action: Document your transformation architecture. If you're using transit transformation, use a tool like Apache Airflow or Dagster for pipeline orchestration. Keep transformation logic in version-controlled configuration, not hardcoded.
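To make "configuration, not code" concrete, here is a minimal transit-transformation step. The field map below stands in for a YAML config file, and all source field names and target paths are illustrative, not the actual OC4IDS schema paths your mapping will use:

```python
# source field -> slash-delimited OC4IDS target path (illustrative)
FIELD_MAP = {
    "proj_name":  "title",
    "proj_desc":  "description",
    "total_cost": "budget/amount/amount",
}

def transform(source_record: dict) -> dict:
    """Apply the configured field map, building nested output as needed."""
    out: dict = {}
    for src_field, target_path in FIELD_MAP.items():
        if src_field not in source_record:
            continue
        node = out
        *parents, leaf = target_path.split("/")
        for key in parents:
            node = node.setdefault(key, {})
        node[leaf] = source_record[src_field]
    return out
```

When the mapping changes, you edit the config and redeploy; the traversal logic never moves.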
☐ Build audit logs from day one
Mistakes happen. Contract values get entered wrong. Party names get misspelled. When an oversight body asks why a contract value changed, you need an answer.
Action: Log every record change with: timestamp, previous value, new value, user who made the change, reason code. Store logs separately from production data. Retain for minimum five years.
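An append-only JSON Lines file is one simple way to hold these entries apart from production data. A sketch, with field names taken from the action item above (the reason-code vocabulary is an assumption you'd define with your oversight stakeholders):

```python
import datetime
import json

def audit_entry(record_id, field, old, new, user, reason_code) -> str:
    """Serialize one change event as a JSON line for an append-only log."""
    return json.dumps({
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "record_id": record_id,
        "field": field,
        "previous_value": old,
        "new_value": new,
        "user": user,
        "reason_code": reason_code,
    })
```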
☐ Define your validation strategy
Invalid data will reach your pipeline. Negative contract values. Future dates in historical fields. Organization IDs that don't exist.
I prefer flagging over rejecting. Rejecting loses data. Publishing garbage undermines trust. But flagging requires tooling to review flagged records, or they pile up ignored.
Action: Build a validation dashboard that shows flagged records by error type. Assign an owner to review flags weekly. Set a threshold: if flags exceed 5% of records, pause publication and fix upstream.
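Flag-not-reject looks like this in miniature. The two checks shown are examples from the list above; the 5% pause threshold matches the action item, and the function names are mine:

```python
def validate(record: dict) -> list[str]:
    """Return a list of flags; empty list means the record is clean."""
    flags = []
    if record.get("contract_value", 0) < 0:
        flags.append("negative_contract_value")
    if record.get("org_id") is None:
        flags.append("missing_org_id")
    return flags

def should_pause(records: list[dict], threshold: float = 0.05) -> bool:
    """Pause publication when the flagged share exceeds the threshold."""
    if not records:
        return False
    flagged = sum(1 for r in records if validate(r))
    return flagged / len(records) > threshold
```

The flagged records still flow through to the review dashboard; only publication pauses.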
☐ Plan for schema evolution
OC4IDS evolves. New fields get added. Codelists change.
Action: Store field mappings in configuration files, not hardcoded. When OC4IDS releases a new version, you update config, not code. Test schema migrations in staging before production.
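One way to structure this is to key the mapping config by standard version, so supporting a new release means adding an entry, not touching transformation code. Versions and field names below are placeholders:

```python
# Hypothetical per-version field maps; a real deployment would load these
# from version-controlled config files.
MAPPINGS_BY_VERSION = {
    "0.9": {"proj_name": "title"},
    "1.0": {"proj_name": "title", "proj_stage": "status"},
}

def field_map(oc4ids_version: str) -> dict:
    return MAPPINGS_BY_VERSION[oc4ids_version]
```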
4. Human Systems
Technical implementations fail for non-technical reasons.
☐ Meet the data entry staff
Not their director. The actual officers who enter procurement data. They control your data quality.
If your disclosure requirements add burden without benefit, they'll find workarounds that corrupt your data.
Action: Visit the procurement office. Watch them work. Ask what makes their job harder. Design your system to reduce their burden, not increase it.
☐ Build a feedback channel
When users find errors in published data, how do they report them? How do reports reach the team that can fix them?
I've seen portals with "contact us" links that went to unmonitored inboxes.
Action: Create a dedicated email or form for data corrections. Route submissions to a ticketing system. Assign an owner. Set a response SLA (I use 72 hours for acknowledgment, two weeks for resolution).
☐ Document your approval workflow
Some data is sensitive during active procurements. Some requires legal review.
One implementation in Central America published bid evaluation scores before the award was announced. The procurement was challenged. The legal review took four months. The minister's office nearly shut down the entire portal.
Action: Map which data elements require approval before publication. Build approval gates into your pipeline. Test the workflow with a sensitive record before going live.
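The approval gate itself can be a simple filter: sensitive elements are withheld unless an approval has been recorded for them. The `SENSITIVE` set and field names below are assumptions; your legal review defines the real list:

```python
# Hypothetical set of elements requiring approval before publication.
SENSITIVE = {"bid_evaluation_scores", "bidder_identities"}

def publishable(record: dict, approvals: set[str]) -> dict:
    """Strip sensitive fields that have not been approved for release."""
    return {
        k: v for k, v in record.items()
        if k not in SENSITIVE or k in approvals
    }
```

Testing this with a sensitive record before go-live, as above, is what catches the bid-evaluation-scores mistake before it reaches the portal.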
5. The Cost of Skipping This
Every item above is a decision you must make before you write code.
Teams skip these questions, start building, then discover the answers through rework. In my experience, rework costs roughly three times what upfront planning costs. By month six, you're rebuilding infrastructure you thought was finished.
The cost of changing course doubles every month you wait. Answer these questions in week one.