What This Article Is For

You have infrastructure project data in a government system. You want to publish it in a format that citizens, oversight bodies, and analysts can actually use. That means converting your data to OC4IDS, the Open Contracting for Infrastructure Data Standards.

This article explains how to do that conversion correctly, and what goes wrong when you don't.

What Is OC4IDS?

OC4IDS is a data standard for publishing information about infrastructure projects. It defines a common structure for describing projects, the organizations involved, and the contracting processes that deliver them.

The standard exists because infrastructure data locked in proprietary formats serves no one. A road project in Uganda, a hospital in Honduras, and a bridge in Thailand all share the same basic information: what's being built, who's building it, how much it costs, whether it's on schedule. OC4IDS gives that information a common shape so it can be compared and analyzed across borders and systems, and used to hold the people responsible for a project to account.

The standard is a joint effort of CoST, the Infrastructure Transparency Initiative, and the Open Contracting Partnership. It is documented at: standard.open-contracting.org/infrastructure

What Is Mapping?

Your source system stores data in its own format. The database tables, field names, and value codes were designed for your specific context. "Project Status" might be stored as "proj_stat" with values like "A" for active and "C" for complete.

OC4IDS expects data in a different format. Project status lives in a field called status with values like implementation and completion.

Mapping is the process of defining how each field in your source system corresponds to a field in OC4IDS. It's a translation table that says: "When our system says proj_stat = 'A', the OC4IDS output should say status = 'implementation'."

A mapping template is typically a spreadsheet with three columns:

| Source Field | Source Example | OC4IDS Target Field |
|---|---|---|
| proj_stat | A | status |
| proj_name | Kampala Ring Road | title |
| contractor_country | UG | parties/address/countryName |

This template drives your data transformation pipeline. Get it right, and your published data is accurate and useful. Get it wrong, and you publish garbage that looks valid.
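As a rough illustration, two of the template rows above could drive a transformation like the Python sketch below. The names FIELD_MAP, VALUE_MAP, and apply_mapping are hypothetical, and so is the legacy_flag source column. Only the "A" to implementation translation comes from the example above. The parties row is left out because organizations need the array handling covered later in this article.

```python
# Hypothetical mapping template expressed as data; names are illustrative only.
FIELD_MAP = {
    "proj_stat": "status",
    "proj_name": "title",
}

# Value translations per target field. Only "A" is defined in the example above.
VALUE_MAP = {
    "status": {"A": "implementation"},
}


def apply_mapping(row):
    """Translate one source row into a partial OC4IDS project.

    Returns the project dict plus the source fields that could not be mapped,
    so that gaps are recorded instead of guessed.
    """
    project = {}
    unmapped = []
    for source_field, value in row.items():
        target = FIELD_MAP.get(source_field)
        if target is None:
            unmapped.append(source_field)
            continue
        translations = VALUE_MAP.get(target)
        if translations is not None:
            if value not in translations:
                # Unknown code: leave the field out rather than pass it through.
                unmapped.append(source_field)
                continue
            value = translations[value]
        project[target] = value
    return project, unmapped


project, unmapped = apply_mapping(
    {"proj_stat": "A", "proj_name": "Kampala Ring Road", "legacy_flag": "Y"}
)
print(project)   # {'status': 'implementation', 'title': 'Kampala Ring Road'}
print(unmapped)  # ['legacy_flag']
```

Keeping the template as data rather than as scattered if-statements means the spreadsheet and the pipeline can be reviewed side by side.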

Why Mapping Matters

Bad mapping rarely causes obvious errors. Your JSON can still validate. Your portal will load. Your API will return responses. The corruption is semantic, not syntactic.

Consider what happens when someone maps a budget amount to a contract value field. A $50 million project budget becomes a $50 million contract, even though the actual contract was awarded for $12 million. Now every cost analysis is wrong. Every comparison between budgeted and actual spending is meaningless. The data exists but answers the wrong question.

This isn't hypothetical. It's a pattern that appears in real implementations when mapping is treated as a checkbox exercise rather than a semantic decision.

The five rules below target the areas where mapping most commonly fails.

1. Get the Parties Array Right

The parties array is the most misunderstood structure in OC4IDS. It holds all organizations involved in a project: procuring entities, suppliers, funders, contractors. Each organization appears once, with one or more roles.

Where mapping breaks:

Your source system probably stores organizations in multiple tables. You have a "contractors" table, a "procuring agencies" table, maybe a "funders" table. The instinct is to map each table to a separate part of OC4IDS. This creates duplicate entries when the same organization plays multiple roles.

The correct model:

One organization, one entry, multiple roles. When you reference this organization elsewhere in the schema (in contracts, for example), you reference its ID, not its full details.

Common errors:

  • Creating separate party entries for the same organization with different roles
  • Mapping contractor details directly into contract fields instead of the parties array
  • Confusing party identifier (the organization's registration number) with party id (your internal reference)
  • Mapping to parties/address when you mean project locations

The fix: Before mapping any organization data, list every role that organization plays across your source system. Create one party entry per unique organization. Assign all applicable roles. Reference that party by ID everywhere else.
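The fix above can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions: the source column names (name, reg_scheme, reg_number), the generated org-001 style ids, and the function build_parties are all hypothetical, and the role codes procuringEntity and funder should be checked against the party role codelist before use.

```python
def build_parties(source_tables):
    """Merge organization rows from several source tables into one parties array.

    source_tables maps a role code to the rows of the source table that
    holds the organizations playing that role.
    """
    parties = {}
    for role, rows in source_tables.items():
        for row in rows:
            # Deduplicate on the official registration, not on the display name.
            key = (row["reg_scheme"], row["reg_number"])
            if key not in parties:
                parties[key] = {
                    # Internal reference, generated here (an assumed convention).
                    "id": f"org-{len(parties) + 1:03d}",
                    "name": row["name"],
                    # Official registration, kept separate from the internal id.
                    "identifier": {
                        "scheme": row["reg_scheme"],
                        "id": row["reg_number"],
                    },
                    "roles": [],
                }
            party = parties[key]
            if role not in party["roles"]:
                party["roles"].append(role)
    return list(parties.values())


ministry = {"name": "Ministry of Works", "reg_scheme": "UG-RSB", "reg_number": "REG-12345-2020"}
tables = {
    "procuringEntity": [ministry],
    "funder": [ministry],
}
parties = build_parties(tables)
print(len(parties))         # 1
print(parties[0]["roles"])  # ['procuringEntity', 'funder']

# Elsewhere in the output, point at the party instead of repeating its details.
reference = {"id": parties[0]["id"], "name": parties[0]["name"]}
```

The same organization appears in two source tables but ends up as one entry with two roles, and anything else in the output refers to it through its id.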

2. Know Which Identifier Goes Where

OC4IDS has multiple identifier concepts that serve different purposes. Confusing them corrupts your data structure.

The identifiers:

| Identifier | What It Identifies | Example |
|---|---|---|
| id (project level) | The infrastructure project | ug-mow-2024-001 |
| contractingProcesses/id | A specific procurement within the project | cp-001 |
| parties/id | Your reference to an organization | org-ministry-works |
| parties/identifier/id | The organization's official registration number | REG-12345-2020 |
| parties/identifier/scheme | The registry that issued the registration | UG-RSB |

Where mapping breaks:

Source systems often have a single "ID" column that gets mapped everywhere. A project reference number ends up in contracting process ID fields. An organization's tax ID ends up as the party's internal reference ID.

The fix: Map each identifier to exactly one target field. Document what each identifier represents in your source system. When your source has one ID serving multiple purposes, decide which OC4IDS field it truly belongs in, and generate the others systematically.
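One way to "generate the others systematically" is sketched below in Python. It assumes the single source ID has been judged to be the project reference. The function assign_identifiers, the ug-mow prefix argument, and the sample contract titles are hypothetical, and the id patterns simply follow the examples in the table above.

```python
def assign_identifiers(project_ref, contract_rows, prefix="ug-mow"):
    """Give the project and each contracting process its own identifier.

    project_ref is the single source "ID" value, treated here as the project
    reference. Contracting process ids are generated, never copied from it.
    """
    project = {"id": f"{prefix}-{project_ref}", "contractingProcesses": []}
    for n, row in enumerate(contract_rows, start=1):
        project["contractingProcesses"].append({
            "id": f"cp-{n:03d}",
            "summary": {"title": row["title"]},
        })
    return project


project = assign_identifiers(
    "2024-001",
    [{"title": "Earthworks"}, {"title": "Construction supervision"}],
)
print(project["id"])                                       # ug-mow-2024-001
print([cp["id"] for cp in project["contractingProcesses"]])  # ['cp-001', 'cp-002']
```

Sequence-based ids only stay meaningful if the rows arrive in the same order on every run, so a real pipeline would likely sort on a stable source key or store the generated ids.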

3. Separate Budget from Contract Value from Tender Value

OC4IDS tracks money at multiple stages. These are different numbers that answer different questions.

The financial fields:

| Field | What It Represents |
|---|---|
| budget/amount | How much was allocated for this project |
| contractingProcesses/summary/tender/costEstimate | The estimated value in the tender notice |
| contractingProcesses/summary/contractValue | The actual awarded contract amount |

Where mapping breaks:

Source systems often have a single "project value" or "cost" field. Whoever does the mapping picks a target field that sounds right, and budget figures end up in contract value fields, or tender estimates end up as budgets.

Why this matters:

Infrastructure transparency depends on comparing what was budgeted versus what was tendered versus what was contracted. If you map budget to contract value, that analysis becomes impossible.

The fix: For each financial field in your source system, determine what stage of the project lifecycle it represents. Map it to the OC4IDS field that matches that stage. If your source system only has one number, document which stage it represents and map it to that single field. Leave the others unmapped rather than duplicating.
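The single-number case might look like the Python sketch below. It assumes a hypothetical source column project_cost that has been documented as the approved budget, and it assumes the budget amount is written as an object holding an amount and a currency, which is worth confirming in the schema browser. The function name map_project_cost is illustrative.

```python
def map_project_cost(row, currency):
    """Map a single source amount to the one stage it is documented to represent."""
    project = {}
    amount = row.get("project_cost")
    if amount is not None:
        # Assumption: project_cost holds the approved budget allocation.
        project["budget"] = {
            "amount": {"amount": float(amount), "currency": currency}
        }
    # The source has no tender estimate and no awarded contract amount, so no
    # contracting process values are emitted. The budget is not copied into them.
    return project


result = map_project_cost({"project_cost": "50000000"}, "USD")
print(result)  # {'budget': {'amount': {'amount': 50000000.0, 'currency': 'USD'}}}
```

The output has a budget and nothing else. An analyst sees a gap where the contract value should be, which is accurate, instead of a contract value that is really the budget.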

4. Map Dates to the Right Period

OC4IDS captures dates at multiple levels. Mapping the wrong date to the wrong field breaks timeline analysis.

The date fields:

| Field | What It Represents |
|---|---|
| period/startDate | When the overall project started |
| period/endDate | When the overall project ended (or is expected to end) |
| tender/tenderPeriod/startDate | When the tender opened for submissions |
| tender/tenderPeriod/endDate | When the tender closed for submissions |
| tender/datePublished | When the tender notice was published |
| contractPeriod/startDate | When the contract period begins |

Where mapping breaks:

Source systems often have fields like "start_date" and "end_date" without specifying what they refer to. Tender publication date and tender opening date are frequently conflated.

The fix: For each date field in your source system, trace what event it actually records. Is it when the project was conceived? When construction started? When the tender was published? When bids could be submitted? Map it to the OC4IDS field that matches that specific event.
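Tracing each date to an event can be made explicit in the pipeline. In the Python sketch below, the source column names, the event labels, and the function map_dates are hypothetical. Only the project-level period fields are wired up, and a column whose event has not been traced is skipped.

```python
# Hypothetical record of what each source date column was found to mean.
SOURCE_DATE_EVENTS = {
    "start_date": "project_start",
    "end_date": "project_end",
    "pub_date": None,  # not yet traced to a specific event
}

# Target location for each event; only the project-level period is covered here.
EVENT_TARGETS = {
    "project_start": ("period", "startDate"),
    "project_end": ("period", "endDate"),
}


def map_dates(row):
    """Place each date under the field that matches its documented event."""
    project = {}
    skipped = []
    for column, value in row.items():
        event = SOURCE_DATE_EVENTS.get(column)
        target = EVENT_TARGETS.get(event)
        if target is None:
            # Undocumented or untraced date: leave it unmapped.
            skipped.append(column)
            continue
        parent, field = target
        project.setdefault(parent, {})[field] = value
    return project, skipped


dates, skipped = map_dates({
    "start_date": "2024-01-15T00:00:00Z",
    "end_date": "2026-06-30T00:00:00Z",
    "pub_date": "2023-11-01T00:00:00Z",
})
print(dates)    # {'period': {'startDate': '2024-01-15T00:00:00Z', 'endDate': '2026-06-30T00:00:00Z'}}
print(skipped)  # ['pub_date']
```

The intermediate event label is the point of the design: a column cannot reach the output until someone has written down what it records.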

5. Use the Right Status Codelist

OC4IDS has multiple status fields, each with its own codelist. Using values from the wrong codelist, or passing through raw source values, breaks interoperability.

The status fields and their codelists:

| Field | Codelist | Valid Values |
|---|---|---|
| status (project) | projectStatus | identification, preparation, implementation, completion, completed, cancelled |
| contractingProcesses/summary/status | contractingProcessStatus | pre-award, active, closed |
| tender/status | tenderStatus | planning, planned, active, cancelled, unsuccessful, complete, withdrawn |

Where mapping breaks:

Source systems use whatever status values made sense when they were designed: "Active," "In Progress," "Live," "Ongoing." These don't match OC4IDS codelists. Passing them through directly produces data that fails validation.

Worse, some mappings confuse which status field to target. Project status (where is this infrastructure in its lifecycle?) is different from tender status (what's happening with this procurement?).

The fix: Create a transformation table for each status field. Document how every source value maps to the correct OC4IDS codelist value. Apply this transformation in your ETL pipeline. Never pass raw source values to status fields.
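A transformation table per status field could be sketched in Python like this. The table contents are assumptions about what a source system's values mean, not part of the standard, and the names STATUS_TABLES and translate_status are hypothetical. The unknown value "Paused" is made up to show the failure path.

```python
# One table per status field. The same source word can mean different things
# depending on whether it describes the project or the tender.
STATUS_TABLES = {
    "project": {
        "Active": "implementation",
        "In Progress": "implementation",
        "Live": "implementation",
        "Ongoing": "implementation",
    },
    "tender": {
        "Live": "active",
    },
}


def translate_status(field, raw):
    """Return the codelist value for a raw source status, or fail loudly."""
    table = STATUS_TABLES[field]
    key = raw.strip()
    if key not in table:
        # Never pass a raw source value through to a status field.
        raise ValueError(f"No {field} status mapping for source value {raw!r}")
    return table[key]


print(translate_status("project", "Live"))  # implementation
print(translate_status("tender", "Live"))   # active

try:
    translate_status("project", "Paused")
except ValueError as error:
    print(error)  # No project status mapping for source value 'Paused'
```

Raising on an unknown value forces a mapping decision the first time a new source code appears, instead of letting it reach the published data.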

Two Universal Principles

Beyond these five problem areas, two principles apply to every mapping decision:

Unmapped is better than wrong. An empty field is a gap you can fill later. A wrongly mapped field is corruption that poisons every analysis. When uncertain, leave it blank and document your uncertainty.

Verify against the schema. Before finalizing any mapping, open the schema browser at the OC4IDS documentation site. Read the description of your target field. Confirm it matches what your source data actually represents. This takes five minutes. Fixing a mapping error after publication can take months.

The Pattern Behind These Rules

These rules share a common principle: OC4IDS is a semantic standard, not just a syntax standard.

Your JSON can validate perfectly. Every field can have the right data type. Every value can pass automated checks. But if the data answers the wrong question, the standard hasn't been implemented. A budget figure in a contract value field is syntactically valid and semantically useless.

The purpose of OC4IDS is to make infrastructure project information comparable across publishers, analyzable by tools, and useful to citizens, journalists, and oversight bodies who didn't design your source system. That purpose requires semantic precision.

Get the parties array right. Know which identifier goes where. Separate budget from contract value. Map dates to the right period. Use the correct status codelist.

Get these five areas right in your mapping template. Everything else follows.
