Foreign key constraints enforce referential integrity in the AI4SH database — a record can only reference another record that already exists. This means the order in which you import data matters: referenced tables must be populated before referencing tables.

The general rule

Import data in this global order:

  1. Utility catalogues (no foreign keys among themselves, except the dependent ones)
  2. Dependent utility catalogues (reference other utility tables)
  3. Dataset metadata (reference utility tables)
  4. Observation records (reference dataset metadata and utility tables)

Utility dependency tree

Within the utility catalogues, most tables are independent. The following have foreign key requirements:

classification_order
    └── classification_family
            └── classification_genus

quantity
    └── indicator

setting_system
    └── juxtaposition

unit
    └── profiling

apparatus + provider + method_tier
    └── provision
            └── provision_indicator (also requires indicator)

The translate_utility_data.txt pilot file is already ordered to respect these dependencies. Independent tables appear first; dependent tables follow.

Dataset metadata dependency tree

(utility.territory)
data_source
person
    ↓
dataset  ←── (observation.license from utility)
    ↓
campaign  ←── person
    ↓
sampling_log  ←── campaign, person
    ↓
sample  ←── sampling_log
    ↓
observation_log  ←── sample, provision (utility)
    ↓
measurement  ←── observation_log, provision_indicator (utility)

What happens if order is wrong

If a manage process runs before its dependency is in the database, the framework will report a foreign key constraint violation error for that record. The record is not inserted. Subsequent records that do not have the same unresolved dependency will still be attempted.

To recover:

  1. Fix the pilot file order
  2. Set "overwrite": false in the scheme file to skip already-inserted records
  3. Re-run the manage cell — only the previously failed records will be inserted

Using the foreign_key.xlsx file

The file ./AI4SH/import_data/excel_src_data/utility/foreign_key.xlsx is a reference sheet mapping columns in source Excel files to their foreign key targets in the database. Use it to verify that the values in your source data match existing records in the referenced tables before running the translate step.

Common issues:

  • A unit_id value in indicator.xlsx does not exist in the unit table yet
  • A provider_id value in provision.xlsx references a provider not yet imported
  • A dataset_id in campaign.xlsx references a dataset not yet in the database

In all cases, ensure the referenced record is imported first, then re-run the translate and manage steps for the dependent table.

Overwrite and delete flags

The scheme file controls how the framework handles existing records:

"process": [
  {
    "execute": true,
    "overwrite": false,
    "delete": false
  }
]
  • "overwrite": false — skip records that already exist (safe default for re-runs)
  • "overwrite": true — update existing records with new values
  • "delete": true — delete existing records before inserting (use with caution)

Updated: