The manage observation step runs the JSON process files created in the import observation step and inserts their records into the AI4SH database. This populates the dataset metadata tables: data sources, persons, datasets, campaigns, and sampling logs.

Notebook cell

In import_ai4sh_data.ipynb, the Manage dataset meta cell runs this step:

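# Job file for this step (path relative to the AI4SH project folder)
job_file = 'import_data/job_manage_dataset_meta.json'

# notebook_path and scheme_file are assumed to be defined in earlier notebook cells
structured_process_D, scheme_params_D = Initiate_process(notebook_path, scheme_file, job_file)

# Skip the run if Initiate_process did not return a process definition
if structured_process_D is not None:
    Run_process(structured_process_D, scheme_params_D)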

Job file

File: ./AI4SH/import_data/job_manage_dataset_meta.json

{
  "process": {
    "job_folder": "import_data/manage_data/observation/dataset_meta",
    "process_sub_folder": "process",
    "pilot_file": "manage_dataset_meta.txt"
  }
}
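The job file only names a folder, a sub-folder, and a pilot file; the full pilot-file path comes from joining them. The sketch below makes that join explicit. The helper name resolve_job_paths is hypothetical, not part of the AI4SH framework, and it assumes that all paths in the job file are relative to the AI4SH project folder and that the process JSON files sit in process_sub_folder under job_folder.

```python
import json
from pathlib import Path


def resolve_job_paths(project_root, job_file):
    """Hypothetical helper: derive the pilot-file path and the process folder
    from a job file. Assumes all paths are relative to project_root."""
    root = Path(project_root)
    with open(root / job_file, encoding="utf-8") as fh:
        process = json.load(fh)["process"]
    job_folder = root / process["job_folder"]
    # Pilot file: <job_folder>/<pilot_file>
    pilot_path = job_folder / process["pilot_file"]
    # Assumption: process JSON files live in <job_folder>/<process_sub_folder>
    process_dir = job_folder / process["process_sub_folder"]
    return pilot_path, process_dir
```

Called as resolve_job_paths('./AI4SH', 'import_data/job_manage_dataset_meta.json'), the first returned path should match the file listed under Pilot file.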

Pilot file

File: ./AI4SH/import_data/manage_data/observation/dataset_meta/manage_dataset_meta.txt

This file lists the absolute paths to the JSON process files from the translate step. Paste the output from the translate cell into this file before running the manage cell.
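A mistyped or stale path in the pilot file only surfaces when the manage cell runs, so it can help to check the file beforehand. This sketch assumes one absolute path per line and ignores blank lines; read_pilot_file is a hypothetical helper, and the framework's own parser may accept a different layout.

```python
from pathlib import Path


def read_pilot_file(pilot_path):
    """Hypothetical helper: return the process-file paths listed in a pilot
    file, assuming one absolute path per line and skipping blank lines."""
    paths = []
    with open(pilot_path, encoding="utf-8") as fh:
        for line_no, line in enumerate(fh, start=1):
            entry = line.strip()
            if not entry:
                continue  # blank line, nothing to check
            path = Path(entry)
            if not path.is_absolute():
                raise ValueError(f"line {line_no}: not an absolute path: {entry}")
            if not path.is_file():
                raise FileNotFoundError(f"line {line_no}: no such file: {entry}")
            paths.append(path)
    return paths
```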

What gets inserted

Running the manage cell executes all listed JSON process files. Each inserts records into its target table:

| Process file | Target table | Description |
|---|---|---|
| process_manage_data_source.json | observation.data_source | Organisations or individuals providing data |
| process_manage_person.json | observation.person | Responsible persons for data collection |
| process_manage_dataset.json | observation.dataset | Dataset-level metadata |
| process_manage_campaign.json | observation.campaign | Campaign-level metadata |
| process_manage_sampling_log.json | observation.sampling_log | Sampling log records |
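The five file names in the table follow a regular pattern: process_manage_<name>.json writes to observation.<name>. A small sketch of that mapping follows; it is inferred from the table rather than from framework documentation, and target_table is a hypothetical helper.

```python
import re
from pathlib import Path

# Inferred from the table: process_manage_<name>.json -> observation.<name>.
# Not a documented framework rule.
_PROCESS_NAME = re.compile(r"^process_manage_(\w+)\.json$")


def target_table(process_file):
    """Return the target table implied by a process file name or path."""
    match = _PROCESS_NAME.match(Path(process_file).name)
    if match is None:
        raise ValueError(f"unexpected process file name: {process_file}")
    return f"observation.{match.group(1)}"
```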

Insertion order

The pilot file must list process files in dependency order. The framework executes them sequentially, so a process whose records reference rows in a table that has not been populated yet will fail on the foreign key constraint. The correct order mirrors the translate order:

  1. process_manage_data_source.json
  2. process_manage_person.json
  3. process_manage_dataset.json
  4. process_manage_campaign.json
  5. process_manage_sampling_log.json
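The ordering rule can be checked mechanically before the cell runs. This sketch compares the file names in a list of pilot-file paths against the order above; check_order and REQUIRED_ORDER are hypothetical names, and the check enforces only the linear order listed here, not the actual foreign key graph.

```python
from pathlib import Path

# Dependency order from the list above.
REQUIRED_ORDER = [
    "process_manage_data_source.json",
    "process_manage_person.json",
    "process_manage_dataset.json",
    "process_manage_campaign.json",
    "process_manage_sampling_log.json",
]


def check_order(paths):
    """Raise ValueError if the listed process files break REQUIRED_ORDER.
    Unknown file names are ignored; omitted files are not treated as errors."""
    rank = {name: i for i, name in enumerate(REQUIRED_ORDER)}
    last_name, last_rank = None, -1
    for p in paths:
        name = Path(p).name
        if name not in rank:
            continue
        if rank[name] < last_rank:
            raise ValueError(f"{name} is listed after {last_name}")
        last_name, last_rank = name, rank[name]
```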

See Foreign key handling for more detail on managing dependency order.
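To illustrate why order matters, the following self-contained sketch uses an in-memory SQLite database as a stand-in for the AI4SH database, with a hypothetical two-table schema. Inserting a campaign before its dataset is rejected, and the same insert succeeds once the parent row exists. The real tables, columns, and database engine differ.

```python
import sqlite3

# Hypothetical, simplified stand-in schema; not the real AI4SH tables.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE dataset (dataset_id TEXT PRIMARY KEY);
CREATE TABLE campaign (
    campaign_id TEXT PRIMARY KEY,
    dataset_id  TEXT NOT NULL REFERENCES dataset(dataset_id)
);
""")


def insert_campaign(campaign_id, dataset_id):
    """Try to insert one campaign row; return False if the FK check rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO campaign VALUES (?, ?)", (campaign_id, dataset_id))
        return True
    except sqlite3.IntegrityError as exc:
        print("rejected:", exc)
        return False


# Wrong order: the campaign references a dataset that does not exist yet.
print(insert_campaign("c1", "d1"))

# Right order: insert the parent dataset first, then the campaign.
with conn:
    conn.execute("INSERT INTO dataset VALUES ('d1')")
print(insert_campaign("c1", "d1"))
```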

After this step

With utility data and dataset metadata in the database, you can proceed to insert actual observation records (samples, observation logs, measurements). Those steps follow the same translate-then-manage pattern using job files for the observation and measurement tables.
