Manage Observation Data
The manage observation step inserts the JSON process files created in the import observation step into the AI4SH database. This populates the dataset metadata tables: data sources, persons, datasets, campaigns, and sampling logs.
Notebook cell
In import_ai4sh_data.ipynb, the Manage dataset meta cell runs this step:
# Job file for the manage dataset meta step
job_file = 'import_data/job_manage_dataset_meta.json'
structured_process_D, scheme_params_D = Initiate_process(notebook_path, scheme_file, job_file)
# Run only if the job file and its process files were read successfully
if structured_process_D is not None:
    Run_process(structured_process_D, scheme_params_D)
Job file
File: ./AI4SH/import_data/job_manage_dataset_meta.json
{
  "process": {
    "job_folder": "import_data/manage_data/observation/dataset_meta",
    "process_sub_folder": "process",
    "pilot_file": "manage_dataset_meta.txt"
  }
}
Pilot file
File: ./AI4SH/import_data/manage_data/observation/dataset_meta/manage_dataset_meta.txt
This file lists the absolute paths to the JSON process files from the translate step. Paste the output from the translate cell into this file before running the manage cell.
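Because the pilot file is filled in by hand, a quick check before running the manage cell can catch stale or mistyped paths. The following is a minimal sketch, not part of the framework: the helper names `resolve_pilot_path` and `missing_process_files` are hypothetical, and it assumes that the pilot file sits at `<project root>/<job_folder>/<pilot_file>` (as the paths above suggest) and holds one path per line.

```python
import json
from pathlib import Path


def resolve_pilot_path(project_root, job_file):
    """Build the pilot file path from a job file.

    Assumption: the pilot file is located at
    <project_root>/<job_folder>/<pilot_file>.
    """
    root = Path(project_root)
    job = json.loads((root / job_file).read_text(encoding="utf-8"))
    process = job["process"]
    return root / process["job_folder"] / process["pilot_file"]


def missing_process_files(pilot_path):
    """Return the paths listed in the pilot file that do not exist on disk.

    Assumption: one path per line; blank lines are ignored.
    """
    missing = []
    for line in Path(pilot_path).read_text(encoding="utf-8").splitlines():
        entry = line.strip()
        if not entry:
            continue
        if not Path(entry).is_file():
            missing.append(entry)
    return missing
```

In a notebook, this could be called as `missing_process_files(resolve_pilot_path('./AI4SH', 'import_data/job_manage_dataset_meta.json'))`; an empty list means every listed process file was found.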
What gets inserted
Running the manage cell executes all listed JSON process files. Each inserts records into its target table:
| Process file | Target table | Description |
|---|---|---|
| process_manage_data_source.json | observation.data_source | Organisations or individuals providing data |
| process_manage_person.json | observation.person | Responsible persons for data collection |
| process_manage_dataset.json | observation.dataset | Dataset-level metadata |
| process_manage_campaign.json | observation.campaign | Campaign-level metadata |
| process_manage_sampling_log.json | observation.sampling_log | Sampling log records |
Insertion order
The pilot file must list process files in dependency order. The framework executes them sequentially, so a process that inserts records referencing a table that has not yet been populated will fail on its foreign key. The correct order mirrors the translate order:
1. process_manage_data_source.json
2. process_manage_person.json
3. process_manage_dataset.json
4. process_manage_campaign.json
5. process_manage_sampling_log.json
See Foreign key handling for more detail on managing dependency order.
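The ordering rule can also be checked mechanically before running the manage cell. This is a sketch only: `EXPECTED_ORDER` copies the order given above, while the function `order_problems` is a hypothetical helper and not part of the framework. It compares file names only, so the absolute directory part of each pilot file entry is ignored.

```python
from pathlib import Path

# Dependency order as listed in this section.
EXPECTED_ORDER = [
    "process_manage_data_source.json",
    "process_manage_person.json",
    "process_manage_dataset.json",
    "process_manage_campaign.json",
    "process_manage_sampling_log.json",
]


def order_problems(pilot_lines):
    """Return file names that are listed after a file they should precede."""
    rank = {name: i for i, name in enumerate(EXPECTED_ORDER)}
    problems = []
    highest = -1  # highest rank seen so far
    for line in pilot_lines:
        name = Path(line.strip()).name
        if name not in rank:
            continue  # blank line or a file outside this step
        if rank[name] < highest:
            problems.append(name)
        else:
            highest = rank[name]
    return problems
```

Passing the lines of the pilot file to `order_problems` returns an empty list when the order is correct; any name it returns should be moved earlier in the pilot file.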
After this step
With utility data and dataset metadata in the database, you can proceed to insert actual observation records (samples, observation logs, measurements). Those steps follow the same translate-then-manage pattern using job files for the observation and measurement tables.