Import Observation Data

After the utility catalogues are in the database, the next step translates the dataset metadata: the data sources, persons, datasets, campaigns, and sampling logs that give every observation its context. This step only writes JSON process files. The Manage observation data step then inserts their contents into the database.

Notebook cell

In import_ai4sh_data.ipynb, the Translate dataset meta cell runs this step:

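```python
# Job file for this step, relative to the project folder
job_file = 'import_data/job_translate_dataset_meta.json'

# notebook_path and scheme_file are set in an earlier cell of the notebook
structured_process_D, scheme_params_D = Initiate_process(notebook_path, scheme_file, job_file)

# Run only if the job was initiated successfully
if structured_process_D is not None:
    Run_process(structured_process_D, scheme_params_D)
```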

Job file

File: ./AI4SH/import_data/job_translate_dataset_meta.json

```json
{
  "process": {
    "job_folder": "import_data/translate_data/observation/dataset_meta",
    "process_sub_folder": "process",
    "pilot_file": "translate_dataset_meta.txt"
  }
}
```
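To show how the three fields fit together, here is a minimal sketch that resolves the pilot-file location from the job file. The helper `resolve_job_paths` is hypothetical and not part of AI4SH. It assumes that `job_folder` is relative to the project folder (`AI4SH`), and that both the pilot file and the `process` sub-folder sit directly inside `job_folder`. The resolved pilot path agrees with the location documented below.

```python
import json
from pathlib import PurePosixPath

# Contents of import_data/job_translate_dataset_meta.json, as shown above.
JOB_TEXT = """
{
  "process": {
    "job_folder": "import_data/translate_data/observation/dataset_meta",
    "process_sub_folder": "process",
    "pilot_file": "translate_dataset_meta.txt"
  }
}
"""


def resolve_job_paths(job_text, project_root="AI4SH"):
    """Hypothetical helper: derive pilot-file and process-folder paths from a job file.

    Assumes job_folder is relative to project_root and that both the pilot file
    and the process sub-folder sit directly inside job_folder.
    """
    process = json.loads(job_text)["process"]
    job_folder = PurePosixPath(project_root) / process["job_folder"]
    return {
        "pilot_file": job_folder / process["pilot_file"],
        "process_folder": job_folder / process["process_sub_folder"],
    }


paths = resolve_job_paths(JOB_TEXT)
print(paths["pilot_file"])
# AI4SH/import_data/translate_data/observation/dataset_meta/translate_dataset_meta.txt
```

The actual resolution happens inside `Initiate_process`, whose internals this page does not document.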

Pilot file

File: ./AI4SH/import_data/translate_data/observation/dataset_meta/translate_dataset_meta.txt

The pilot file lists translation processes in the required order:

```text
data_source.json
person.json
dataset.json
campaign.json
sampling_log.json
```
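A pilot file is an ordered list, so any reader of it must keep the line order instead of sorting. The sketch below uses a hypothetical `read_pilot` helper and assumes one file name per line, with blank lines ignored. It also assumes the named process files live in the `process` sub-folder given in the job file.

```python
from pathlib import PurePosixPath

PILOT_TEXT = """data_source.json
person.json
dataset.json
campaign.json
sampling_log.json
"""


def read_pilot(pilot_text, process_folder):
    """Return the process files named in a pilot file, in the listed order.

    Sketch only: assumes one file name per line and skips blank lines.
    The order is preserved; it is never sorted.
    """
    folder = PurePosixPath(process_folder)
    return [folder / line.strip() for line in pilot_text.splitlines() if line.strip()]


for path in read_pilot(PILOT_TEXT, "import_data/translate_data/observation/dataset_meta/process"):
    print(path)
```

Keeping the file order matters. An alphabetical sort would place `campaign.json` before `dataset.json` and `person.json`, which it depends on.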

Why this order matters

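Each entity may reference only entities that appear earlier in the list, so its prerequisites must already be translated (and later inserted) before it. `data_source` and `person` have no prerequisites. `license` comes from the utility catalogues imported in the previous step: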

| Entity | Requires |
|---|---|
| `data_source` | nothing (independent) |
| `person` | nothing (independent) |
| `dataset` | `data_source`, `license` (utility) |
| `campaign` | `dataset`, `person` |
| `sampling_log` | `campaign`, `person` |
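The dependency rule in the table can be checked mechanically. The sketch below walks a proposed order and reports any entity listed before one of its prerequisites. The names `REQUIRES`, `PREEXISTING`, and `check_order` are illustrative and not part of AI4SH. `license` is treated as already present because it belongs to the utility catalogues.

```python
# Prerequisites per entity, copied from the table above.
REQUIRES = {
    "data_source": set(),
    "person": set(),
    "dataset": {"data_source", "license"},
    "campaign": {"dataset", "person"},
    "sampling_log": {"campaign", "person"},
}
# Entities that already exist in the database (utility catalogues).
PREEXISTING = {"license"}


def check_order(order, requires=REQUIRES, preexisting=PREEXISTING):
    """Return a list of problems; an empty list means every prerequisite comes first."""
    problems = []
    available = set(preexisting)
    for entity in order:
        missing = requires.get(entity, set()) - available
        if missing:
            problems.append(f"{entity} needs {sorted(missing)} earlier in the list")
        available.add(entity)
    return problems


PILOT_ORDER = ["data_source", "person", "dataset", "campaign", "sampling_log"]
print(check_order(PILOT_ORDER))          # []
print(check_order(sorted(PILOT_ORDER)))  # alphabetical order puts campaign first
```

The pilot order passes. Alphabetical order fails because `campaign` would come before `dataset` and `person`.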

Source files

All source files are in ./AI4SH/import_data/excel_src_data/dataset_campaign_sampling_log/:

| Excel file | Translated to | Target table |
|---|---|---|
| `data_source.xlsx` | `process_manage_data_source.json` | `observation.data_source` |
| `person.xlsx` | `process_manage_person.json` | `observation.person` |
| `dataset.xlsx` | `process_manage_dataset.json` | `observation.dataset` |
| `campaign.xlsx` | `process_manage_campaign.json` | `observation.campaign` |
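Every row in the table follows the same naming pattern: `<entity>.xlsx` becomes `process_manage_<entity>.json`, which targets `observation.<entity>`. The hypothetical helper below simply encodes that pattern as inferred from the four rows. It does not reflect how the translation code itself names its files.

```python
def translation_targets(entity, schema="observation"):
    """Hypothetical helper: the naming pattern inferred from the table above."""
    return {
        "excel": f"{entity}.xlsx",
        "process_json": f"process_manage_{entity}.json",
        "table": f"{schema}.{entity}",
    }


# Only the four entities listed in the table are shown here.
for entity in ["data_source", "person", "dataset", "campaign"]:
    t = translation_targets(entity)
    print(t["excel"], "->", t["process_json"], "->", t["table"])
```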

Output location

JSON process files are written to:

./AI4SH/import_data/manage_data/observation/dataset_meta/process/

The cell prints the absolute path of each generated file. Copy this list into the manage pilot file before running the next step.
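If you need to rebuild that list from the output folder, keep the dependency order, because the manage step inserts rows in the order the pilot file lists them. The sketch below uses a hypothetical `order_for_pilot` helper. It assumes every output follows the `process_manage_<entity>.json` pattern; this is a guess for `sampling_log`, which the source table above does not list. Any other files in the folder are silently dropped.

```python
from pathlib import Path

# Entity order from the translation pilot file; inserts must follow it too.
ENTITY_ORDER = ["data_source", "person", "dataset", "campaign", "sampling_log"]


def order_for_pilot(paths, entity_order=ENTITY_ORDER):
    """Return absolute paths of known process files, sorted into dependency order.

    Files whose names do not match process_manage_<entity>.json are dropped.
    """
    rank = {f"process_manage_{e}.json": i for i, e in enumerate(entity_order)}
    known = [Path(p) for p in paths if Path(p).name in rank]
    return [str(p.resolve()) for p in sorted(known, key=lambda p: rank[p.name])]


if __name__ == "__main__":
    folder = Path("./AI4SH/import_data/manage_data/observation/dataset_meta/process")
    if folder.is_dir():
        for line in order_for_pilot(folder.glob("process_manage_*.json")):
            print(line)
```

A plain `sorted()` on the file names would not work here, since it puts `campaign` ahead of the entities it depends on.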

Next step

Proceed to Manage observation data to insert the translated dataset metadata into the database.