Once the utility catalogues are in the database, the next step translates the dataset metadata: the data sources, persons, datasets, campaigns, and sampling logs that define the context of every observation. The translation produces JSON process files, which the Manage observation data step then inserts into the database.

Notebook cell

In import_ai4sh_data.ipynb, the Translate dataset meta cell runs this step:

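# Job file for this step, relative to the project folder
job_file = 'import_data/job_translate_dataset_meta.json'

# Read the scheme and job files and build the structured process definition
structured_process_D, scheme_params_D = Initiate_process(notebook_path, scheme_file, job_file)

# Run the translation only if initiation returned a process definition
if structured_process_D is not None:
    Run_process(structured_process_D, scheme_params_D)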

Job file

File: ./AI4SH/import_data/job_translate_dataset_meta.json

{
  "process": {
    "job_folder": "import_data/translate_data/observation/dataset_meta",
    "process_sub_folder": "process",
    "pilot_file": "translate_dataset_meta.txt"
  }
}
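
As a quick sanity check, the job file can be read outside the notebook to see which pilot file it points to. The sketch below is not part of the import framework: resolve_pilot_path is a hypothetical helper, and joining the project root, job_folder, and pilot_file is an assumption based on the pilot file location documented for this step. How Initiate_process resolves paths internally may differ.

```python
import json
from pathlib import Path


def resolve_pilot_path(project_root, job):
    """Join project root, job_folder and pilot_file from a parsed job file.

    Hypothetical helper; the join rule is an assumption, not the
    framework's documented behaviour.
    """
    process = job["process"]
    return Path(project_root) / process["job_folder"] / process["pilot_file"]


# The job file content shown above, inlined so the sketch runs on its own.
job = json.loads("""
{
  "process": {
    "job_folder": "import_data/translate_data/observation/dataset_meta",
    "process_sub_folder": "process",
    "pilot_file": "translate_dataset_meta.txt"
  }
}
""")

pilot_path = resolve_pilot_path("./AI4SH", job)
print(pilot_path.as_posix())
```

If the printed path does not match the pilot file location on disk, the job file or the folder layout is the first thing to check.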

Pilot file

File: ./AI4SH/import_data/translate_data/observation/dataset_meta/translate_dataset_meta.txt

The pilot file lists translation processes in the required order:

data_source.json
person.json
dataset.json
campaign.json
sampling_log.json

Why this order matters

An entity may only depend on entities that appear earlier in the list, or on utility catalogues that are already in the database:

Entity        Requires
data_source   nothing (independent)
person        nothing (independent)
dataset       data_source, license (utility)
campaign      dataset, person
sampling_log  campaign, person
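
The ordering rule can be checked mechanically before running the step. The following is a minimal sketch, assuming the pilot file holds one file name per line as listed above; REQUIRES and order_violations are hypothetical names, and the license utility catalogue is left out because it is expected to be in the database already.

```python
from pathlib import Path

# Dependencies between the entities in the pilot file. "license" is a
# utility catalogue and is therefore not part of this mapping.
REQUIRES = {
    "data_source": [],
    "person": [],
    "dataset": ["data_source"],
    "campaign": ["dataset", "person"],
    "sampling_log": ["campaign", "person"],
}


def order_violations(pilot_lines, requires=REQUIRES):
    """Return (entity, missing_dependency) pairs for a pilot file listing."""
    seen = set()
    problems = []
    for line in pilot_lines:
        name = line.strip()
        if not name:
            continue  # skip blank lines
        entity = Path(name).stem  # "dataset.json" -> "dataset"
        for dep in requires.get(entity, []):
            if dep not in seen:
                problems.append((entity, dep))
        seen.add(entity)
    return problems


pilot = [
    "data_source.json",
    "person.json",
    "dataset.json",
    "campaign.json",
    "sampling_log.json",
]
print(order_violations(pilot))  # an empty list means the order is valid
```

An empty result means every entity is listed after the entities it requires. Any pair in the result names an entity and the dependency that should have come before it.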

Source files

All source files are in ./AI4SH/import_data/excel_src_data/dataset_campaign_sampling_log/:

Excel file        Translated to                    Target table
data_source.xlsx  process_manage_data_source.json  observation.data_source
person.xlsx       process_manage_person.json       observation.person
dataset.xlsx      process_manage_dataset.json      observation.dataset
campaign.xlsx     process_manage_campaign.json     observation.campaign
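
Before running the translation, it can help to confirm that the Excel sources are actually in place. The sketch below uses only the standard library; SOURCES and missing_sources are hypothetical names, and the mapping simply mirrors the table above. The table lists no source file for sampling_log, so none is assumed here.

```python
from pathlib import Path

# Excel source file -> translated JSON process file, as listed in the table.
SOURCES = {
    "data_source.xlsx": "process_manage_data_source.json",
    "person.xlsx": "process_manage_person.json",
    "dataset.xlsx": "process_manage_dataset.json",
    "campaign.xlsx": "process_manage_campaign.json",
}


def missing_sources(src_dir, sources=SOURCES):
    """Return the names of expected Excel files not found in src_dir."""
    src_dir = Path(src_dir)
    return [name for name in sources if not (src_dir / name).is_file()]


src_dir = "./AI4SH/import_data/excel_src_data/dataset_campaign_sampling_log"
for name in missing_sources(src_dir):
    print(f"missing: {name}")
```

The script prints nothing when all four files are present.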

Output location

JSON process files are written to:

./AI4SH/import_data/manage_data/observation/dataset_meta/process/

Copy the cell output (list of absolute file paths) into the manage pilot file before running the next step.
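
Copying the paths by hand is error-prone, so the step can be scripted. This is only a sketch: write_pilot_file is a hypothetical helper, and it assumes the manage pilot file takes one path per line in the same order as the cell output, like the translate pilot file. The name and location of the manage pilot file are not given here, so they are left to the caller.

```python
from pathlib import Path


def write_pilot_file(pilot_path, process_files):
    """Write process file paths to a pilot file, one per line.

    The input order is preserved because the insert order matters.
    Blank entries are dropped. Returns the lines that were written.
    """
    lines = [str(p).strip() for p in process_files if str(p).strip()]
    Path(pilot_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return lines


# Usage (both arguments are placeholders to be supplied by the user):
# write_pilot_file(manage_pilot_path, paths_copied_from_cell_output)
```

Note that the helper overwrites the target file, so any existing entries in the pilot file are replaced.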

Next step

Proceed to Manage observation data to insert the translated dataset metadata into the database.
