Import Observation Data
After utility catalogues are in the database, the next step translates dataset metadata: the data sources, persons, datasets, campaigns, and sampling logs that define the context of every observation. This step produces JSON process files that are then inserted by the manage observation step.
Notebook cell
In import_ai4sh_data.ipynb, the Translate dataset meta cell runs this step:
job_file = 'import_data/job_translate_dataset_meta.json'
structured_process_D, scheme_params_D = Initiate_process(notebook_path, scheme_file, job_file)
if structured_process_D is not None:
    Run_process(structured_process_D, scheme_params_D)
Job file
File: ./AI4SH/import_data/job_translate_dataset_meta.json
{
  "process": {
    "job_folder": "import_data/translate_data/observation/dataset_meta",
    "process_sub_folder": "process",
    "pilot_file": "translate_dataset_meta.txt"
  }
}
Pilot file
File: ./AI4SH/import_data/translate_data/observation/dataset_meta/translate_dataset_meta.txt
The pilot file lists translation processes in the required order:
data_source.json
person.json
dataset.json
campaign.json
sampling_log.json
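As a rough sketch of how the job file and the pilot file fit together, the snippet below joins `job_folder` and `pilot_file` under the project root and reads the listed process names in order. The helper names `pilot_path` and `read_pilot` are hypothetical and not part of the package. The layout `<root>/<job_folder>/<pilot_file>` is an assumption inferred from the paths shown above; the actual resolution is handled by `Initiate_process`.

```python
from pathlib import Path

def pilot_path(project_root, job):
    """Join project root, job_folder and pilot_file (assumed layout)."""
    process = job["process"]
    return Path(project_root) / process["job_folder"] / process["pilot_file"]

def read_pilot(text):
    """Return the non-empty lines of a pilot file, preserving their order."""
    return [line.strip() for line in text.splitlines() if line.strip()]

# Job definition copied from job_translate_dataset_meta.json
job = {
    "process": {
        "job_folder": "import_data/translate_data/observation/dataset_meta",
        "process_sub_folder": "process",
        "pilot_file": "translate_dataset_meta.txt",
    }
}

# Pilot file content as listed above
pilot_text = """data_source.json
person.json
dataset.json
campaign.json
sampling_log.json
"""

print(pilot_path("./AI4SH", job))
print(read_pilot(pilot_text))
```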
Why this order matters
An entity can only be translated after the entities it references. data_source and person have no dependencies, so they come first:
| Entity | Requires |
|---|---|
| data_source | nothing (independent) |
| person | nothing (independent) |
| dataset | data_source, license (utility) |
| campaign | dataset, person |
| sampling_log | campaign, person |
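The dependency rule can be checked mechanically before running the cell. The following is a minimal sketch, not part of the package: `REQUIRES` and `order_violations` are hypothetical names, the dependencies are copied from the table above, and utility tables such as license are assumed to be in the database already.

```python
from pathlib import Path

# Dependencies copied from the table above. Utility tables (e.g. license)
# are assumed to be in the database already and are not checked here.
REQUIRES = {
    "data_source": [],
    "person": [],
    "dataset": ["data_source"],
    "campaign": ["dataset", "person"],
    "sampling_log": ["campaign", "person"],
}

def order_violations(pilot_entries):
    """Return (entity, missing_dependency) pairs for entries listed too early."""
    seen = set()
    violations = []
    for entry in pilot_entries:
        entity = Path(entry).stem  # "dataset.json" -> "dataset"
        for dependency in REQUIRES.get(entity, []):
            if dependency not in seen:
                violations.append((entity, dependency))
        seen.add(entity)
    return violations

pilot = [
    "data_source.json",
    "person.json",
    "dataset.json",
    "campaign.json",
    "sampling_log.json",
]

# An empty list means every entity appears after the entities it requires.
print(order_violations(pilot))
```

An entry that is not in `REQUIRES` is treated as having no dependencies, so the check only guards the five entities named here.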
Source files
All source files are in ./AI4SH/import_data/excel_src_data/dataset_campaign_sampling_log/:
| Excel file | Translated to | Target table |
|---|---|---|
| data_source.xlsx | process_manage_data_source.json | observation.data_source |
| person.xlsx | process_manage_person.json | observation.person |
| dataset.xlsx | process_manage_dataset.json | observation.dataset |
| campaign.xlsx | process_manage_campaign.json | observation.campaign |
Output location
JSON process files are written to:
./AI4SH/import_data/manage_data/observation/dataset_meta/process/
Copy the cell output (list of absolute file paths) into the manage pilot file before running the next step.
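If copying by hand is error-prone, the list of absolute paths can also be built in code. This is only a sketch under stated assumptions: `manage_pilot_lines` and `ORDERED_OUTPUTS` are hypothetical names, the file names come from the source files table above, and the manage pilot file is assumed to hold one absolute path per line. The order is kept explicit because sorting the output folder alphabetically would break the dependency order.

```python
from pathlib import Path

# File names taken from the source files table, in translate pilot order.
ORDERED_OUTPUTS = [
    "process_manage_data_source.json",
    "process_manage_person.json",
    "process_manage_dataset.json",
    "process_manage_campaign.json",
]

def manage_pilot_lines(process_dir, file_names):
    """Return absolute paths for file_names in process_dir, keeping the given order.

    Raises FileNotFoundError if a listed file has not been written yet.
    """
    lines = []
    for name in file_names:
        path = (Path(process_dir) / name).resolve()
        if not path.is_file():
            raise FileNotFoundError(path)
        lines.append(str(path))
    return lines
```

The returned lines could then be written with `Path(manage_pilot).write_text("\n".join(lines) + "\n")`, where `manage_pilot` stands for the path of the manage pilot file used in the next step.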
Next step
Proceed to Manage observation data to insert the translated dataset metadata into the database.