Building the AI4SH DB with Xspatula

The database of the EU-funded project AI4SoilHealth (AI4SH) holds soil data derived from both field sampling and satellite images, captured with a large number of different instruments, laboratories, sensors and methods. To accommodate this data in line with the FAIR (Findability, Accessibility, Interoperability, and Reuse) principles, the Xspatula framework was used to build a comprehensive PostgreSQL database.

Prerequisites

To seed the AI4SH database you must first have the Xspatula framework installed and a PostgreSQL database running. These steps are described on the companion site Setup core db. This site assumes you have completed them.

You also need to clone or download the AI4SH seed package from GitHub:

git clone https://github.com/xspatula/seed_ai4sh_db

Outline of the AI4SH PostgreSQL database

The AI4SH PostgreSQL database contains nine schemas:

utility — support tables for general information used across schemas (default framework schema)
community — organisations and users; all users logging into the system must be registered here (default framework schema)
process — all processes defined for the AI4SH database (default framework schema)
landscape_utility — reference tables for landscape classification
landscape — landscape observations
observation_utility — catalogues and reference data required for FAIR-compliant soil observations (units, methods, instruments, taxa, etc.)
observation — actual soil property data, organised through datasets, campaigns, samples and observations
edna_utility — reference tables for environmental DNA methods
edna — eDNA observations
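
After seeding, it can be useful to confirm that all nine schemas exist. The sketch below is not part of the Xspatula framework. It only assumes that you fetch the schema names yourself with the standard information_schema query shown, using whichever PostgreSQL client you prefer. The helper name missing_schemas and the sample result list are illustrative.

```python
# The nine schemas listed above, in the same order.
EXPECTED_SCHEMAS = (
    "utility", "community", "process",
    "landscape_utility", "landscape",
    "observation_utility", "observation",
    "edna_utility", "edna",
)

# Standard PostgreSQL catalogue query; run it with any client (psql, a driver, ...).
LIST_SCHEMAS_SQL = "SELECT schema_name FROM information_schema.schemata;"


def missing_schemas(found):
    """Return the expected schemas that are absent from `found`, in list order."""
    present = set(found)
    return [name for name in EXPECTED_SCHEMAS if name not in present]


# Hand-written example result, not real output from a database.
found = ["public", "utility", "community", "process", "landscape"]
print(missing_schemas(found))
```

An empty list means every expected schema is present. Extra schemas such as public are ignored.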

Seeding the database

The AI4SH database is seeded in two stages:

Setup DB — defines all schemas and tables using the Jupyter notebook setup/setup_db.ipynb
Setup processes — registers all framework processes in the database using the notebook setup/setup_processes.ipynb

Both stages use the Xspatula JSON-driven workflow: a scheme file points to a job file, which links to a pilot file listing the individual process files to execute. Alternatively, if you have only one process file, the job file can point directly to it and no pilot file is needed. For a detailed explanation of this hierarchy, see the Xspatula framework documentation.
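
The hierarchy described above can be sketched in a few lines of Python. This is an illustration only: the key names job_file, pilot_file, process_file and process_files, the choice of JSON for the pilot file, and the file names in the demo are all hypothetical. The actual file layout is defined in the Xspatula framework documentation.

```python
import json
import tempfile
from pathlib import Path


def load_json(path):
    """Read one JSON file into a Python object."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def resolve_process_files(scheme_path):
    """Follow scheme -> job -> (pilot) and return the process file paths.

    Hypothetical layout: the scheme holds "job_file"; the job holds either
    "process_file" (direct link) or "pilot_file", whose JSON content has a
    "process_files" list. Paths are taken relative to the scheme file.
    """
    scheme_path = Path(scheme_path)
    base = scheme_path.parent
    job = load_json(base / load_json(scheme_path)["job_file"])
    if "process_file" in job:  # single process: no pilot file needed
        return [base / job["process_file"]]
    pilot = load_json(base / job["pilot_file"])
    return [base / name for name in pilot["process_files"]]


# Demo with throwaway files; none of these names come from the seed package.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "scheme.json").write_text(json.dumps({"job_file": "job.json"}))
    (root / "job.json").write_text(json.dumps({"pilot_file": "pilot.json"}))
    (root / "pilot.json").write_text(
        json.dumps({"process_files": ["a.json", "b.json"]})
    )
    resolved_names = [p.name for p in resolve_process_files(root / "scheme.json")]
    print(resolved_names)
```

The single branch on process_file mirrors the shortcut in the text: a job with one process skips the pilot level entirely.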

Schematic data entry flow

To enter soil property data, the entire chain from dataset → campaign → sampling log → sample → observation must be complete. If any link in this chain is missing, the data cannot be entered into the database. The observation_utility schema provides all the reference catalogues (units, methods, instruments, etc.) that observations depend on.
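
The completeness rule can be illustrated with a small pre-check run before attempting an insert. This is a sketch, not the database's own validation: the dictionary keys and the identifiers in the example are hypothetical and do not reflect actual table or column names.

```python
# Chain levels in the order they must exist (hypothetical key names).
CHAIN = ("dataset", "campaign", "sampling_log", "sample", "observation")


def first_missing_link(record):
    """Return the first chain level without a value, or None if complete."""
    for level in CHAIN:
        if not record.get(level):
            return level
    return None


# Made-up identifiers for illustration only.
complete = {
    "dataset": "ds-1",
    "campaign": "c-1",
    "sampling_log": "log-1",
    "sample": "s-1",
    "observation": "obs-1",
}
broken = {"dataset": "ds-1", "campaign": "c-1", "observation": "obs-1"}

print(first_missing_link(complete))
print(first_missing_link(broken))
```

Reporting the first missing level, rather than only a pass or fail, shows which upstream record has to be registered before the observation can be entered.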

Acknowledgments and Funding

This work was done as part of the AI4SoilHealth project, funded by the European Union’s Horizon Europe Research and Innovation Programme under Grant Agreement No. 101086179.

Funded by the European Union. The views expressed are those of the authors and do not necessarily reflect those of the European Union or the European Research Executive Agency.

Thomas Gumbricht