Setup AI4SH Database
Seeding the AI4SH database uses the same Xspatula JSON-driven workflow described in the Xspatula Python and DB model environment documentation. What this site adds is the full set of AI4SH-specific process files that define the schemas, tables and processes for a comprehensive soil data repository.
Prerequisites
Before running the AI4SH database setup you need:
- A running PostgreSQL installation where you are superuser (see PostgreSQL setup)
- The Xspatula framework installed and a working Anaconda environment (see Anaconda setup)
- The seed_ai4sh_db repository cloned to your machine:
git clone https://github.com/xspatula/seed_ai4sh_db
The setup notebook
The database setup is driven by the Jupyter notebook:
./setup/setup_db.ipynb
Open this notebook in VS Code or Jupyter Lab. The notebook has two key code blocks to edit before running:
- Scheme file: set the path to your scheme file
- Job file: set the name of the job file (default: job_setup_db.json)
Edit the scheme file
The scheme file for setting up the AI4SH database is at:
./setup/zzz/scheme_ai4sh_local_setup.json
You must edit this file before running the notebook. At minimum, change the PostgreSQL superuser credentials and the database name:
{
  "project_path": "./ai4sh",
  "postgresdb": {
    "host": "localhost",
    "port": 5432,
    "db": "ai4sh",
    "user_name": "your_postgres_superuser",
    "password": "your_postgres_superuser_password",
    "db_users": [
      {
        "user_id": "community_admin",
        "password": "guessing-rubble-garden-opera",
        "role": "community_admin"
      },
      {
        "user_id": "login_evaluation",
        "password": "hippodrome-bicycle-concert-shuttle",
        "role": "login_evaluation"
      },
      {
        "user_id": "user_cat_0",
        "password": "tablecloth-summerleaf-riverbasin-vacuumcleaner",
        "role": "user_cat_0"
      },
      {
        "user_id": "user_cat_1",
        "password": "secret-parsimony-archipelago-hedgehog",
        "role": "user_cat_1"
      },
      {
        "user_id": "user_cat_2",
        "password": "sailing-courageous-upsidedown-castle",
        "role": "user_cat_2"
      },
      {
        "user_id": "user_cat_3",
        "password": "rollerscates-forever-skyline-coconut",
        "role": "user_cat_3"
      },
      {
        "user_id": "user_cat_4",
        "password": "superfluid-altruistic-guitarplayer-climatechange",
        "role": "user_cat_4"
      },
      {
        "user_id": "user_cat_5",
        "password": "fireplace-olympicgames-grassroot-luminescence",
        "role": "user_cat_5"
      }
    ]
  },
  "process": [
    {
      "execute": true,
      "verbose": 1,
      "overwrite": false,
      "delete": false
    }
  ]
}
You can alternatively use a .netrc file for credentials — replace user_name and password with "host_netrc_id": "your_netrc_machine_code". See .netrc setup in the core documentation.
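Before switching the scheme file to a .netrc entry, it can help to confirm that the entry is readable. The sketch below uses Python's standard netrc module and is only a local check; it is not how Xspatula itself resolves host_netrc_id, which the core documentation covers. The function name credentials_from_netrc is hypothetical.

```python
import netrc


def credentials_from_netrc(machine, netrc_path=None):
    """Return (user_name, password) for a machine entry in a .netrc file.

    With netrc_path=None the standard library reads ~/.netrc.
    """
    auth = netrc.netrc(netrc_path).authenticators(machine)
    if auth is None:
        raise KeyError(f"no .netrc entry for machine {machine!r}")
    login, _account, password = auth
    return login, password
```

Calling credentials_from_netrc("your_netrc_machine_code") should return the same user name and password you would otherwise have written into the scheme file.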
Change the default passwords for all db_users before deploying to any non-local environment.
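One way to replace the default passwords is to generate random ones and write them into the loaded scheme before saving it. This is a minimal sketch that assumes only the scheme layout shown above (postgresdb.db_users with a password key per user); the helper name rotate_db_user_passwords is hypothetical and not part of Xspatula.

```python
import json
import secrets


def rotate_db_user_passwords(scheme, nbytes=24):
    """Return a copy of a scheme dict with a fresh random password per db_user."""
    updated = json.loads(json.dumps(scheme))  # deep copy via a JSON round trip
    for user in updated["postgresdb"]["db_users"]:
        # token_urlsafe draws from the OS cryptographic random source
        user["password"] = secrets.token_urlsafe(nbytes)
    return updated
```

The input dict is left unchanged, so you can compare old and new values before writing the result back to the scheme file with json.dump.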
The pilot file
The job file points to the pilot file:
./setup/zzz/ai4sh/setup_db/db_xspatula_ai4sh_setup.txt
This text file lists all process JSON files in the order they must be executed. The order matters because of foreign key dependencies — schemas must exist before tables, and reference tables before tables that reference them.
The full execution order is:
- schema/schema_v10_sql.json: create all 9 schemas
- utility/utility_v10_sql.json: utility tables
- utility/utility_territory_v10_sql.json: territory reference data
- community/: user categories, organisations and users
- process/: process and process parameter tables
- observation_utility/: ~30 reference catalogue tables (independent first, then dependent)
- observation/: dataset, campaign, sample and observation tables
- landscape/: landscape utility and observation tables
- edna/: eDNA utility and observation tables
Each section is described in detail in the following pages.
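Because a missing or misspelled entry in the pilot file only surfaces once the run reaches it, a quick pre-flight check can save a partial setup. The sketch below assumes a pilot file layout that this page does not specify: one relative path per line, with blank lines and lines starting with "#" ignored. Both function names and the json_root argument are hypothetical.

```python
from pathlib import Path


def read_pilot(pilot_path):
    """Return the pilot file entries in listed order (assumed one path per line)."""
    entries = []
    for line in Path(pilot_path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            entries.append(line)
    return entries


def missing_process_files(pilot_path, json_root):
    """Return the pilot entries that do not exist as files under json_root."""
    root = Path(json_root)
    return [entry for entry in read_pilot(pilot_path) if not (root / entry).is_file()]
```

An empty list from missing_process_files means every listed process file was found; the order of the entries is preserved, so the result can also be used to inspect the execution sequence.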
Default community records
Before running the notebook, also edit the default organisation and user records:
./setup/zzz/ai4sh/setup_db/json_ai4sh/community/community_organisation_records_v10_sql.json
./setup/zzz/ai4sh/setup_db/json_ai4sh/community/community_user_records_v10_sql.json
These files insert at least one default organisation and user into the database. The inserted user name and password must match the user_project credentials in subsequent (non-setup) scheme files.
Run the notebook
With the scheme file edited and the notebook pointing to it, run all cells in setup_db.ipynb. The framework will connect to your PostgreSQL cluster, create the database, and execute all process files in pilot file order.
To delete and rebuild the database, use:
./setup/delete_db.ipynb
with a scheme file where "delete": true.
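Rather than editing the setup scheme by hand each time, you can derive a delete variant from it. The sketch below assumes the scheme layout shown earlier (a process list whose entries carry a delete flag) and only flips that flag; whether other flags also need to change for a delete run is not stated here. The helper name write_delete_scheme and the output path are hypothetical.

```python
import json
from pathlib import Path


def write_delete_scheme(src_path, dst_path):
    """Copy a scheme file to dst_path with "delete": true in every process entry."""
    scheme = json.loads(Path(src_path).read_text(encoding="utf-8"))
    for proc in scheme["process"]:
        proc["delete"] = True
    Path(dst_path).write_text(json.dumps(scheme, indent=2), encoding="utf-8")
    return scheme
```

Keeping the delete scheme in a separate file leaves the setup scheme untouched, which reduces the risk of running the setup notebook with the delete flag still set.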