Setup AI4SH Database

Seeding the AI4SH database uses the same Xspatula JSON-driven workflow described in the Xspatula python and DB model environment documentation. What this site adds is the full set of AI4SH-specific process files that define the schemas, tables and processes for a comprehensive soil data repository.

Prerequisites

Before running the AI4SH database setup you need:

A running postgreSQL installation where you are superuser — see PostgreSQL setup
The Xspatula framework installed and a working Anaconda environment — see Anaconda setup
The seed_ai4sh_db repository cloned to your machine:

git clone https://github.com/xspatula/seed_ai4sh_db

The setup notebook

The database setup is driven by the Jupyter notebook:

./setup/setup_db.ipynb

Open this notebook in VS Code or Jupyter Lab. The notebook has two key code blocks to edit before running:

Scheme file — set the path to your scheme file
Job file — set the name of the job file (default: job_setup_db.json)

Edit the scheme file

The scheme file for setting up the AI4SH database is at:

./setup/zzz/scheme_ai4sh_local_setup.json

You must edit this file before running the notebook. At minimum change the postgreSQL superuser credentials and the database name:

{
  "project_path": "./ai4sh",
  "postgresdb": {
    "host": "localhost",
    "port": 5432,
    "db": "ai4sh",
    "user_name": "your_postgres_superuser",
    "password": "your_postgres_superuser_password",
    "db_users": [
      {
        "user_id": "community_admin",
        "password": "guessing-rubble-garden-opera",
        "role": "community_admin"
      },
      {
        "user_id": "login_evaluation",
        "password": "hippodrome-bicycle-concert-shuttle",
        "role": "login_evaluation"
      },
      {
        "user_id": "user_cat_0",
        "password": "tablecloth-summerleaf-riverbasin-vacuumcleaner",
        "role": "user_cat_0"
      },
      {
        "user_id": "user_cat_1",
        "password": "secret-parsimony-archipelago-hedgehog",
        "role": "user_cat_1"
      },
      {
        "user_id": "user_cat_2",
        "password": "sailing-courageous-upsidedown-castle",
        "role": "user_cat_2"
      },
      {
        "user_id": "user_cat_3",
        "password": "rollerscates-forever-skyline-coconut",
        "role": "user_cat_3"
      },
      {
        "user_id": "user_cat_4",
        "password": "superfluid-altruistic-guitarplayer-climatechange",
        "role": "user_cat_4"
      },
      {
        "user_id": "user_cat_5",
        "password": "fireplace-olympicgames-grassroot-luminescence",
        "role": "user_cat_5"
      }
    ]
  },
  "process": [
    {
      "execute": true,
      "verbose": 1,
      "overwrite": false,
      "delete": false
    }
  ]
}

You can alternatively use a .netrc file for credentials — replace user_name and password with "host_netrc_id": "your_netrc_machine_code". See .netrc setup in the core documentation.

Change the default passwords for all db_users before deploying to any non-local environment.

The pilot file

The job file points to the pilot file:

./setup/zzz/ai4sh/setup_db/db_xspatula_ai4sh_setup.txt

This text file lists all process JSON files in the order they must be executed. The order matters because of foreign key dependencies — schemas must exist before tables, and reference tables before tables that reference them.

The full execution order is:

schema/schema_v10_sql.json — create all 9 schemas
utility/utility_v10_sql.json — utility tables
utility/utility_territory_v10_sql.json — territory reference data
community/ — user categories, organisations and users
process/ — process and process parameter tables
observation_utility/ — ~30 reference catalogue tables (independent first, then dependent)
observation/ — dataset, campaign, sample and observation tables
landscape/ — landscape utility and observation tables
edna/ — eDNA utility and observation tables

Each section is described in detail in the following pages.

Default community records

Before running the notebook, also edit the default organisation and user records:

./setup/zzz/ai4sh/setup_db/json_ai4sh/community/community_organisation_records_v10_sql.json
./setup/zzz/ai4sh/setup_db/json_ai4sh/community/community_user_records_v10_sql.json

These files insert at least one default organisation and user into the database. The inserted user name and password must match the user_project credentials in subsequent (non-setup) scheme files.

Run the notebook

With the scheme file edited and the notebook pointing to it, run all cells in setup_db.ipynb. The framework will connect to your postgreSQL cluster, create the database, and execute all process files in pilot file order.

To delete and rebuild the database, use:

./setup/delete_db.ipynb

with a scheme file where "delete": true.