Importing data into the AI4SH database follows the same Xspatula JSON-driven workflow used for database setup and process registration. What distinguishes data import is its two-step pattern: first translate (convert tabular source data to JSON process files), then manage (insert those JSON process files into the database).

The import notebook

All import operations are driven by a single Jupyter notebook:

./AI4SH/import_ai4sh_data.ipynb

Open this notebook in VS Code or Jupyter Lab. It contains pairs of cells for each import step — one cell to translate source data to JSON, a second cell to insert (manage) that JSON into the database.

Scheme file

The notebook uses the same scheme file as the setup notebooks:

./AI4SH/scheme_ai4sh.json

This file defines your PostgreSQL connection, the project path, and execution flags. Edit it before running the notebook — at minimum, set your database credentials. See the scheme file documentation for full details. A minimal scheme file looks like this:

{
  "project_path": "./ai4sh",
  "postgresdb": {
    "host": "localhost",
    "port": 5432,
    "db": "ai4sh",
    "user_name": "your_postgres_user",
    "password": "your_password"
  },
  "process": [
    {
      "execute": true,
      "verbose": 1,
      "overwrite": false,
      "delete": false
    }
  ]
}
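Before running any notebook cells, it can be worth sanity-checking that the placeholder credentials in the scheme file were actually replaced. The following is a minimal standard-library sketch (not part of Xspatula); the embedded scheme text mirrors the example above, and in practice you would load `./AI4SH/scheme_ai4sh.json` from disk instead:

```python
import json

# Scheme text mirroring the example above; in practice, read the file:
# scheme = json.loads(open("./AI4SH/scheme_ai4sh.json").read())
scheme_text = '''
{
  "project_path": "./ai4sh",
  "postgresdb": {
    "host": "localhost",
    "port": 5432,
    "db": "ai4sh",
    "user_name": "your_postgres_user",
    "password": "your_password"
  },
  "process": [
    {"execute": true, "verbose": 1, "overwrite": false, "delete": false}
  ]
}
'''
scheme = json.loads(scheme_text)

# Flag any string credentials still carrying the "your_" placeholder prefix
db = scheme["postgresdb"]
placeholders = [k for k, v in db.items()
                if isinstance(v, str) and v.startswith("your_")]
print(placeholders)  # -> ['user_name', 'password']
```

If the printed list is non-empty, the scheme file still contains unedited placeholders and the notebook's database connections will fail.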

Import workflow overview

Data import runs in the following order:

  1. Translate utility data — convert catalogue Excel files to JSON process files
  2. Manage utility data — insert utility catalogues into the database
  3. Translate dataset metadata — convert dataset, campaign, and sampling log Excel files to JSON
  4. Manage dataset metadata — insert dataset metadata into the database

Each of these steps corresponds to a cell pair in the notebook and to a page in this documentation.
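The ordering above can be sketched conceptually as follows. The `target`/`action` pairs are illustrative stand-ins for the notebook's cell pairs, not actual Xspatula API calls:

```python
# Conceptual order of the four import steps: each target is
# translated to JSON first, then managed into the database.
steps = [
    ("utility data", "translate"),
    ("utility data", "manage"),
    ("dataset metadata", "translate"),
    ("dataset metadata", "manage"),
]

for target, action in steps:
    print(f"{action} {target}")
```

The key constraint the sketch encodes is that a target's manage step always follows its translate step, and utility data precedes dataset metadata.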

Prerequisites

Before importing data you must have:

  • A running AI4SH database — see Setup AI4SH DB
  • All processes registered — see Setup AI4SH Processes
  • Source data (Excel files) in the correct directories under ./AI4SH/import_data/excel_src_data/
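A quick way to confirm the last prerequisite is to scan the source directory for Excel files before starting. This is a standard-library sketch (the function name is ours, not Xspatula's); the path comes from the prerequisites above:

```python
from pathlib import Path

def find_excel_sources(root: str) -> list:
    """Return all Excel files under the given source-data directory,
    raising if the directory itself is missing."""
    src_root = Path(root)
    if not src_root.is_dir():
        raise FileNotFoundError(f"Missing source directory: {src_root}")
    # Match both .xls and .xlsx, anywhere in the directory tree
    return sorted(src_root.rglob("*.xls*"))

# Example usage:
# files = find_excel_sources("./AI4SH/import_data/excel_src_data")
# print(f"Found {len(files)} Excel source file(s)")
```

An empty result (or a raised `FileNotFoundError`) means the source data is not yet in place and the translate cells will have nothing to convert.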

Two-step pattern

Every import operation follows the same pattern:

Excel source data
      ↓  [translate job]
JSON process files
      ↓  [manage job]
PostgreSQL database

The translate step reads your Excel files and writes JSON process files into ./AI4SH/import_data/manage_data/. The manage step reads those JSON files and executes them against the database. The two steps are decoupled so you can inspect the JSON output before committing to the database.
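Because the steps are decoupled, you can review the translate output before running the manage cells. A minimal standard-library helper for that review might look like this (the function name is illustrative, not part of Xspatula; the path comes from the text above):

```python
import json
from pathlib import Path

def preview_process_files(manage_dir: str, limit: int = 3) -> None:
    """Pretty-print the first few JSON process files written by a
    translate job, so they can be inspected before the manage step."""
    for path in sorted(Path(manage_dir).glob("*.json"))[:limit]:
        print(f"--- {path.name} ---")
        print(json.dumps(json.loads(path.read_text()), indent=2))

# Example usage:
# preview_process_files("./AI4SH/import_data/manage_data")
```

If a translated file looks wrong, fix the Excel source and rerun the translate cell; nothing has touched the database until the corresponding manage cell runs.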
