Importing data into the AI4SH database follows the same Xspatula JSON-driven workflow used for database setup and process registration. What distinguishes data import is its two-step pattern: first translate (convert tabular source data to JSON process files), then manage (insert those JSON process files into the database).

The import notebook

All import operations are driven by a single Jupyter notebook:

./AI4SH/import_ai4sh_data.ipynb

Open this notebook in VS Code or Jupyter Lab. It contains pairs of cells for each import step — one cell to translate source data to JSON, a second cell to insert (manage) that JSON into the database.

Scheme file

The notebook uses the same scheme file as the setup notebooks:

./AI4SH/scheme_ai4sh.json

This file defines your PostgreSQL connection, the project path, and execution flags. Edit it before running the notebook — at minimum, set your database credentials. See the scheme file documentation for full details. A minimal scheme file looks like this:

{
  "project_path": "./ai4sh",
  "postgresdb": {
    "host": "localhost",
    "port": 5432,
    "db": "ai4sh",
    "user_name": "your_postgres_user",
    "password": "your_password"
  },
  "process": [
    {
      "execute": true,
      "verbose": 1,
      "overwrite": false,
      "delete": false
    }
  ]
}
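Before running any notebook cells, it can be worth sanity-checking that the placeholder credentials in the scheme file were actually replaced. The following is a minimal standard-library sketch (not part of Xspatula); the embedded scheme text mirrors the example above, and in practice you would load `./AI4SH/scheme_ai4sh.json` from disk instead:

```python
import json

# Scheme text mirroring the example above; in practice, read the file:
# scheme = json.loads(open("./AI4SH/scheme_ai4sh.json").read())
scheme_text = '''
{
  "project_path": "./ai4sh",
  "postgresdb": {
    "host": "localhost",
    "port": 5432,
    "db": "ai4sh",
    "user_name": "your_postgres_user",
    "password": "your_password"
  },
  "process": [
    {"execute": true, "verbose": 1, "overwrite": false, "delete": false}
  ]
}
'''
scheme = json.loads(scheme_text)

# Flag any string credentials still carrying the "your_" placeholder prefix
db = scheme["postgresdb"]
placeholders = [k for k, v in db.items()
                if isinstance(v, str) and v.startswith("your_")]
print(placeholders)  # -> ['user_name', 'password']
```

If the printed list is non-empty, the scheme file still contains unedited placeholders and the notebook's database connections will fail.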

Import workflow overview

Data import runs in the following order:

  1. Translate utility data — convert catalogue Excel files to JSON process files
  2. Manage utility data — insert utility catalogues into the database
  3. Translate dataset metadata — convert dataset, campaign, and sampling log Excel files to JSON
  4. Manage dataset metadata — insert dataset metadata into the database

Each of these steps corresponds to a cell pair in the notebook and to a page in this documentation.
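The ordering above can be sketched conceptually as follows. The `target`/`action` pairs are illustrative stand-ins for the notebook's cell pairs, not actual Xspatula API calls:

```python
# Conceptual order of the four import steps: each target is
# translated to JSON first, then managed into the database.
steps = [
    ("utility data", "translate"),
    ("utility data", "manage"),
    ("dataset metadata", "translate"),
    ("dataset metadata", "manage"),
]

for target, action in steps:
    print(f"{action} {target}")
```

The key constraint the sketch encodes is that a target's manage step always follows its translate step, and utility data precedes dataset metadata.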

Prerequisites

Before importing data you must have:

  • A running AI4SH database — see Setup AI4SH DB
  • All processes registered — see Setup AI4SH Processes
  • Source data (Excel files) in the correct directories under ./AI4SH/import_data/excel_src_data/
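A quick way to confirm the last prerequisite is to scan the source directory for Excel files before starting. This is a standard-library sketch (the function name is ours, not Xspatula's); the path comes from the prerequisites above:

```python
from pathlib import Path

def find_excel_sources(root: str) -> list:
    """Return all Excel files under the given source-data directory,
    raising if the directory itself is missing."""
    src_root = Path(root)
    if not src_root.is_dir():
        raise FileNotFoundError(f"Missing source directory: {src_root}")
    # Match both .xls and .xlsx, anywhere in the directory tree
    return sorted(src_root.rglob("*.xls*"))

# Example usage:
# files = find_excel_sources("./AI4SH/import_data/excel_src_data")
# print(f"Found {len(files)} Excel source file(s)")
```

An empty result (or a raised `FileNotFoundError`) means the source data is not yet in place and the translate cells will have nothing to convert.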

Two-step pattern

Every import operation follows the same pattern:

Excel source data
      ↓  [translate job]
JSON process files
      ↓  [manage job]
PostgreSQL database

The translate step reads your Excel files and writes JSON process files into ./AI4SH/import_data/manage_data/. The manage step reads those JSON files and executes them against the database. The two steps are decoupled so you can inspect the JSON output before committing to the database.
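Because the steps are decoupled, you can review the translate output before running the manage cells. A minimal standard-library helper for that review might look like this (the function name is illustrative, not part of Xspatula; the path comes from the text above):

```python
import json
from pathlib import Path

def preview_process_files(manage_dir: str, limit: int = 3) -> None:
    """Pretty-print the first few JSON process files written by a
    translate job, so they can be inspected before the manage step."""
    for path in sorted(Path(manage_dir).glob("*.json"))[:limit]:
        print(f"--- {path.name} ---")
        print(json.dumps(json.loads(path.read_text()), indent=2))

# Example usage:
# preview_process_files("./AI4SH/import_data/manage_data")
```

If a translated file looks wrong, fix the Excel source and rerun the translate cell; nothing has touched the database until the corresponding manage cell runs.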
