Import AI4SH Data
Importing data into the AI4SH database follows the same Xspatula JSON-driven workflow used for database setup and process registration. What distinguishes data import is its two-step pattern: translate (convert tabular source data to JSON process files) then manage (insert those JSON process files into the database).
The import notebook
All import operations are driven by a single Jupyter notebook:
./AI4SH/import_ai4sh_data.ipynb
Open this notebook in VS Code or Jupyter Lab. It contains a pair of cells for each import step: the first cell translates the source data to JSON, and the second inserts (manages) that JSON into the database.
Scheme file
The notebook uses the same scheme file as the setup notebooks:
./AI4SH/scheme_ai4sh.json
This file defines your PostgreSQL connection, the project path, and execution flags. Edit it before running the notebook; at a minimum, set your database credentials. See the scheme file documentation for full details.
{
  "project_path": "./ai4sh",
  "postgresdb": {
    "host": "localhost",
    "port": 5432,
    "db": "ai4sh",
    "user_name": "your_postgres_user",
    "password": "your_password"
  },
  "process": [
    {
      "execute": true,
      "verbose": 1,
      "overwrite": false,
      "delete": false
    }
  ]
}
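Because a scheme file with missing keys or unedited placeholder credentials will only fail once the notebook tries to connect, a quick check beforehand can save time. The following is a minimal sketch, not part of Xspatula: the helper names `load_scheme` and `check_scheme` are hypothetical, and the set of required keys is assumed from the example above rather than taken from the scheme file documentation.

```python
import json
from pathlib import Path

# Keys assumed from the example scheme file above (not an official schema).
REQUIRED_DB_KEYS = ("host", "port", "db", "user_name", "password")
# Placeholder values used in the example that must be replaced before use.
PLACEHOLDERS = {"your_postgres_user", "your_password"}


def load_scheme(path):
    """Read a scheme JSON file and return it as a dict."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def check_scheme(scheme):
    """Return a list of human-readable problems found in a scheme dict."""
    problems = []
    if "project_path" not in scheme:
        problems.append("missing key: project_path")

    db = scheme.get("postgresdb")
    if not isinstance(db, dict):
        problems.append("missing or invalid section: postgresdb")
    else:
        for key in REQUIRED_DB_KEYS:
            if key not in db:
                problems.append(f"missing key: postgresdb.{key}")
            elif isinstance(db[key], str) and db[key] in PLACEHOLDERS:
                problems.append(f"placeholder value still set: postgresdb.{key}")

    process = scheme.get("process")
    if not isinstance(process, list) or not process:
        problems.append("missing or empty list: process")
    return problems


if __name__ == "__main__":
    scheme_path = Path("./AI4SH/scheme_ai4sh.json")
    if scheme_path.is_file():
        for problem in check_scheme(load_scheme(scheme_path)):
            print(problem)
```

An empty result from `check_scheme` only means the file has the expected shape; it does not confirm that the database accepts the credentials.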
Import workflow overview
Data import runs in the following order:
- Translate utility data — convert catalogue Excel files to JSON process files
- Manage utility data — insert utility catalogues into the database
- Translate dataset metadata — convert dataset, campaign, and sampling log Excel files to JSON
- Manage dataset metadata — insert dataset metadata into the database
Each of these steps is a cell pair in the notebook and a page in this documentation.
Prerequisites
Before importing data you must have:
- A running AI4SH database — see Setup AI4SH DB
- All processes registered — see Setup AI4SH Processes
- Source data (Excel files) in the correct directories under
./AI4SH/import_data/excel_src_data/
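To confirm that the source data is in place before running the translate cells, you can list the Excel files found under the source directory. This is a minimal sketch under stated assumptions: the function name `find_excel_sources` is hypothetical, the `.xlsx`/`.xls` extensions are assumed, and the expected subdirectory layout is not checked because it is not described here.

```python
from pathlib import Path

# File extensions assumed for the Excel source data.
EXCEL_SUFFIXES = {".xlsx", ".xls"}


def find_excel_sources(root):
    """Return all Excel files below root, sorted by path.

    Files starting with "~$" are skipped; Excel creates these as
    temporary lock files while a workbook is open.
    """
    root = Path(root)
    if not root.is_dir():
        raise FileNotFoundError(f"source directory not found: {root}")
    return sorted(
        p
        for p in root.rglob("*")
        if p.is_file()
        and p.suffix.lower() in EXCEL_SUFFIXES
        and not p.name.startswith("~$")
    )


if __name__ == "__main__":
    src = Path("./AI4SH/import_data/excel_src_data/")
    if src.is_dir():
        for path in find_excel_sources(src):
            print(path)
```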
Two-step pattern
Every import operation follows the same pattern:
Excel source data
    ↓  [translate job]
JSON process files
    ↓  [manage job]
PostgreSQL database
The translate step reads your Excel files and writes JSON process files into ./AI4SH/import_data/manage_data/. The manage step reads those JSON files and executes them against the database. The two steps are decoupled so you can inspect the JSON output before committing to the database.
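One way to inspect the translate output before running the manage step is to check that every generated file parses as JSON. The sketch below assumes the process files carry a `.json` extension and may sit in subdirectories; `inspect_process_files` is a hypothetical helper, not an Xspatula function, and it checks syntax only, not whether the content is a valid process definition.

```python
import json
from pathlib import Path


def inspect_process_files(folder):
    """Parse every .json file below folder.

    Returns (valid, errors): a sorted list of paths that parsed cleanly
    and a dict mapping each failing path to its parse error message.
    """
    valid, errors = [], {}
    for path in sorted(Path(folder).rglob("*.json")):
        try:
            with open(path, encoding="utf-8") as fh:
                json.load(fh)
            valid.append(path)
        except json.JSONDecodeError as exc:
            errors[path] = str(exc)
    return valid, errors


if __name__ == "__main__":
    valid, errors = inspect_process_files("./AI4SH/import_data/manage_data/")
    print(f"{len(valid)} file(s) parsed, {len(errors)} with errors")
    for path, message in errors.items():
        print(f"{path}: {message}")
```

If any file is reported with an error, rerun the corresponding translate cell before running its manage cell.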