Observation Schema
The observation schema stores the actual soil property data. Data is organised in a strict hierarchy: an observation belongs to an observation log, an observation log belongs to a sampling log (as do samples and their geolocations), a sampling log belongs to a campaign, a campaign belongs to a dataset, and a dataset belongs to a data source. Each parent record must exist before its children can be created. All observation values link to reference records in observation_utility.
Process files
| File | Tables created | Description |
|---|---|---|
| observation/observation_data_source_v10_sql.json | data_source | Simplified copy of community.organisation for passive data suppliers who need not be full system users |
| observation/observation_person_v10_sql.json | person | Simplified copy of community.user with no login credentials; allows attributing data to non-registered persons (GDPR consideration) |
| observation/observation_dataset_v10_sql.json | dataset | Top-level grouping of related data (e.g. LUCAS, AI4SoilHealth) |
| observation/observation_campaign_v10_sql.json | campaign, campaign_meta, campaign_method_tier, campaign_location, campaign_provision, campaign_setting | Child of dataset; captures details of a specific data collection effort |
| observation/observation_sampling_log_v10_sql.json | sampling_log | A sampling event: one or multiple samples taken with consistent methods over a period |
| observation/observation_sample_geolocation_v10_sql.json | sample_geolocation | Spatial coordinates for a geolocated sample |
| observation/observation_sample_v10_sql.json | sample | An individual sample acquired under a sampling log |
| observation/observation_observation_log_v10_sql.json | observation_log | Links a sampling log to a provision; records the logistics (preservation, storage, transportation) of getting the sample to analysis |
| observation/observation_observation_v10_sql.json | observation | The actual measured value for a specific indicator from a specific provision |
| observation/observation_infiltration_beerkan_v10_sql.json | infiltration_beerkan | BeerKan infiltration measurements |
| observation/observation_macrofauna_image_v10_sql.json | macrofauna_image | Image records for macrofauna observations |
| observation/observation_macrofauna_v10_sql.json | macrofauna | Macrofauna count and biomass observations |
| observation/observation_measurement_v10_sql.json | measurement | General sensor or instrument measurements |
Table hierarchy
- data_source
  - dataset (→ data_source)
    - campaign (→ dataset)
      - campaign_meta (→ campaign, observation_utility.*)
      - campaign_method_tier (→ campaign)
      - campaign_location (→ campaign, utility.territory, observation_utility.spatial_reference)
      - campaign_provision (→ campaign, observation_utility.provision)
      - campaign_setting (→ campaign, observation_utility.setting_system)
      - sampling_log (→ campaign)
        - sample_geolocation (→ sampling_log, observation_utility.spatial_reference)
        - sample (→ sampling_log)
        - observation_log (→ sampling_log, observation_utility.provision)
          - observation (→ observation_log, observation_utility.provision_indicator)
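Because every table must be created after the tables it references, the hierarchy above implies a creation order. The following is a minimal sketch of how such an order could be derived with Python's standard graphlib module. The DEPENDS_ON mapping is transcribed from the hierarchy; references into utility and observation_utility are left out on the assumption that those schemas are installed beforehand, and the creation_order helper is hypothetical rather than part of the actual installer.

```python
from graphlib import TopologicalSorter

# Each table maps to the observation-schema tables it references.
# References to utility.* and observation_utility.* are omitted
# (assumption: those schemas already exist when this schema is built).
DEPENDS_ON = {
    "data_source": set(),
    "dataset": {"data_source"},
    "campaign": {"dataset"},
    "campaign_meta": {"campaign"},
    "campaign_method_tier": {"campaign"},
    "campaign_location": {"campaign"},
    "campaign_provision": {"campaign"},
    "campaign_setting": {"campaign"},
    "sampling_log": {"campaign"},
    "sample_geolocation": {"sampling_log"},
    "sample": {"sampling_log"},
    "observation_log": {"sampling_log"},
    "observation": {"observation_log"},
}


def creation_order(depends_on):
    """Return one valid order in which the tables can be created."""
    # static_order() yields every table after all tables it depends on.
    return list(TopologicalSorter(depends_on).static_order())


order = creation_order(DEPENDS_ON)
print(order)
```

Several orders are valid, since sibling tables such as sample and observation_log do not depend on each other; the only guarantee is that a parent always precedes its children.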
Key tables in detail
dataset
A dataset is a coherent collection of data from one or more campaigns, for example LUCAS 2009–2022 or the AI4SoilHealth project. Each dataset references a data_source through data_source_id.
Key columns: id, data_source_id, name, alias, status_code.
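To illustrate the rule that a parent record must exist before its child, here is a small sketch using an in-memory SQLite database. Only the column names listed above come from this document; the column types, the name column on data_source, the insert_dataset helper, and the example supplier are assumptions made for illustration, and the real schema is generated from the JSON process files for its own target database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Column types are assumed; only the column names come from the documentation.
conn.executescript("""
CREATE TABLE data_source (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE dataset (
    id             INTEGER PRIMARY KEY,
    data_source_id INTEGER NOT NULL REFERENCES data_source(id),
    name           TEXT NOT NULL,
    alias          TEXT,
    status_code    INTEGER
);
""")


def insert_dataset(conn, data_source_id, name, alias):
    """Try to insert a dataset; return False if the foreign key is rejected."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO dataset (data_source_id, name, alias) VALUES (?, ?, ?)",
                (data_source_id, name, alias),
            )
        return True
    except sqlite3.IntegrityError:
        return False


# No data_source with id 1 exists yet, so the dataset is rejected.
orphan_ok = insert_dataset(conn, 1, "LUCAS", "lucas")

# After the parent row is created, the same insert succeeds.
with conn:
    conn.execute("INSERT INTO data_source (id, name) VALUES (1, 'Example supplier')")
linked_ok = insert_dataset(conn, 1, "LUCAS", "lucas")

print(orphan_ok, linked_ok)
```

The same pattern repeats down the chain: a campaign needs its dataset, a sampling log needs its campaign, and so on.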
campaign
A campaign is a child of a dataset and captures one specific data collection effort. Sampling methods may vary in detail between the campaigns of a dataset. For example, LUCAS 2009 and LUCAS 2015 are two campaigns within the LUCAS dataset, and each AI4SoilHealth (AI4SH) pilot site is a separate campaign.
The campaign table has five companion meta tables:
- campaign_meta — abstract, DOI, URL, keywords, time series flag, license, taxonomic scope
- campaign_method_tier — boolean flags for which method tiers (in-situ, in-home, in-lab, from-drone, from-satellite, etc.) are used in the campaign
- campaign_location — geographic bounding box and spatial reference
- campaign_provision — which provisions (instrument+provider+method_tier combinations) are used
- campaign_setting — which setting systems (field, forest, lake, etc.) apply
sampling_log
A sampling log is a sampling event within a campaign, covering one or multiple samples taken with consistent methods. It records the responsible person and time window.
sample and sample_geolocation
sample represents an individual physical sample. sample_geolocation optionally records its latitude, longitude, elevation, and spatial reference. Geolocation is kept in a separate table so that non-geolocated samples can still be represented.
observation_log and observation
observation_log bridges a sampling log to a provision (instrument + service + method tier) and records the sample handling chain (preservation, storage, transportation method). observation stores the actual numeric or text result for a specific provision_indicator (a measurable value from a specific provision).
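The chain from an observation back to its dataset can be followed with a series of joins. The sketch below builds a stripped-down copy of the relevant tables in SQLite and traces one observation upwards. The foreign key column names (observation_log_id, sampling_log_id, campaign_id, dataset_id), the value column, the trace_observation helper, and all inserted rows are placeholders assumed for illustration, not taken from the real schema or from real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Minimal stand-ins for the real tables; column names are assumptions
# and the inserted rows are made-up placeholders, not real measurements.
conn.executescript("""
CREATE TABLE dataset (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE campaign (
    id INTEGER PRIMARY KEY,
    dataset_id INTEGER REFERENCES dataset(id),
    name TEXT
);
CREATE TABLE sampling_log (
    id INTEGER PRIMARY KEY,
    campaign_id INTEGER REFERENCES campaign(id)
);
CREATE TABLE observation_log (
    id INTEGER PRIMARY KEY,
    sampling_log_id INTEGER REFERENCES sampling_log(id),
    provision_id INTEGER
);
CREATE TABLE observation (
    id INTEGER PRIMARY KEY,
    observation_log_id INTEGER REFERENCES observation_log(id),
    provision_indicator_id INTEGER,
    value REAL
);
INSERT INTO dataset VALUES (1, 'LUCAS');
INSERT INTO campaign VALUES (1, 1, 'LUCAS 2015');
INSERT INTO sampling_log VALUES (1, 1);
INSERT INTO observation_log VALUES (1, 1, 10);
INSERT INTO observation VALUES (1, 1, 100, 6.4);
""")

# Walk from the observation up through each parent table to the dataset.
TRACE_SQL = """
SELECT d.name, c.name, o.provision_indicator_id, o.value
FROM observation AS o
JOIN observation_log AS ol ON ol.id = o.observation_log_id
JOIN sampling_log    AS sl ON sl.id = ol.sampling_log_id
JOIN campaign        AS c  ON c.id  = sl.campaign_id
JOIN dataset         AS d  ON d.id  = c.dataset_id
WHERE o.id = ?
"""


def trace_observation(conn, observation_id):
    """Return (dataset, campaign, provision_indicator_id, value) or None."""
    return conn.execute(TRACE_SQL, (observation_id,)).fetchone()


row = trace_observation(conn, 1)
print(row)
```

In the real schema the provision and provision_indicator identifiers would resolve to reference records in observation_utility, which this sketch leaves out.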
Specialised observation tables
Four additional tables cover three non-standard observation types:
- infiltration_beerkan — BeerKan ring infiltration test results
- macrofauna and macrofauna_image — macrofauna counts, biomass, and associated image records
- measurement — direct sensor readings (e.g. from a handheld spectral sensor)