feat: final aggregation of global code #266

Merged
merged 91 commits into from
Jun 26, 2025
91 commits
f0ac653
fix: update global soilid code
jjmaynard Dec 20, 2024
64cf160
fix: update global soilid code
jjmaynard Dec 20, 2024
93af38f
fix: postgres database integration
jjmaynard Jan 7, 2025
f5a11fb
Merge branch 'fix/global-code-update' of https://github.com/techmatte…
jjmaynard Jan 7, 2025
9ff26ea
fix: created .env file and updated .gitignore
jjmaynard Jan 10, 2025
f8bfc57
fix: revision of global functions
jjmaynard Feb 4, 2025
37561eb
fix: remove test files
jjmaynard Feb 4, 2025
b273b3d
fix: lint format changes
jjmaynard Feb 4, 2025
51cf8c8
fix: make format changes
jjmaynard Feb 4, 2025
ddab942
style: use LF line endings
paulschreiber Feb 4, 2025
d9ba6ab
Normalize line endings to LF
jjmaynard Feb 6, 2025
0b6c5c9
Update soil_id/db.py
jjmaynard Feb 12, 2025
418ea47
fix: new psql color tables
jjmaynard Feb 18, 2025
082568b
Merge branch 'fix/global-code-update' of https://github.com/techmatte…
jjmaynard Feb 18, 2025
1299c7e
fix: update psql functions
jjmaynard Feb 27, 2025
959add1
fix: HWSDv2 postgres integration
jjmaynard Mar 4, 2025
605b369
fix: global color calculation
jjmaynard Mar 4, 2025
587cc7e
feat: update tests for global algorithm
garobrik Mar 11, 2025
2fc46f0
fix: global bugs
jjmaynard Mar 21, 2025
c24137a
test: update tests
garobrik Mar 25, 2025
40261a0
fix: traceback errors-geo
jjmaynard Mar 28, 2025
3e70f5b
fix: no map data return
jjmaynard Mar 28, 2025
ac0a581
fix: make black/lint format
jjmaynard Mar 31, 2025
e86c5df
fix: sql code update
jjmaynard Apr 10, 2025
d26ba68
fix: update global soilid code
jjmaynard Dec 20, 2024
e7cd519
fix: postgres database integration
jjmaynard Jan 7, 2025
96f6c46
fix: update global soilid code
jjmaynard Dec 20, 2024
a429d2e
fix: created .env file and updated .gitignore
jjmaynard Jan 10, 2025
b03b133
fix: revision of global functions
jjmaynard Feb 4, 2025
1f91f2d
fix: remove test files
jjmaynard Feb 4, 2025
2ef351b
fix: lint format changes
jjmaynard Feb 4, 2025
d98bbb7
fix: make format changes
jjmaynard Feb 4, 2025
00a63f4
style: use LF line endings
paulschreiber Feb 4, 2025
05fe937
Normalize line endings to LF
jjmaynard Feb 6, 2025
c671b67
fix: new psql color tables
jjmaynard Feb 18, 2025
ea5f979
Update soil_id/db.py
jjmaynard Feb 12, 2025
8409988
fix: update psql functions
jjmaynard Feb 27, 2025
60d8d20
fix: HWSDv2 postgres integration
jjmaynard Mar 4, 2025
4f0d1c9
fix: global color calculation
jjmaynard Mar 4, 2025
26f1e84
feat: update tests for global algorithm
garobrik Mar 11, 2025
f9f5cda
fix: global bugs
jjmaynard Mar 21, 2025
65152ab
test: update tests
garobrik Mar 25, 2025
4e02a14
fix: traceback errors-geo
jjmaynard Mar 28, 2025
8d6e0b1
fix: no map data return
jjmaynard Mar 28, 2025
6a5ef1e
fix: make black/lint format
jjmaynard Mar 31, 2025
0951ee4
fix: sql code update
jjmaynard Apr 10, 2025
d54c625
build: update Makefile README.md pyproject.toml requirements-dev.txt …
paulschreiber Apr 14, 2025
ab9d207
style: run make format
paulschreiber Apr 14, 2025
caebd71
fix: fix imports in config
paulschreiber Apr 14, 2025
22eccb5
fix: remove unsued variables
paulschreiber Apr 14, 2025
6a947a2
fix: global test dataset and testing
jjmaynard Apr 22, 2025
e16183e
Merge branch 'fix/global-code-update' of https://github.com/techmatte…
jjmaynard Apr 22, 2025
d657f9a
test: update tests
garobrik May 6, 2025
e727737
dx: add docker file for dev database
garobrik May 6, 2025
a4fef55
fix: lint
garobrik May 6, 2025
9cc216a
test: skip global test by default (no DB in CI)
garobrik May 6, 2025
f7c48f4
fix: pick a postgres version that matches our backend
garobrik May 14, 2025
a04699f
fix: add-elevation-api fix-location-score-calc
jjmaynard Jun 9, 2025
f27b6e6
fix: normalization in gower distance function
jjmaynard Jun 9, 2025
2fe2330
fix: Enhance gower_distances normalization
jjmaynard Jun 9, 2025
fce08ad
fix: add LAB color ranges, revised location score calc logic
jjmaynard Jun 9, 2025
95e896a
fix: gower distance calc, location-score calc
jjmaynard Jun 9, 2025
d50e61e
refactor depth logic to include top depth (#251)
jjmaynard Jun 9, 2025
2db57c7
fix: add top depth to json output (#251)
jjmaynard Jun 9, 2025
f2b189a
fix: lint-format
jjmaynard Jun 9, 2025
222f37a
fix: developed robust data infilling for soil simution (#215)
jjmaynard Jun 10, 2025
c0276ab
fix: add dataframe sorting to fix data rank assignment
jjmaynard Jun 10, 2025
6d4a70f
Merge branch 'main' into fix/global-code-update
garobrik Jun 12, 2025
a59c9ba
fix: update test_us.py depth logic to handle custom depths
jjmaynard Jun 16, 2025
c25e20d
Merge remote-tracking branch 'origin/fix/soil-simulation-infilling' i…
garobrik Jun 18, 2025
f04d372
Merge remote-tracking branch 'origin/fix/correct-data-rank-assignment…
garobrik Jun 18, 2025
9ddaf91
Merge remote-tracking branch 'origin/fix/revised-depth-logic' into tm…
garobrik Jun 18, 2025
734d40b
Merge branch 'tmp/fixes' into tmp/fixes-with-global
garobrik Jun 18, 2025
a2336e8
fix: revert bad change to US algo
garobrik Jun 19, 2025
64f0973
test: re-disable test we don't pass, parametrize test
garobrik Jun 19, 2025
cfb3531
feat: pass db connection for global as param
garobrik Apr 10, 2025
65580a8
feat: fixup region check code
garobrik May 13, 2025
9803314
feat: parametrize buffer distance
garobrik May 14, 2025
dc85ffa
perf: use subdivided query
garobrik May 14, 2025
3030fd3
test: log list results in bulk tests
garobrik May 14, 2025
c0a5fce
perf: optimize drop_cokey_horz
garobrik Jun 3, 2025
0b713c3
perf: optimize NaN infill step
garobrik Jun 3, 2025
a0b02a7
perf: optimize color similarity step
garobrik Jun 3, 2025
cbffec1
test: fix error in test code
garobrik Jun 4, 2025
816c243
run CI on all branches
garobrik Jun 5, 2025
d066a60
parametrize global test
garobrik Jun 5, 2025
66e49b6
add Makefile commands for creating Docker image
garobrik Jun 12, 2025
076987d
feat: run global test in CI
garobrik Jun 18, 2025
ec8a001
fix: US bulk tests
garobrik Jun 19, 2025
f4e0f68
fix: US bulk test
garobrik Jun 25, 2025
28e33ae
feat: improve Dockerfile and Makefile
garobrik Jun 25, 2025
10 changes: 8 additions & 2 deletions .github/workflows/build.yml
@@ -6,8 +6,6 @@ on:
     branches:
       - main
   pull_request:
-    branches:
-      - main
     types:
       - opened
       - edited
@@ -83,5 +81,13 @@ jobs:
         if: ${{ hashFiles('Data/*') == '' }}
         run: make download-soil-data
 
+      - name: Start soil id DB
+        run: docker compose up -d
+
       - name: Run tests
+        env:
+          DB_NAME: soil_id
+          DB_HOST: localhost
+          DB_USERNAME: postgres
+          DB_PASSWORD: postgres
         run: make test
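
Note on the workflow change above: `docker compose up -d` runs detached, so it can return before Postgres inside the container is accepting connections. A minimal sketch of a readiness check that a test fixture could use, assuming the database is published on `localhost:5432` as in `docker-compose.yml`. `wait_for_port` is a hypothetical helper and not part of this PR. A successful TCP connect only shows that the port is open, not that the preloaded data is queryable.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical helper for illustration; not part of the soil-id code base.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError (e.g. ConnectionRefusedError)
            # while nothing is listening on the port yet.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A fixture could call `wait_for_port("localhost", 5432)` before opening the first database connection and skip or fail the global tests when it returns `False`.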
1 change: 1 addition & 0 deletions .gitignore
@@ -27,5 +27,6 @@ prof/
 
 # reference and output files
 Data/
+bulk_test_result*
 
 get-pip.py*
51 changes: 51 additions & 0 deletions Dockerfile
@@ -0,0 +1,51 @@
# Stage 1: Build database with preloaded data
FROM postgis/postgis:16-3.5 as builder

ARG POSTGRES_DB=soil_id
ARG POSTGRES_USER=postgres
ARG POSTGRES_PASSWORD=postgres
ARG DATABASE_DUMP_FILE=Data/soil_id_db.dump

ENV POSTGRES_DB=${POSTGRES_DB}
ENV POSTGRES_USER=${POSTGRES_USER}
ENV POSTGRES_PASSWORD=${POSTGRES_PASSWORD}

# Copy dump into builder stage
COPY ${DATABASE_DUMP_FILE} /tmp/soil_id_db.dump

# Initialize database and preload it
RUN mkdir -p /var/lib/postgresql/data \
&& chown -R ${POSTGRES_USER}:${POSTGRES_USER} /var/lib/postgresql

USER ${POSTGRES_USER}

# Initialize database system
RUN /usr/lib/postgresql/16/bin/initdb -D /var/lib/postgresql/data

# Start database in background, preload dump, shut it down
RUN pg_ctl -D /var/lib/postgresql/data -o "-c listen_addresses=''" -w start \
&& createdb -U ${POSTGRES_USER} ${POSTGRES_DB} \
&& psql -U ${POSTGRES_USER} -d ${POSTGRES_DB} -c "CREATE EXTENSION postgis;" \
&& pg_restore -U ${POSTGRES_USER} -d ${POSTGRES_DB} /tmp/soil_id_db.dump \
&& psql -U ${POSTGRES_USER} -d ${POSTGRES_DB} -c "CLUSTER hwsd2_segment USING hwsd2_segment_shape_idx;" \
&& psql -U ${POSTGRES_USER} -d ${POSTGRES_DB} -c "ANALYZE;" \
&& pg_ctl -D /var/lib/postgresql/data -m fast -w stop

# Stage 2: Final image with loaded data
FROM postgis/postgis:16-3.5

COPY --from=builder /var/lib/postgresql/data /var/lib/postgresql/data

# Create a Docker-friendly pg_hba.conf
USER root
RUN echo "local all all trust" > /var/lib/postgresql/data/pg_hba.conf \
&& echo "host all all 127.0.0.1/32 trust" >> /var/lib/postgresql/data/pg_hba.conf \
&& echo "host all all ::1/128 trust" >> /var/lib/postgresql/data/pg_hba.conf \
&& echo "host all all 0.0.0.0/0 trust" >> /var/lib/postgresql/data/pg_hba.conf \
&& chown ${POSTGRES_USER}:${POSTGRES_USER} /var/lib/postgresql/data/pg_hba.conf

ENV POSTGRES_DB=${POSTGRES_DB}
ENV POSTGRES_USER=${POSTGRES_USER}
ENV POSTGRES_PASSWORD=${POSTGRES_PASSWORD}

USER ${POSTGRES_USER}
57 changes: 54 additions & 3 deletions Makefile
@@ -54,13 +54,33 @@ graphs:
 	gprof2dot -f pstats prof/test_soil_location.prof | dot -Tsvg -o prof/test_soil_location.svg
 	flameprof prof/test_soil_location.prof > prof/test_soil_location_flame.svg
 
-generate_bulk_test_results:
+generate_bulk_test_results_us:
 	python -m soil_id.tests.us.generate_bulk_test_results
 
-process_bulk_test_results:
+process_bulk_test_results_us:
 	python -m soil_id.tests.us.process_bulk_test_results $(RESULTS_FILE)
 
+generate_bulk_test_results_global:
+	python -m soil_id.tests.global.generate_bulk_test_results
+
+process_bulk_test_results_global:
+	python -m soil_id.tests.global.process_bulk_test_results $(RESULTS_FILE)
+
+generate_bulk_test_results_legacy:
+	python -m soil_id.tests.legacy.generate_bulk_test_results
+
+process_bulk_test_results_legacy:
+	python -m soil_id.tests.legacy.process_bulk_test_results $(RESULTS_FILE)
+
 # Download Munsell CSV, SHX, SHP, SBX, SBN, PRJ, DBF
+# 1tN23iVe6X1fcomcfveVp4w3Pwd0HJuTe: LandPKS_munsell_rgb_lab.csv
+# 1WUa9e3vTWPi6G8h4OI3CBUZP5y7tf1Li: gsmsoilmu_a_us.shx
+# 1l9MxC0xENGmI_NmGlBY74EtlD6SZid_a: gsmsoilmu_a_us.shp
+# 1asGnnqe0zI2v8xuOszlsNmZkOSl7cJ2n: gsmsoilmu_a_us.sbx
+# 185Qjb9pJJn4AzOissiTz283tINrDqgI0: gsmsoilmu_a_us.sbn
+# 1P3xl1YRlfcMjfO_4PM39tkrrlL3hoLzv: gsmsoilmu_a_us.prj
+# 1K0GkqxhZiVUND6yfFmaI7tYanLktekyp: gsmsoilmu_a_us.dbf
+# 1z7foFFHv_mTsuxMYnfOQRvXT5LKYlYFN: SoilID_US_Areas.shz
 download-soil-data:
 	mkdir -p Data
 	cd Data; \
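
Note on the new `soil_id.tests.global.*` targets: `global` is a reserved word in Python, so a module path containing it cannot be written in an `import` statement, although it can still be resolved by name, which is how `python -m` locates modules. The sketch below demonstrates this on a throwaway package. `demo_pkg` and `answer` are hypothetical names used only for illustration, and the real layout of `soil_id/tests/global` is assumed rather than shown in this diff.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package with a submodule named after a Python keyword.
# "demo_pkg" and "answer" are hypothetical names for this illustration.
root = Path(tempfile.mkdtemp())
pkg = root / "demo_pkg"
pkg.mkdir()
(pkg / "__init__.py").write_text("")
(pkg / "global.py").write_text("answer = 42\n")
sys.path.insert(0, str(root))
importlib.invalidate_caches()

# An import statement naming the module does not compile at all.
try:
    compile("import demo_pkg.global", "<demo>", "exec")
    compiled = True
except SyntaxError:
    compiled = False

# Importing by string works, because no keyword parsing is involved.
mod = importlib.import_module("demo_pkg.global")
```

In practice this means test helpers under a `global` package have to be loaded with `importlib.import_module` (or run with `python -m`) rather than with a plain `import` statement.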
@@ -70,4 +90,35 @@ download-soil-data:
 	gdown 1asGnnqe0zI2v8xuOszlsNmZkOSl7cJ2n; \
 	gdown 185Qjb9pJJn4AzOissiTz283tINrDqgI0; \
 	gdown 1P3xl1YRlfcMjfO_4PM39tkrrlL3hoLzv; \
-	gdown 1K0GkqxhZiVUND6yfFmaI7tYanLktekyp \
+	gdown 1K0GkqxhZiVUND6yfFmaI7tYanLktekyp; \
+	gdown 1z7foFFHv_mTsuxMYnfOQRvXT5LKYlYFN \
+
+DATABASE_DUMP_FILE ?= Data/soil_id_db.dump
+DOCKER_IMAGE_TAG ?= ghcr.io/techmatters/soil-id-db:latest
+build_docker_image:
+	@echo "Building to tag $(DOCKER_IMAGE_TAG)"
+	docker build \
+	--build-arg DATABASE_DUMP_FILE=$(DATABASE_DUMP_FILE) \
+	-t $(DOCKER_IMAGE_TAG) \
+	.
+
+push_docker_image:
+	@echo "Pushing tag $(DOCKER_IMAGE_TAG). Make sure to provide a versioned tag in addition to updating latest!"
+	docker push $(DOCKER_IMAGE_TAG)
+
+start_db:
+	docker compose up -d
+
+stop_db:
+	docker compose down
+
+connect_db:
+	docker compose exec db psql -U postgres -d soil_id
+
+dump_soil_id_db:
+	pg_dump --format=custom $(DATABASE_URL) -t hwsd2_segment -t hwsd2_data -t landpks_munsell_rgb_lab -t normdist1 -t normdist2 -t wise_soil_data -t wrb2006_to_fao90 -t wrb_fao90_desc -f $(DATABASE_DUMP_FILE)
+
+restore_soil_id_db:
+	pg_restore --dbname=$(DATABASE_URL) --single-transaction --clean --if-exists --no-owner $(DATABASE_DUMP_FILE)
+	psql $(DATABASE_URL) -c "CLUSTER hwsd2_segment USING hwsd2_segment_shape_idx;"
+	psql $(DATABASE_URL) -c "ANALYZE;"
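
Note on `dump_soil_id_db`: the target dumps only the eight tables the global algorithm reads, one `-t` flag per table. A minimal sketch of assembling the same command from a table list, in case the list needs to be shared with other tooling. `pg_dump_command` is a hypothetical helper and not part of this PR. The flags and table names are copied from the Makefile target, and the example URL assumes the credentials from `docker-compose.yml`.

```python
import shlex
from collections.abc import Sequence

# Table names copied from the dump_soil_id_db Makefile target.
TABLES = (
    "hwsd2_segment",
    "hwsd2_data",
    "landpks_munsell_rgb_lab",
    "normdist1",
    "normdist2",
    "wise_soil_data",
    "wrb2006_to_fao90",
    "wrb_fao90_desc",
)


def pg_dump_command(database_url: str, dump_file: str, tables: Sequence[str] = TABLES) -> list[str]:
    """Build the pg_dump argument list used by the Makefile target.

    Hypothetical helper for illustration; it only builds the argument list
    and does not run pg_dump.
    """
    cmd = ["pg_dump", "--format=custom", database_url]
    for table in tables:
        cmd += ["-t", table]  # one -t flag per table to include
    cmd += ["-f", dump_file]
    return cmd


cmd = pg_dump_command(
    "postgresql://postgres:postgres@localhost:5432/soil_id",  # assumed local dev URL
    "Data/soil_id_db.dump",
)
printable = shlex.join(cmd)  # shell-quoted form, suitable for logging
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where `pg_dump` is installed.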
11 changes: 11 additions & 0 deletions docker-compose.yml
@@ -0,0 +1,11 @@
name: soil_id

services:
  db:
    image: ghcr.io/techmatters/soil-id-db:0.2
    ports:
      - "5432:5432"
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
      POSTGRES_DB: soil_id # default database
14 changes: 2 additions & 12 deletions soil_id/config.py
@@ -12,42 +12,32 @@
#
# You should have received a copy of the GNU Affero General Public License
# along with this program. If not, see https://www.gnu.org/licenses/.

import os
import tempfile

from dotenv import load_dotenv
from platformdirs import user_cache_dir

load_dotenv()

DATA_PATH = os.environ.get("DATA_PATH", "Data")

# Numpy seeding
RANDOM_SEED = os.environ.get("RANDOM_SEED", 19)

# Output
APP_NAME = os.environ.get("APP_NAME", "org.terraso.soilid")
TEMP_DIR = tempfile.TemporaryDirectory(delete=False)
TEMP_DIR = tempfile.TemporaryDirectory()
CACHE_DIR = user_cache_dir(APP_NAME)
OUTPUT_PATH = TEMP_DIR.name
SOIL_ID_RANK_PATH = f"{OUTPUT_PATH}/soil_id_rank.csv"
SOIL_ID_PROB_PATH = f"{OUTPUT_PATH}/soil_id_cond_prob.csv"
REQUESTS_CACHE_PATH = f"{CACHE_DIR}/requests_cache"

# Determines if in/out of US
US_AREA_PATH = f"{DATA_PATH}/SoilID_US_Areas.shp"
US_AREA_PATH = f"{DATA_PATH}/SoilID_US_Areas.shz"

# US Soil ID
STATSGO_PATH = f"{DATA_PATH}/gsmsoilmu_a_us.shp"
MUNSELL_RGB_LAB_PATH = f"{DATA_PATH}/LandPKS_munsell_rgb_lab.csv"

# Global Soil ID
HWSD_PATH = f"{DATA_PATH}/HWSD_global_noWater_no_country.shp"
WISE_PATH = f"{DATA_PATH}/wise30sec_poly_simp_soil.shp"
NORM_DIST_1_PATH = f"{DATA_PATH}/NormDist1.csv"
NORM_DIST_2_PATH = f"{DATA_PATH}/NormDist2.csv"

# Database
DB_NAME = os.environ.get("DB_NAME", "terraso_backend")
DB_HOST = os.environ.get("DB_HOST")
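
Note on the database settings: the CI workflow sets `DB_NAME`, `DB_HOST`, `DB_USERNAME` and `DB_PASSWORD`, while the Makefile targets take a single `DATABASE_URL`. A minimal sketch of deriving one from the other, assuming the standard `postgresql://user:password@host:port/dbname` URI form. `database_url` and the `DB_PORT` variable are hypothetical, and the fallbacks for host, user and port are illustrative defaults rather than values taken from `config.py`. Only the `terraso_backend` default for `DB_NAME` comes from the diff.

```python
import os
from urllib.parse import quote


def database_url(env=os.environ) -> str:
    """Assemble a PostgreSQL connection URI from DB_* environment variables.

    Hypothetical helper for illustration; not part of soil_id.config.
    """
    name = env.get("DB_NAME", "terraso_backend")  # default taken from config.py
    host = env.get("DB_HOST", "localhost")        # assumed fallback
    user = env.get("DB_USERNAME", "postgres")     # assumed fallback
    password = env.get("DB_PASSWORD", "")
    port = env.get("DB_PORT", "5432")             # DB_PORT is an assumed variable
    # Percent-encode credentials so characters such as "@" or "/" stay unambiguous.
    return f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}@{host}:{port}/{name}"
```

With the values from the CI workflow this yields `postgresql://postgres:postgres@localhost:5432/soil_id`, which could be exported as `DATABASE_URL` before running `make dump_soil_id_db` or `make restore_soil_id_db`.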