
Commit f2ac010

Authored by garobrik, jjmaynard, and paulschreiber
feat: final aggregation of global code, known good state with US fixes (#266)
* fix: update global soilid code
* fix: postgres database integration - updated code to query the postgres database; fixed assorted bugs
* fix: created .env file and updated .gitignore
* fix: revision of global functions
* fix: remove test files
* fix: lint format changes
* fix: make format changes
* style: use LF line endings
* Normalize line endings to LF
* Update soil_id/db.py (Co-authored-by: Paul Schreiber <[email protected]>)
* fix: new psql color tables
* fix: update psql functions
* fix: HWSDv2 postgres integration
* fix: global color calculation
* feat: update tests for global algorithm
* fix: global bugs - fixed the 3 traceback issues @shrouxm identified; code now runs under 2 s, though SoilGrids API returns sometimes take 20-30 s
* test: update tests
* fix: traceback errors (geo) - fixed distance calculation errors associated with coordinate system transformation; added top depth input to rank function
* fix: no map data return - when no map data is available at a location, returns "Data_unavailable"
* fix: make black/lint format
* fix: sql code update - modified sql code to query all home mapunit components and all adjacent map units and components; fixed component name indexing
* build: update Makefile, README.md, pyproject.toml, requirements-dev.txt, requirements.txt, requirements/dev.in, setup.cfg from main
* style: run make format
* fix: fix imports in config
* fix: remove unused variables
* fix: global test dataset and testing format (global test dataset, gitignore, global testing)
* test: update tests
* dx: add docker file for dev database
* fix: lint
* test: skip global test by default (no DB in CI)
* fix: pick a postgres version that matches our backend
* fix: add-elevation-api, fix-location-score-calc
* fix: normalization in gower distance function (see the sketch below)
* fix: Enhance gower_distances normalization
* fix: add LAB color ranges, revised location score calc logic
* fix: gower distance calc, location-score calc
* refactor depth logic to include top depth (#251)
* fix: add top depth to json output (#251)
* fix: lint-format
* fix: developed robust data infilling for soil simulation (#215)
* fix: add dataframe sorting to fix data rank assignment - sorting occurred on the grouped dataframe but was missing on the ungrouped dataframe, resulting in incorrect Rank_Data number assignment
* fix: update test_us.py depth logic to handle custom depths - replaced horizonDepth with topDepth and bottomDepth
* fix: revert bad change to US algo
* test: re-disable test we don't pass, parametrize test
* feat: pass db connection for global as param - otherwise API clients need to set special environment variables for this library
* feat: fixup region check code
* feat: parametrize buffer distance
* perf: use subdivided query
* test: log list results in bulk tests
* perf: optimize drop_cokey_horz
* perf: optimize NaN infill step
* perf: optimize color similarity step
* test: fix error in test code
* run CI on all branches
* parametrize global test
* add Makefile commands for creating Docker image
* feat: run global test in CI
* fix: US bulk tests
* fix: US bulk test
* feat: improve Dockerfile and Makefile

---------

Co-authored-by: jjmaynard <[email protected]>
Co-authored-by: Paul Schreiber <[email protected]>
Co-authored-by: Jonathan Maynard <[email protected]>
Co-authored-by: Jonathan Maynard <[email protected]>
Co-authored-by: Paul Schreiber <[email protected]>
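Several of the bullets above reference normalization inside the Gower distance used for soil ranking. As a generic point of reference only (this is not the repository's gower_distances implementation, whose signature, weighting, and missing-value handling may differ), a range-normalized Gower distance over mixed numeric and categorical features can be sketched as:

```python
import numpy as np

def gower_distance(x, y, feature_ranges, categorical_mask):
    """Generic illustration: mean per-feature dissimilarity on a 0-1 scale.

    x, y             : sequences of feature values for two soil profiles
    feature_ranges   : per-feature (max - min) over the dataset, for numeric features
    categorical_mask : True where a feature is categorical
    """
    d = np.zeros(len(x), dtype=float)
    for i, is_cat in enumerate(categorical_mask):
        if is_cat:
            # categorical features: simple match / mismatch
            d[i] = 0.0 if x[i] == y[i] else 1.0
        elif feature_ranges[i] > 0:
            # numeric features: divide by the observed range so every
            # feature contributes on the same 0-1 scale
            d[i] = abs(x[i] - y[i]) / feature_ranges[i]
        else:
            # constant feature: contributes no distance
            d[i] = 0.0
    return d.mean()
```

For example, `gower_distance([10.0, "loam"], [30.0, "clay"], feature_ranges=[100.0, None], categorical_mask=[False, True])` returns 0.6. The point of the range normalization is that a feature measured in centimeters cannot swamp one measured as a fraction, which is the kind of imbalance the normalization fixes above target.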
1 parent 45726a9, commit f2ac010

23 files changed: +35126 / -23945 lines

.github/workflows/build.yml

Lines changed: 8 additions & 2 deletions
```diff
@@ -6,8 +6,6 @@ on:
     branches:
       - main
   pull_request:
-    branches:
-      - main
     types:
       - opened
       - edited
@@ -83,5 +81,13 @@ jobs:
         if: ${{ hashFiles('Data/*') == '' }}
         run: make download-soil-data
 
+      - name: Start soil id DB
+        run: docker compose up -d
+
       - name: Run tests
+        env:
+          DB_NAME: soil_id
+          DB_HOST: localhost
+          DB_USERNAME: postgres
+          DB_PASSWORD: postgres
        run: make test
```

.gitignore

Lines changed: 1 addition & 0 deletions
```diff
@@ -27,5 +27,6 @@ prof/
 
 # reference and output files
 Data/
+bulk_test_result*
 
 get-pip.py*
```

Dockerfile

Lines changed: 51 additions & 0 deletions
```diff
@@ -0,0 +1,51 @@
+# Stage 1: Build database with preloaded data
+FROM postgis/postgis:16-3.5 as builder
+
+ARG POSTGRES_DB=soil_id
+ARG POSTGRES_USER=postgres
+ARG POSTGRES_PASSWORD=postgres
+ARG DATABASE_DUMP_FILE=Data/soil_id_db.dump
+
+ENV POSTGRES_DB=${POSTGRES_DB}
+ENV POSTGRES_USER=${POSTGRES_USER}
+ENV POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
+
+# Copy dump into builder stage
+COPY ${DATABASE_DUMP_FILE} /tmp/soil_id_db.dump
+
+# Initialize database and preload it
+RUN mkdir -p /var/lib/postgresql/data \
+    && chown -R ${POSTGRES_USER}:${POSTGRES_USER} /var/lib/postgresql
+
+USER ${POSTGRES_USER}
+
+# Initialize database system
+RUN /usr/lib/postgresql/16/bin/initdb -D /var/lib/postgresql/data
+
+# Start database in background, preload dump, shut it down
+RUN pg_ctl -D /var/lib/postgresql/data -o "-c listen_addresses=''" -w start \
+    && createdb -U ${POSTGRES_USER} ${POSTGRES_DB} \
+    && psql -U ${POSTGRES_USER} -d ${POSTGRES_DB} -c "CREATE EXTENSION postgis;" \
+    && pg_restore -U ${POSTGRES_USER} -d ${POSTGRES_DB} /tmp/soil_id_db.dump \
+    && psql -U ${POSTGRES_USER} -d ${POSTGRES_DB} -c "CLUSTER hwsd2_segment USING hwsd2_segment_shape_idx;" \
+    && psql -U ${POSTGRES_USER} -d ${POSTGRES_DB} -c "ANALYZE;" \
+    && pg_ctl -D /var/lib/postgresql/data -m fast -w stop
+
+# Stage 2: Final image with loaded data
+FROM postgis/postgis:16-3.5
+
+COPY --from=builder /var/lib/postgresql/data /var/lib/postgresql/data
+
+# Create a Docker-friendly pg_hba.conf
+USER root
+RUN echo "local all all trust" > /var/lib/postgresql/data/pg_hba.conf \
+    && echo "host all all 127.0.0.1/32 trust" >> /var/lib/postgresql/data/pg_hba.conf \
+    && echo "host all all ::1/128 trust" >> /var/lib/postgresql/data/pg_hba.conf \
+    && echo "host all all 0.0.0.0/0 trust" >> /var/lib/postgresql/data/pg_hba.conf \
+    && chown ${POSTGRES_USER}:${POSTGRES_USER} /var/lib/postgresql/data/pg_hba.conf
+
+ENV POSTGRES_DB=${POSTGRES_DB}
+ENV POSTGRES_USER=${POSTGRES_USER}
+ENV POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
+
+USER ${POSTGRES_USER}
```

Makefile

Lines changed: 54 additions & 3 deletions
```diff
@@ -54,13 +54,33 @@ graphs:
 	gprof2dot -f pstats prof/test_soil_location.prof | dot -Tsvg -o prof/test_soil_location.svg
 	flameprof prof/test_soil_location.prof > prof/test_soil_location_flame.svg
 
-generate_bulk_test_results:
+generate_bulk_test_results_us:
 	python -m soil_id.tests.us.generate_bulk_test_results
 
-process_bulk_test_results:
+process_bulk_test_results_us:
 	python -m soil_id.tests.us.process_bulk_test_results $(RESULTS_FILE)
 
+generate_bulk_test_results_global:
+	python -m soil_id.tests.global.generate_bulk_test_results
+
+process_bulk_test_results_global:
+	python -m soil_id.tests.global.process_bulk_test_results $(RESULTS_FILE)
+
+generate_bulk_test_results_legacy:
+	python -m soil_id.tests.legacy.generate_bulk_test_results
+
+process_bulk_test_results_legacy:
+	python -m soil_id.tests.legacy.process_bulk_test_results $(RESULTS_FILE)
+
 # Donwload Munsell CSV, SHX, SHP, SBX, SBN, PRJ, DBF
+# 1tN23iVe6X1fcomcfveVp4w3Pwd0HJuTe: LandPKS_munsell_rgb_lab.csv
+# 1WUa9e3vTWPi6G8h4OI3CBUZP5y7tf1Li: gsmsoilmu_a_us.shx
+# 1l9MxC0xENGmI_NmGlBY74EtlD6SZid_a: gsmsoilmu_a_us.shp
+# 1asGnnqe0zI2v8xuOszlsNmZkOSl7cJ2n: gsmsoilmu_a_us.sbx
+# 185Qjb9pJJn4AzOissiTz283tINrDqgI0: gsmsoilmu_a_us.sbn
+# 1P3xl1YRlfcMjfO_4PM39tkrrlL3hoLzv: gsmsoilmu_a_us.prj
+# 1K0GkqxhZiVUND6yfFmaI7tYanLktekyp: gsmsoilmu_a_us.dbf
+# 1z7foFFHv_mTsuxMYnfOQRvXT5LKYlYFN: SoilID_US_Areas.shz
 download-soil-data:
 	mkdir -p Data
 	cd Data; \
@@ -70,4 +90,35 @@ download-soil-data:
 	gdown 1asGnnqe0zI2v8xuOszlsNmZkOSl7cJ2n; \
 	gdown 185Qjb9pJJn4AzOissiTz283tINrDqgI0; \
 	gdown 1P3xl1YRlfcMjfO_4PM39tkrrlL3hoLzv; \
-	gdown 1K0GkqxhZiVUND6yfFmaI7tYanLktekyp \
+	gdown 1K0GkqxhZiVUND6yfFmaI7tYanLktekyp; \
+	gdown 1z7foFFHv_mTsuxMYnfOQRvXT5LKYlYFN \
+
+DATABASE_DUMP_FILE ?= Data/soil_id_db.dump
+DOCKER_IMAGE_TAG ?= ghcr.io/techmatters/soil-id-db:latest
+build_docker_image:
+	@echo "Building to tag $(DOCKER_IMAGE_TAG)"
+	docker build \
+		--build-arg DATABASE_DUMP_FILE=$(DATABASE_DUMP_FILE) \
+		-t $(DOCKER_IMAGE_TAG) \
+		.
+
+push_docker_image:
+	@echo "Pushing tag $(DOCKER_IMAGE_TAG). Make sure to provide a versioned tag in addition to updating latest!"
+	docker push $(DOCKER_IMAGE_TAG)
+
+start_db:
+	docker compose up -d
+
+stop_db:
+	docker compose down
+
+connect_db:
+	docker compose exec db psql -U postgres -d soil_id
+
+dump_soil_id_db:
+	pg_dump --format=custom $(DATABASE_URL) -t hwsd2_segment -t hwsd2_data -t landpks_munsell_rgb_lab -t normdist1 -t normdist2 -t wise_soil_data -t wrb2006_to_fao90 -t wrb_fao90_desc -f $(DATABASE_DUMP_FILE)
+
+restore_soil_id_db:
+	pg_restore --dbname=$(DATABASE_URL) --single-transaction --clean --if-exists --no-owner $(DATABASE_DUMP_FILE)
+	psql $(DATABASE_URL) -c "CLUSTER hwsd2_segment USING hwsd2_segment_shape_idx;"
+	psql $(DATABASE_URL) -c "ANALYZE;"
```

docker-compose.yml

Lines changed: 11 additions & 0 deletions
```diff
@@ -0,0 +1,11 @@
+name: soil_id
+
+services:
+  db:
+    image: ghcr.io/techmatters/soil-id-db:0.2
+    ports:
+      - "5432:5432"
+    environment:
+      POSTGRES_USER: postgres
+      POSTGRES_PASSWORD: postgres
+      POSTGRES_DB: soil_id # default database
```
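With the compose service above running (make start_db), the development database is reachable on localhost:5432 using the credentials shown in docker-compose.yml and in the CI workflow. A minimal connectivity check, assuming the psycopg2 driver is installed (the library's own soil_id/db.py may use a different driver or helper):

```python
import psycopg2

# Credentials mirror docker-compose.yml and the CI workflow's env block.
conn = psycopg2.connect(
    host="localhost",
    port=5432,
    dbname="soil_id",
    user="postgres",
    password="postgres",
)
with conn, conn.cursor() as cur:
    # The PostGIS extension and the dump were baked in when the image was built.
    cur.execute("SELECT postgis_version();")
    print("PostGIS:", cur.fetchone()[0])
    cur.execute("SELECT count(*) FROM hwsd2_segment;")
    print("hwsd2_segment rows:", cur.fetchone()[0])
conn.close()
```

If both queries succeed, the preloaded HWSDv2 data is available and the global tests can run against the local container instead of a remote database.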

soil_id/config.py

Lines changed: 2 additions & 12 deletions
```diff
@@ -12,42 +12,32 @@
 #
 # You should have received a copy of the GNU Affero General Public License
 # along with this program. If not, see https://www.gnu.org/licenses/.
-
 import os
 import tempfile
 
-from dotenv import load_dotenv
 from platformdirs import user_cache_dir
 
-load_dotenv()
-
 DATA_PATH = os.environ.get("DATA_PATH", "Data")
 
 # Numpy seeding
 RANDOM_SEED = os.environ.get("RANDOM_SEED", 19)
 
 # Output
 APP_NAME = os.environ.get("APP_NAME", "org.terraso.soilid")
-TEMP_DIR = tempfile.TemporaryDirectory(delete=False)
+TEMP_DIR = tempfile.TemporaryDirectory()
 CACHE_DIR = user_cache_dir(APP_NAME)
 OUTPUT_PATH = TEMP_DIR.name
 SOIL_ID_RANK_PATH = f"{OUTPUT_PATH}/soil_id_rank.csv"
 SOIL_ID_PROB_PATH = f"{OUTPUT_PATH}/soil_id_cond_prob.csv"
 REQUESTS_CACHE_PATH = f"{CACHE_DIR}/requests_cache"
 
 # Determines if in/out of US
-US_AREA_PATH = f"{DATA_PATH}/SoilID_US_Areas.shp"
+US_AREA_PATH = f"{DATA_PATH}/SoilID_US_Areas.shz"
 
 # US Soil ID
 STATSGO_PATH = f"{DATA_PATH}/gsmsoilmu_a_us.shp"
 MUNSELL_RGB_LAB_PATH = f"{DATA_PATH}/LandPKS_munsell_rgb_lab.csv"
 
-# Global Soil ID
-HWSD_PATH = f"{DATA_PATH}/HWSD_global_noWater_no_country.shp"
-WISE_PATH = f"{DATA_PATH}/wise30sec_poly_simp_soil.shp"
-NORM_DIST_1_PATH = f"{DATA_PATH}/NormDist1.csv"
-NORM_DIST_2_PATH = f"{DATA_PATH}/NormDist2.csv"
-
 # Database
 DB_NAME = os.environ.get("DB_NAME", "terraso_backend")
 DB_HOST = os.environ.get("DB_HOST")
```
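Because load_dotenv() was removed from config.py, importing the library no longer reads a .env file; the DB_* settings have to be in the environment already (as the CI workflow above sets them), or the calling application can load its own .env first. A minimal sketch of both options (the specific values mirror the CI workflow and are assumptions, not defaults of the library):

```python
import os

# Option 1: set the values directly, mirroring the CI workflow's env block.
os.environ.setdefault("DB_NAME", "soil_id")
os.environ.setdefault("DB_HOST", "localhost")
os.environ.setdefault("DB_USERNAME", "postgres")
os.environ.setdefault("DB_PASSWORD", "postgres")

# Option 2: have the application (not this library) load a .env file:
# from dotenv import load_dotenv
# load_dotenv()

# Import only after the environment is populated, because config.py reads
# os.environ at module import time.
from soil_id import config  # noqa: E402

print(config.DB_NAME, config.DB_HOST)
```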
