Research code package for Underground Utility Network Completion based on Spatial Contextual Information of Ground Facilities and Utility Anchor Points using Graph Neural Networks.
The repository includes an installable Python package, command-line evaluation scripts, environment checks, tests, documentation, data-layout guidance, and branch-based source traceability to the notebook-era repository state.
The initial version of this repository was published on December 26, 2023, and the codebase was refactored by Codex and Claude Code on May 13, 2026.
Caution
Data and model checkpoints are not distributed in this repository. Raw GIS files, processed graph pickles, model checkpoints, and derived artifacts that can reconstruct the utility network are excluded. Users must obtain source data from the original public providers and follow their terms; the authors cannot redistribute raw or derived data without separate written permission from the providers.
.\scripts\create_env.ps1
.\.conda\pipe-network-completion\python.exe -m pip install -e .
.\.conda\pipe-network-completion\python.exe scripts\verify_environment.py

conda env create -f environment.yml
conda activate pipe-network-completion
pip install -e .
python scripts/verify_environment.py

Editable installation keeps pipe_network_completion importable from scripts, notebooks, and tests.
Run this only after the local graph data, checkpoint, and metrics files are available under the documented paths:
python scripts/evaluate_checkpoint.py \
--checkpoint models/checkpoints/model1212_hiddensize_128_drop_00.pt \
--metrics results/metrics/model_metrics1212.csv \
--split test

Supporting documentation: docs/TRACEABILITY.md maps refactored modules to the notebook workflow and archived branches; docs/DATA_LAYOUT.md documents artifact locations; and models/README.md explains the checkpoint naming scheme.
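Before running the evaluation command above, it can help to confirm that its inputs are present. The following is a minimal sketch, not part of the package: the function name `missing_inputs` is hypothetical, and it assumes that `--split test` reads `data/processed/graphs/test_data.pkl`, as suggested by the artifact layout described in this README.

```python
from pathlib import Path

# Paths copied from the evaluation command and the documented data layout.
# Assumption: the test split is read from data/processed/graphs/test_data.pkl.
REQUIRED_INPUTS = {
    "checkpoint": Path("models/checkpoints/model1212_hiddensize_128_drop_00.pt"),
    "metrics": Path("results/metrics/model_metrics1212.csv"),
    "test graph": Path("data/processed/graphs/test_data.pkl"),
}


def missing_inputs(repo_root=".", required=REQUIRED_INPUTS):
    """Return {label: path} for every required file that is absent under repo_root."""
    root = Path(repo_root)
    return {
        label: root / rel
        for label, rel in required.items()
        if not (root / rel).is_file()
    }


if __name__ == "__main__":
    missing = missing_inputs()
    for label, path in missing.items():
        print(f"missing {label}: {path}")
    if not missing:
        print("all evaluation inputs found")
```

An empty result means the three files exist; it does not validate their contents.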
The command-line workflow follows the artifact lifecycle:
data/raw/
  gis/sewer/            Urban Utilities sewer shapefile bundles
  gis/roads/            Brisbane City Council road shapefile bundle
  mh_road/MH_Road.pkl   local manhole-road nearest-feature table
    -> process.py
    -> data/interim/*.pkl
    -> scripts/build_graphs.py
    -> data/processed/graphs/{train,val,test}_data.pkl
       + models/checkpoints/*.pt
       + results/metrics/*.csv
    -> scripts/evaluate_checkpoint.py
    -> checkpoint metrics
- Preprocess GIS shapefiles and the manhole-road near table into *_proc.pkl and split_mask.pkl:
  python process.py
- Assemble per-split HeteroData graphs from those pickles:
  python scripts/build_graphs.py
- Evaluate a saved checkpoint against the published metrics:
  python scripts/evaluate_checkpoint.py
Each script supports --help.
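The three steps above can also be chained from Python so that a failure in one stage stops the later ones. This is a hedged sketch rather than a script shipped with the repository: `STAGES` and `run_pipeline` are hypothetical names, and each script is invoked without arguments on the assumption that its defaults match the documented layout.

```python
import subprocess
import sys

# Stage order follows the artifact lifecycle described in this README.
STAGES = (
    ("preprocess", ["process.py"]),
    ("build graphs", ["scripts/build_graphs.py"]),
    ("evaluate", ["scripts/evaluate_checkpoint.py"]),
)


def run_pipeline(stages=STAGES, runner=subprocess.run, python=sys.executable):
    """Run each stage in order and return the names of the stages that finished.

    With the default runner, check=True raises CalledProcessError on a
    non-zero exit code, so later stages never run on stale inputs.
    """
    completed = []
    for name, args in stages:
        runner([python, *args], check=True)
        completed.append(name)
    return completed
```

Calling `run_pipeline()` from the repository root runs all three stages with the current interpreter; the `runner` parameter exists only so the sequencing can be exercised without the data.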
- process.py: raw GIS preprocessing script parameterized for local repo paths.
- scripts/build_graphs.py: assembles train_data.pkl, val_data.pkl, and test_data.pkl from the interim pickles.
- pipe_network_completion/dataset.py: graph dataset construction utilities.
- pipe_network_completion/model.py: importable GNN model definition refactored from the notebook.
- pipe_network_completion/evaluation.py: shared binary classification metrics.
- data/: local layout for raw, interim, processed, and experiment data artifacts generated from provider-supplied inputs.
- models/checkpoints/: saved PyTorch model checkpoints for local evaluation; checkpoint files are not distributed in this repository.
- results/metrics/: recorded metrics from previous model runs.
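For readers unfamiliar with the binary classification metrics mentioned for pipe_network_completion/evaluation.py, the sketch below shows how such metrics are typically derived from a confusion matrix. It is an illustration only: the function name `binary_metrics` and the choice of metrics are assumptions, not the module's actual API.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 from 0/1 labels and predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Confusion-matrix counts.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    total = tp + tn + fp + fn
    # Ratios fall back to 0.0 when their denominator is zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    accuracy = (tp + tn) / total if total else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

In a link-prediction setting, `y_true` would mark whether a candidate pipe segment exists and `y_pred` would be the thresholded model output.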
The main branch is the maintained research code package. Notebook-era source
code and earlier repository layouts are preserved in sanitized archived remote
branches, so readers can inspect the historical code without adding data files
to the current runnable tree.
git fetch --all --tags
git branch -r
git switch --detach origin/Legacy-Final

Use origin/Legacy-Final for the previous final research state and origin/Legacy-main for the earlier main-branch snapshot. Return to the current codebase with:
git switch main

The source code is public. Raw GIS files, processed graph pickles, model checkpoints, and other artifacts that can reconstruct the utility network are not redistributed in this repository. Users should obtain source GIS data directly from the relevant public data providers and follow their terms of use. The authors do not redistribute raw or derived data artifacts unless separate written permission is obtained from the relevant data providers.
Place each shapefile bundle in the folder below. Keep every .shp file with
its matching sidecars, including .dbf, .shx, .prj, and any other files
exported with the same base name.
| File placement | Dataset / source | Type | Role in the workflow |
|---|---|---|---|
| data/raw/gis/sewer/SewerManholes_ExportFeatures.shp | Urban Utilities sewer manholes | Point | Main manhole/anchor-point layer. Combined with MH_Road.pkl to build MH_proc.pkl. |
| data/raw/gis/sewer/SewerGravityMa_ExportFeature1.shp | Urban Utilities sewer gravity main - trunk | Line | Trunk gravity-main segments. Combined with SewerGravityMa_ExportFeature2.shp to build Line_proc.pkl. |
| data/raw/gis/sewer/SewerGravityMa_ExportFeature2.shp | Urban Utilities sewer gravity main | Line | Main gravity sewer segments. Combined with SewerGravityMa_ExportFeature1.shp to build Line_proc.pkl. |
| data/raw/gis/sewer/SewersqlSewerP_ExportFeature.shp | Urban Utilities sewer pump assets | Point | Loaded by process.py as pump point assets. In the current preprocessing path, pump rows without the manhole-road near-table fields are filtered before MH_proc.pkl is written. |
| data/raw/gis/roads/Roads_ExportFeatures.shp | Brisbane City Council Open Data road hierarchy | Line | Road context layer used to build road nodes and road-road relationships. |
| data/raw/mh_road/MH_Road.pkl | Locally generated manhole-road near table | Table | Links manholes to nearby roads. Expected fields are OBJECTID, NEAR_FID, NEAR_POS, NEAR_DIST, and SIDE. |
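Because a shapefile without its sidecars generally cannot be read correctly, a quick check of each bundle can save a failed preprocessing run. The sketch below is an assumption-laden helper, not part of the repository: `missing_sidecars` is a hypothetical name, and it checks only the three sidecar extensions named above (.dbf, .shx, .prj).

```python
from pathlib import Path

# Sidecar extensions named in this README; other exported files are not checked.
REQUIRED_SIDECARS = (".dbf", ".shx", ".prj")


def missing_sidecars(folder):
    """Map each .shp file name under folder to the required sidecars it lacks."""
    report = {}
    for shp in sorted(Path(folder).rglob("*.shp")):
        missing = [
            ext for ext in REQUIRED_SIDECARS if not shp.with_suffix(ext).is_file()
        ]
        if missing:
            report[shp.name] = missing
    return report


if __name__ == "__main__":
    for name, exts in missing_sidecars("data/raw/gis").items():
        print(f"{name}: missing {', '.join(exts)}")
```

An empty report means every .shp file found has all three sidecars next to it with the same base name.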
If pump assets should be represented as graph nodes, revise process.py and
regenerate the derived artifacts; otherwise the table above documents the
current notebook-compatible preprocessing path.
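The near table is generated locally, so its schema is worth checking before preprocessing. The following sketch compares a table's columns against the five expected fields listed for MH_Road.pkl. The helper names are hypothetical, and `check_near_table` assumes the pickle stores a pandas DataFrame, which this README does not state explicitly.

```python
# Field names taken from the MH_Road.pkl row of the data table.
EXPECTED_FIELDS = ("OBJECTID", "NEAR_FID", "NEAR_POS", "NEAR_DIST", "SIDE")


def missing_near_table_fields(columns):
    """Return the expected near-table fields that are absent from columns."""
    present = set(columns)
    return [field for field in EXPECTED_FIELDS if field not in present]


def check_near_table(path="data/raw/mh_road/MH_Road.pkl"):
    """Load the near table and report missing fields.

    Assumption: the pickle contains a pandas DataFrame. Unpickling runs
    arbitrary code, so load only files you generated or trust.
    """
    import pandas as pd  # imported lazily so the pure check above has no dependencies

    table = pd.read_pickle(path)
    return missing_near_table_fields(table.columns)
```

An empty list from `check_near_table()` indicates that all five fields are present; it does not validate their values.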
Generated local artifacts:
- data/interim/MH_proc.pkl
- data/interim/Road_proc.pkl
- data/interim/MH_R_RL_proc.pkl
- data/interim/Line_proc.pkl
- data/interim/R_R_proc.pkl
- data/interim/split_mask.pkl
- data/processed/graphs/train_data.pkl
- data/processed/graphs/val_data.pkl
- data/processed/graphs/test_data.pkl
Public GitHub releases should not attach these artifacts unless redistribution permission is documented.
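One way to guard against attaching such artifacts by accident is to screen a candidate file list before publishing. The sketch below is a hypothetical helper, not the project's release tooling: it flags files by extension only, and the extension set is an assumption based on the artifact types named in this README.

```python
from pathlib import PurePosixPath

# Assumed extensions for raw GIS files, pickled artifacts, and checkpoints.
RESTRICTED_SUFFIXES = {".pkl", ".pt", ".shp", ".dbf", ".shx", ".prj"}


def restricted_files(paths):
    """Return the paths whose extension marks them as data or model artifacts."""
    return [
        str(PurePosixPath(raw))
        for raw in paths
        if PurePosixPath(raw).suffix.lower() in RESTRICTED_SUFFIXES
    ]
```

For example, passing the output of `git ls-files` (split into lines) would list any tracked file that matches one of these extensions.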
Readers can use the repository at two levels:
- Code and environment check: clone the repository, create the environment, install the package, and run scripts/verify_environment.py. This path does not require raw GIS files or model artifacts.
- Provider-data workflow: obtain GIS data directly from the relevant public data providers, follow their terms of use, place the files in the documented data/raw/ layout, then run process.py and scripts/build_graphs.py. Model evaluation requires locally generated graph artifacts and a checkpoint the reader is permitted to use.
The pytest suite checks imports, path constants, and the checkpoint/metrics inventory:
pip install -e ".[test]"
pytest

Artifact-backed validation for local provider data is handled by scripts/verify_environment.py --load-data.
Tagged source releases are published at github.com/Yuxi0048/PipeNetworkCompletion/releases. Public releases contain source code and documentation. Data archives are not provided through this repository. The release procedure is documented in RELEASE.md, and the version history lives in CHANGELOG.md.
Evaluation uses LinkNeighborLoader with finite neighborhood sampling, matching
the notebook workflow. The checkpoint evaluation script sets Python, NumPy, and
PyTorch seeds and reports observed metrics beside the published metrics row.
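As an illustration of the seeding and side-by-side reporting described above, the sketch below seeds the three random number generators and pairs observed metrics with a published row. It is not the evaluation script's code: `seed_everything` and `side_by_side` are hypothetical names, the metric keys are placeholders, and NumPy and PyTorch are imported only if installed. Seeding narrows run-to-run variation but may not remove it entirely on every backend.

```python
import random


def seed_everything(seed):
    """Seed Python, and NumPy and PyTorch when they are installed."""
    random.seed(seed)
    try:
        import numpy as np

        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch

        torch.manual_seed(seed)
        if torch.cuda.is_available():
            torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass


def side_by_side(observed, published):
    """Return (metric, observed, published, difference) for metrics in both dicts."""
    rows = []
    for name in sorted(observed.keys() & published.keys()):
        rows.append((name, observed[name], published[name], observed[name] - published[name]))
    return rows
```

With neighborhood sampling, small differences between observed and published values are plausible even with fixed seeds, so the difference column is more informative than an exact-match check.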
The archived notebook on origin/Legacy-Final records exploratory architecture
variants. The maintained CLI focuses on checkpoint evaluation with the prepared
graph artifacts.
We thank Urban Utilities for providing public access to high-quality utility network data, and Brisbane City Council Open Data for the public geospatial context used alongside those utility assets. Users are responsible for following provider terms when accessing or redistributing raw or derived data.
@inproceedings{10.22260/ISARC2024/0121,
doi = {10.22260/ISARC2024/0121},
year = {2024},
month = {June},
author = {Zhang, Yuxi and Cai, Hubo},
title = {Underground Utility Network Completion based on Spatial Contextual Information of Ground Facilities and Utility Anchor Points using Graph Neural Networks},
booktitle = {Proceedings of the 41st International Symposium on Automation and Robotics in Construction},
isbn = {978-0-6458322-1-1},
issn = {2413-5844},
publisher = {International Association for Automation and Robotics in Construction (IAARC)},
pages = {936-943},
address = {Lille, France}
}

Contact: Yuxi Zhang, zhan2889@purdue.edu
The location encoder implementation is adapted from the space2vec/grid-cell spatial representation work:
@inproceedings{space2vec_iclr2020,
title = {Multi-Scale Representation Learning for Spatial Feature Distributions using Grid Cells},
author = {Mai, Gengchen and Janowicz, Krzysztof and Yan, Bo and Zhu, Rui and Cai, Ling and Lao, Ni},
booktitle = {The Eighth International Conference on Learning Representations},
year = {2020},
organization = {OpenReview}
}