This is the official implementation of the "SPAtio-Temporal graph System (SPATS)" described in the following paper:
- Yoon, Heeyong, et al. "SPATS: A practical system for comparative analysis of Spatio-Temporal Graph Neural Networks." Journal Name Here (2025): Issue and Volume Here.
If you use this repository in your research project, please cite the following BibTeX in your paper:
(TBD)
⚠ This is not a commercial system, so we do not guarantee its behavior in all environments. Please handle exceptional cases with appropriate external knowledge, official documentation, and experience.
⚠ Some parts contain template text; do not copy and paste them directly, and make sure to replace the template text appropriately.
⚠ Because NFS recognizes users only by their numeric IDs, it is strongly recommended to set the user and group IDs to the same values on all nodes. Otherwise, unexpected behavior can occur (see the Linux commands `id`, `usermod`, and `groupmod`).
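For example, a minimal sketch of checking and aligning the IDs across nodes (the user name `youruser` and the target UID/GID `1000` are placeholders, not values from this repository):

```bash
# Check the numeric user/group IDs of the account on each node
id youruser

# On nodes whose IDs differ, change them to the agreed value (e.g. 1000)
sudo usermod -u 1000 youruser    # set the user ID
sudo groupmod -g 1000 youruser   # set the group ID
# Note: files owned by the old IDs may need to be chown-ed to the new IDs afterwards.
```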
1. Prepare your GPU cluster and designate one node as the master server and the other nodes as worker servers. The master and worker roles can overlap on the same node, but separating them is recommended because CPU usage might affect model training.

   Our testing environment for each node:

   - Ubuntu 18.04
   - CUDA 11.3 to 12.1 (different per node)
   - Python 3.10 (installed by Miniconda) and the necessary packages (PyTorch, NumPy, ...)
2. Install the NFS server on the master server.

   a. Install the NFS server package.

      ```bash
      sudo apt install nfs-kernel-server -y   # install NFS server
      ```

   b. Modify `/etc/exports`.

      ```bash
      sudo vim /etc/exports
      ```

   c. Add the following line at the bottom of `/etc/exports`:

      ```
      /your_master_server_folder/SPATS/ *(rw,sync,no_subtree_check)
      ```

   d. Restart NFS.

      ```bash
      sudo exportfs -a
      sudo systemctl restart nfs-kernel-server
      ```

   e. Download this repository on the master server.

      ```bash
      cd /your_master_server_folder/
      git clone https://github.com/sunrise2575/SPATS
      ```

   f. Make the repository visible to anyone.

      ```bash
      sudo chmod 777 ./SPATS/   # Make the repository visible to anyone
      ```
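   To double-check that the export is active, you can list the current exports on the master server (a quick sanity check; `showmount` may require the `nfs-common` package to be installed):

   ```bash
   sudo exportfs -v          # show the directories this server currently exports
   showmount -e localhost    # show the export list as a client would see it
   ```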
3. Install the NFS client and mount the remote folder on each worker server.

   a. Install the NFS client.

      ```bash
      sudo apt install nfs-common -y
      ```

   b. Mount the master server's folder.

      ```bash
      mkdir -p /your_worker_server_folder/SPATS/
      sudo mount <master_server_IP>:/your_master_server_folder/SPATS /your_worker_server_folder/SPATS/
      ```

      To check the master server's IP, use the `ip a` or `ifconfig -a` command.

   c. Make sure the remote folder is mounted correctly.

      ```bash
      ls -al /your_worker_server_folder/SPATS/
      ```
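   Optionally, if you want the mount to persist across reboots, you can add an `/etc/fstab` entry on each worker server instead of mounting manually (a minimal sketch using the same placeholder paths as above):

   ```
   <master_server_IP>:/your_master_server_folder/SPATS  /your_worker_server_folder/SPATS  nfs  defaults  0  0
   ```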
4. Install the Python dependencies on every server (both the master server and all worker servers).

   ```bash
   conda activate base   # or specify a different virtualenv
   pip install -r requirements.txt
   ```

   Some Python packages in `requirements.txt` are not version-sensitive, so you can selectively modify `requirements.txt` to avoid upgrading or downgrading your existing packages.
5. Launch the Broker and the Workers.

   a. Launch the Broker process on the master server.

      ```bash
      python ./broker.py --port <broker_port>
      ```

      The default port number is `9999`; if you do not specify the port number, you can simply run:

      ```bash
      python ./broker.py
      ```

   b. Launch a Worker process on each worker server.

      ```bash
      python ./worker.py --broker <broker_IP>:<broker_port> --gpu <GPU_indices_as_you_wish>
      # python ./worker.py --broker 1.2.3.4:9999 --gpu 0,2,5   # example; use only the 0th, 2nd, and 5th GPUs
      ```

   We recommend using `tmux` to manage the processes on each node efficiently, as sketched below.
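   For instance, a minimal `tmux` workflow could look like this (the session names are arbitrary examples):

   ```bash
   tmux new -s spats-broker     # on the master server: open a session, then run `python ./broker.py` inside it
   # detach with Ctrl-b d; the Broker keeps running in the background

   tmux new -s spats-worker     # on each worker server: open a session, then run `python ./worker.py ...` inside it

   tmux attach -t spats-broker  # re-attach later to check the logs
   ```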
6. Insert jobs into the Broker.

   ⚠ The following content may be lengthy and complex. Since this system was developed for research purposes and aims to execute queries in Python without using a structured language such as SQL, the details can be intricate. We recommend reading through carefully and learning by running `query-bulk-insert.py` yourself.

   a. Find the following lines in `query.py` and `query-bulk-insert.py`:

      ```python
      BROKER_IP='127.0.0.1'
      BROKER_PORT='9999'
      ```

      and replace them with the IP and port number of your Broker:

      ```python
      BROKER_IP=<broker_IP>
      BROKER_PORT=<broker_port>
      ```
   b. Insert multiple jobs using `query-bulk-insert.py`.

      In `query-bulk-insert.py`, the variable `DEFAULT` is the default job setting. By changing some values of `DEFAULT` and `VARIATION`, `query-bulk-insert.py` generates thousands of combinations of jobs.

      The following example shows how to use `VARIATION` properly:

      ```python
      VARIATION = {
          'maxEpoch': [10],
          'batchSize': [64, 128],  # 2 variations
          'datasetName': ['METR-LA', 'PEMS-BAY'],  # 2 variations
          'modelName': ['Seq2Seq_RNN', 'ASTGCN'],  # 2 variations
          'adjacencyMatrixThresholdValue': [float('0.' + str(i)) for i in range(2)],  # 2 variations
          # -> 2x2x2x2=16 variations
      }
      ```
      In this case, SPATS sets `maxEpoch` to `10` for every job, while `batchSize` takes both `64` and `128`. The candidate `datasetName` and `modelName` values work similarly. You can also use Python comprehension syntax inside `VARIATION`, as in `adjacencyMatrixThresholdValue` above.

      `DATASET_DEPENDENT_SETTING` and `MODEL_DEPENDENT_SETTING` in `query-bulk-insert.py` are helper variables that adjust other values in `DEFAULT`; they are selected according to `datasetName` and `modelName` in `VARIATION`, respectively. For example, because the `VARIATION` above specifies `METR-LA` and `PEMS-BAY` as the candidate datasets, the following items in `DATASET_DEPENDENT_SETTING` are selected:

      ```python
      DATASET_DEPENDENT_SETTING = {
          ...
          'METR-LA': {
              'additionalTemporalEmbed': ['timestamp_in_day'],  # 1 variation
              'targetSensorAttributes': [['speed']],  # 1 variation
              'inputLength': [12], 'outputLength': [12],  # 1 variation each
              # -> 1x1x1x1=1 variation
          },
          'PEMS-BAY': {
              'additionalTemporalEmbed': ['timestamp_in_day'],  # 1 variation
              'targetSensorAttributes': [['speed']],  # 1 variation
              'inputLength': [12], 'outputLength': [12],  # 1 variation each
              # -> 1x1x1x1=1 variation
          },
          ...
      }
      ```
      Similarly, the following items in `MODEL_DEPENDENT_SETTING` are selected because `modelName` is `['Seq2Seq_RNN', 'ASTGCN']`:

      ```python
      MODEL_DEPENDENT_SETTING = {
          ...
          'Seq2Seq_RNN': {
              'adjacencyMatrixLaplacianMatrix': [None],
          },  # 1 variation
          'ASTGCN': {
              'adjacencyMatrixLaplacianMatrix': ['cheb_poly'],
          },  # 1 variation
          ...
      }
      ```
      If the default values are as follows:

      ```python
      DEFAULT = {
          'trainTestRatio': 0.7,
          'adjacencyMatrixThresholdValue': 0.7,
          'maxEpoch': 100,
          'batchSize': 64,
          'lossFunction': ["MAE", "MSE", "MAAPE"],  # this should be a list; 1 variation
          'targetLossFunction': "MSE",
          # about the optimizer
          'optimizer': 'Adam',
          'learningRate': 0.001,
          'weightDecay': 0.0005,
      }
      ```
      then `stdin_generator()`, together with the additional model configuration stored in `.yaml` files, generates 16 job variations such as:

      ```python
      stdin = {
          'datasetName': 'METR-LA',  # set by VARIATION
          'adjacencyMatrixThresholdValue': 0.0,  # changed by VARIATION
          'modelName': 'Seq2Seq_RNN',  # set by VARIATION
          'maxEpoch': 10,  # changed by VARIATION
          'batchSize': 64,  # changed by VARIATION
          'adjacencyMatrixLaplacianMatrix': None,  # set by MODEL_DEPENDENT_SETTING
          'modelConfig': {},  # automatically filled from common/model/<MODEL_NAME>.yaml; if it doesn't exist, it becomes an empty dict
          'additionalTemporalEmbeds': ['timestamp_in_day'],  # set by DATASET_DEPENDENT_SETTING
          'inputLength': 12,  # set by DATASET_DEPENDENT_SETTING
          'outputLength': 12,  # set by DATASET_DEPENDENT_SETTING
          'targetSensorAttributes': ['speed'],  # set by DATASET_DEPENDENT_SETTING
          'trainTestRatio': 0.7,  # from DEFAULT
          'learningRate': 0.001,  # from DEFAULT
          'weightDecay': 0.0005,  # from DEFAULT
      }
      ```
      or:

      ```python
      stdin = {
          'datasetName': 'PEMS-BAY',  # set by VARIATION
          'adjacencyMatrixThresholdValue': 0.1,  # changed by VARIATION
          'modelName': 'ASTGCN',  # set by VARIATION
          'maxEpoch': 10,  # changed by VARIATION
          'batchSize': 128,  # changed by VARIATION
          'adjacencyMatrixLaplacianMatrix': 'cheb_poly',  # set by MODEL_DEPENDENT_SETTING
          'modelConfig': {
              'time_strides': 1,
              'nb_block': 2,
              'nb_chev_filter': 64,
              'nb_time_filter': 64,
          },  # automatically filled from common/model/<MODEL_NAME>.yaml; if it doesn't exist, it becomes an empty dict
          'additionalTemporalEmbeds': ['timestamp_in_day'],  # set by DATASET_DEPENDENT_SETTING
          'inputLength': 12,  # set by DATASET_DEPENDENT_SETTING
          'outputLength': 12,  # set by DATASET_DEPENDENT_SETTING
          'targetSensorAttributes': ['speed'],  # set by DATASET_DEPENDENT_SETTING
          'trainTestRatio': 0.7,  # from DEFAULT
          'learningRate': 0.001,  # from DEFAULT
          'weightDecay': 0.0005,  # from DEFAULT
      }
      ```
      and so on. You can add `print(stdin)` inside the loop of `main()` in `query-bulk-insert.py` to print the generated stdin for better understanding and debugging.

   c. Once `DEFAULT` and `VARIATION` are ready, you can insert the jobs and wait for completion.

      ```bash
      python ./query-bulk-insert.py   # run SPATS!
      ```

      The Broker works as a queue, so you can insert more jobs without waiting for previously inserted jobs to complete.
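      For example, after inserting a batch you can immediately check how many jobs are still waiting (see the next step for the full `query.py` usage):

      ```bash
      python ./query-bulk-insert.py    # enqueue another batch of jobs
      python ./query.py list pending   # check the jobs that are still waiting in the queue
      ```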
7. Get job information and delete jobs.

   a. Full job info (only shows the 100 most recently inserted jobs).

      ```bash
      python ./query.py list
      ```

   b. Job info filtered by status (only shows the 100 most recently inserted jobs).

      ```bash
      python ./query.py list <started|pending|success|failure>
      ```

   c. Single job info.

      ```bash
      python ./query.py select <job_id>
      ```

   d. Delete a job.

      ```bash
      python ./query.py delete <job_id>
      ```
8. Visualize the results.

   a. Copy `broker.sqlite3` and its related files to the `visualize/` folder.

      ```bash
      cp broker.sqlite3* visualize/.
      ```

   b. Run the extractor.

      ```bash
      cd visualize/
      python ./extractor.py broker.sqlite3 success
      ```

   c. Run `1-replace-model-and-dataset.ipynb` and `2-concat-results.ipynb` in order to obtain `result.csv`.

   d. Use the `make-fig*.ipynb` files to generate comparison results similar to those in our paper.