upd readme and gini result

Chronosymbolic · Chronosymbolic · commit 6d6fa781601e · 2023-09-14T09:30:12.000-04:00
diff --git a/README.md b/README.md
@@ -45,24 +45,22 @@ Python (3.7.0 or higher, and [Anaconda](https://www.anaconda.com/) recommended)
 
     - Specify the result summary log file using `-o FILE_NAME`; Export an additional result summary CSV `FILE_NAME_prefix.csv` (with success and timing statistics) using `-a`; The summary is only available when running multiple instances (directory mode or file list mode)
 
-    - Start testing from the file index k in the folder `-s K` (`K` is the index starting from zero)
+    - Start solving from file index `K` in the folder using `-s K` (`K` is zero-based)
 
     - If you want to run multiple instances, make sure to use different `FILE_NAME`-s in the config file to avoid clash (`config.yml` in default)
 
     - More options see `--help`
 
 - After finishing running, the `./tmp` directory can be deleted safely
 
-# To reproduce Chronosymbolic-single
+# To reproduce the major result: Chronosymbolic-single
 
-Please refer to the configuration in `./experiment/result_summary.log`. Using the default config in `config.yml` should also be decent. Even fixed random seeds can cause minor randomness that may slightly affect the performance.
+Please refer to the configuration in `./experiment/result_summary.log` and `./experiment/README.md` (where settings for other minor experiments are also provided). The default config in `config.yml` should also give decent results. Even with fixed random seeds, some minor nondeterminism remains and may slightly affect the performance.
 
 - `python test.py -f tests/safe -a -r -v -t 360 -o result/result_safe.log`
 
 - `python test.py -f tests/unsafe -a -r -v -t 360 -o result/result_unsafe.log`
 
-- `python test.py -f tests/multiple_pred -a -r -v -t 360 -o result/result_multi.log`
-
 
 # To run the baselines
 ## Spacer and GSpacer
diff --git a/experiment/README.md b/experiment/README.md
@@ -21,7 +21,18 @@ The specifications of the device used to generate this log:
 
 
 ## comparison.xlsx
-Detailed running time data on our major performance evaluation in the experiment section.
+Detailed running time data on our major performance evaluation in the experiment section of our paper.
 
 ## result_rnd_seed.xlsx
-Detailed running time data on our performance evaluation with different random seeds is described in our Appendix.
+Detailed running time data for our performance evaluation with different random seeds, as described in the Appendix of the paper. Only safe instances are shown as an example. `result_rnd_seed_gini.xlsx` differs only in using Gini impurity instead of Shannon entropy in the decision trees (DTs).
+
+## To reproduce Chronosymbolic-cover
+Unfortunately, because our logging system was incomplete at the time, the hyperparameters of the 13 runs were not fully recorded. The essential experiments to run in order to reproduce the result are:
+
+1. Different strategies for scheduling the candidate hypothesis in Table 1 of our paper (e.g., tuning the hyperparameters in `SafeZoneUsage: '(self.total_iter // 200) % 2 == 0', UnsafeZoneUsage: '(self.total_iter // 100) % 2 == 0'`);
+2. Different DT settings (a random DT is also worth trying; it may not work well overall but works on some specific instances) and Agents (set in `./test.py`);
+3. Different dataset settings (whether to enable the queue mode, and how many samples the datasets should keep);
+4. Different expansion strategies for the reasoner (`Expansion` in `./config.yml`).
+
+## To reproduce results of CHC-COMP-22-LIA
+The default settings in `./config.yml` should give decent results (possibly slightly worse than the best result, but within 5%). Note that if the timeout malfunctions on some instances, interrupt the tool manually and rerun with the `-s K` option to resume from file index `K` in the folder.
diff --git a/experiment/result_rnd_seed_gini.xlsx b/experiment/result_rnd_seed_gini.xlsx