Update efficacy experiment.

Modify figure visualization (add dpi, hide axis titles, etc.). Add README.
pvti · Oct 2, 2023 · bc1b00c
1 parent 86e44bc
commit bc1b00c

Showing 18 changed files with 99 additions and 14 deletions.
diff --git a/assets/kmeans/gt1.png b/assets/kmeans/gt1.png
diff --git a/assets/kmeans/gt2.png b/assets/kmeans/gt2.png
diff --git a/assets/kmeans/gt3.png b/assets/kmeans/gt3.png
diff --git a/assets/kmeans/mt1.png b/assets/kmeans/mt1.png
diff --git a/assets/kmeans/mt2.png b/assets/kmeans/mt2.png
diff --git a/assets/kmeans/mt3.png b/assets/kmeans/mt3.png
diff --git a/assets/kmeans/ts1.png b/assets/kmeans/ts1.png
diff --git a/assets/kmeans/ts2.png b/assets/kmeans/ts2.png
diff --git a/assets/kmeans/ts3.png b/assets/kmeans/ts3.png
diff --git a/experiments/README.md b/experiments/README.md
@@ -0,0 +1,85 @@
+# CORING - Comparative Efficacy Study
+
+This folder contains the code and datasets used in the comparative efficacy study of the CORING framework. In this study, we assess the performance of tensor-based and matrix-based methodologies using the K-means clustering algorithm with custom distance metrics.
+
+## Experimental Setup
+
+### Datasets
+
+We generated a set of synthetic datasets characterized by:
+- Total datasets (D): 1000
+- Clusters per dataset (c): 5
+- Centroid tensor dimensions (C_in x h x w): 64 x 3 x 3
+- Number of satellite tensors (n) per cluster: 100
+
+### Dataset Complexity
+
+Dataset complexity is governed by the standard deviations of centroids (σ_centroids) and satellite tensors (σ_satellites). We designed datasets with three difficulty levels:
+
+1. Normal:
+   - σ_centroids: [1.7, 1.8]
+   - σ_satellites: [0.2, 0.3]
+
+2. Moderate:
+   - σ_centroids: [1.5, 2.0]
+   - σ_satellites: [0.1, 0.3]
+
+3. Hard:
+   - σ_centroids: [1.0, 2.0]
+   - σ_satellites: [0.1, 0.5]
+
+## Evaluation
+
+We evaluated the K-means algorithm's performance using the Adjusted Rand Index (ARI), which measures the agreement between the ground-truth and predicted cluster assignments. ARI values (α) are at most 1 (a perfect match) and can be negative; higher values indicate better predictive accuracy, and α = 0 implies predictions are no better than random chance with respect to the ground truth.
+
+To account for the sensitivity of K-means to initialization, we ran multiple initializations (i) and, following sklearn guidelines, kept the run with the best (lowest) inertia upon convergence, where inertia is the sum of distances between samples and their corresponding centroids.
+
+## Experimental Results
+
+The mean ARIs in Table 1 show that the tensor-based approach outperforms the matrix-based approach at every difficulty level. Both approaches perform well on the Normal datasets (0.91 vs. 0.83); as dataset complexity increases, the tensor-based method degrades more slowly (0.73 vs. 0.54 on the Hard datasets). We attribute this to our method's ability to preserve the multidimensionality of the filters, which allows it to capture and retain crucial information.
+
+Table 1: Mean ARI of the tensor-based and matrix-based methods
+
+| Method | Normal | Moderate | Hard |
+|--------|--------|----------|------|
+| Matrix | 0.83   | 0.61     | 0.54 |
+| Tensor | **0.91** | **0.82** | **0.73** |
+
+For each difficulty level, we include one representative dataset, chosen so that the ARI of its predictions is close to the mean ARI over that level's datasets (3 scenarios, each containing 1000 datasets). These representative datasets illustrate typical outcomes of both methods, as presented in Table 2.
+
+### Table 2: Visualizing the performance of tensor-based and matrix-based methods
+
+|   | Groundtruth | Matrix  | Tensor  |
+|---|:-------------:|:------------------:|:------------------:|
+| **Normal**   | ![Groundtruth normal](../assets/kmeans/gt1.png) | ![Matrix ARI=0.78](../assets/kmeans/mt1.png) | ![Tensor ARI=0.99](../assets/kmeans/ts1.png) |
+| **Moderate** | ![Groundtruth moderate](../assets/kmeans/gt2.png) | ![Matrix ARI=0.56](../assets/kmeans/mt2.png) | ![Tensor ARI=0.85](../assets/kmeans/ts2.png) |
+| **Hard**     | ![Groundtruth hard](../assets/kmeans/gt3.png) | ![Matrix ARI=0.48](../assets/kmeans/mt3.png) | ![Tensor ARI=0.72](../assets/kmeans/ts3.png) |
+
+Table 2 visualizes the clusters produced by the tensor-based and matrix-based methods on the representative dataset of each difficulty level. In every case the tensor-based method achieves the higher ARI (0.99 vs. 0.78 on Normal, 0.85 vs. 0.56 on Moderate, and 0.72 vs. 0.48 on Hard).
+
+
+## Reproducing Results
+
+To reproduce the experimental results, follow these steps:
+
+1. Generate the datasets using `data.py`:
+   ```bash
+   python data.py --data your_datapath
+   ```
+   Replace `your_datapath` with the path to your dataset, e.g., `normal-699.npy`.
+2. Run experiments on the datasets:
+   - For the normal dataset:
+   ```bash
+   python kmeans.py --data normal-699.npy --seed 91 --method matrix
+   python kmeans.py --data normal-699.npy --seed 1
+   ```
+   - For the moderate dataset:
+   ```bash
+   python kmeans.py --data moderate-65.npy --seed 66 --method matrix
+   python kmeans.py --data moderate-65.npy --seed 33
+   ```
+   - For the hard dataset:
+   ```bash
+   python kmeans.py --data hard-404.npy --seed 66 --method matrix
+   python kmeans.py --data hard-404.npy --seed 19
+   ```
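Note on the README's evaluation metric: the Adjusted Rand Index can be computed directly from pair counts, which may help when reading the reported numbers. The following is a minimal sketch using only the standard library; the function name `adjusted_rand_index` is hypothetical and is not taken from `kmeans.py`.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two flat labelings of the same samples."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    n = len(labels_true)
    if n < 2:
        # No sample pairs to compare; treat as trivially identical.
        return 1.0

    # Contingency counts: samples per (true, predicted) label pair,
    # plus the marginal counts per true label and per predicted label.
    pair_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)

    # Number of sample pairs placed together in both labelings,
    # in the ground truth, and in the prediction.
    sum_both = sum(comb(c, 2) for c in pair_counts.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())

    expected = sum_true * sum_pred / comb(n, 2)  # agreement expected by chance
    maximum = (sum_true + sum_pred) / 2          # agreement of a perfect match
    if maximum == expected:
        # Degenerate case, e.g. both labelings use a single cluster.
        return 1.0
    return (sum_both - expected) / (maximum - expected)


# Same partition under different label names: a perfect score.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
# Maximally discordant partition of four samples: a negative score.
print(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]))
```

The score depends only on which samples are grouped together, not on the label values themselves. `sklearn.metrics.adjusted_rand_score` computes the same quantity and is the more practical choice in real experiments.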
diff --git a/experiments/data.py b/experiments/data.py
@@ -86,7 +86,7 @@ def main():
     colors = plt.cm.tab10(np.linspace(0, 1, args.centroids))
 
     # Plot the points in a 2D plane with different colors for each centroid
-    plt.figure(figsize=(8, 6))
+    plt.figure(figsize=(8, 6), dpi=300)
     for i in range(args.centroids):
         # Plot the closely similar filters of the current centroid (cluster)
         plt.scatter(
@@ -109,15 +109,15 @@ def main():
     )
 
     # Set plot labels and title
-    plt.xlabel("Principal Component 1")
-    plt.ylabel("Principal Component 2")
-    plt.title("2D PCA Projection. Ground truth")
+    # plt.xlabel("Principal Component 1")
+    # plt.ylabel("Principal Component 2")
+    # plt.title(f"Ground truth: {args.data}")
 
     # Add legend
-    plt.legend()
+    plt.legend(prop = { "size": 14 })
 
     # Set aspect ratio to equal and grid on
-    plt.axis("equal")
+    # plt.axis("equal")
     plt.grid(True)
     plt.show()
 

diff --git a/experiments/dataset_1.npy b/experiments/dataset_1.npy
diff --git a/experiments/dataset_2.npy b/experiments/dataset_2.npy
diff --git a/experiments/dataset_3.npy b/experiments/dataset_3.npy
diff --git a/experiments/hard-404.npy b/experiments/hard-404.npy
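Note on the `.npy` datasets: the README added in this commit describes each synthetic dataset as 5 clusters of 100 satellite tensors around a 64 x 3 x 3 centroid, with difficulty set by σ_centroids and σ_satellites. The sketch below shows one way such a dataset could be generated. It assumes that each σ is drawn uniformly from its range and that all noise is zero-mean Gaussian; the function `make_dataset` and its parameters are hypothetical, and the actual logic in `data.py` may differ.

```python
import numpy as np


def make_dataset(sigma_centroids, sigma_satellites, clusters=5,
                 satellites=100, shape=(64, 3, 3), seed=0):
    """Return (tensors, labels) for one synthetic dataset.

    ASSUMPTION: each sigma is sampled uniformly from its (low, high) range
    and all noise is zero-mean Gaussian. This is not taken from data.py.
    """
    rng = np.random.default_rng(seed)
    tensors, labels = [], []
    for k in range(clusters):
        sigma_c = rng.uniform(*sigma_centroids)
        sigma_s = rng.uniform(*sigma_satellites)
        # One centroid tensor per cluster.
        centroid = rng.normal(0.0, sigma_c, size=shape)
        # Satellites are noisy copies of the centroid (broadcast over axis 0).
        noise = rng.normal(0.0, sigma_s, size=(satellites, *shape))
        tensors.append(centroid + noise)
        labels.append(np.full(satellites, k))
    return np.concatenate(tensors), np.concatenate(labels)


# "Hard" setting from the README: wide centroid range, noisier satellites.
X, y = make_dataset(sigma_centroids=(1.0, 2.0), sigma_satellites=(0.1, 0.5))
print(X.shape, y.shape)
```

Under these assumptions, a larger σ_satellites relative to σ_centroids makes clusters overlap more, which matches the README's ordering of the Normal, Moderate, and Hard settings.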
diff --git a/experiments/kmeans.py b/experiments/kmeans.py
@@ -209,7 +209,7 @@ def compute_inter_distance(centroids):
     colors = plt.cm.tab10(np.linspace(0, 1, num_clusters))
 
     # Plot the points in a 2D plane with different colors for each cluster
-    plt.figure(figsize=(8, 6))
+    plt.figure(figsize=(8, 6), dpi=300)
     for i in range(num_clusters):
         plt.scatter(
             pca_result[labels == i, 0],
@@ -230,16 +230,16 @@ def compute_inter_distance(centroids):
     )
 
     # Set plot labels and title
-    plt.xlabel("Principal Component 1")
-    plt.ylabel("Principal Component 2")
-    plt.title(
-        f"data = {args.data}; method = {args.method}; distance = {args.distance}; rank = {args.rank}"
-    )
+    # plt.xlabel("Principal Component 1")
+    # plt.ylabel("Principal Component 2")
+    # plt.title(
+    #     f"data = {args.data}; method = {args.method}; ARI = {ari}"
+    # )
 
     # Add legend
-    plt.legend()
+    # plt.legend(prop = { "size": 13 })
 
     # Set aspect ratio to equal and grid on
-    plt.axis("equal")
+    # plt.axis("equal")
     plt.grid(True)
     plt.show()
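Note on initialization: the README states that several K-means initializations are run and the one with the best inertia is kept. A minimal sketch of that selection follows. It assumes plain Lloyd's algorithm with squared Euclidean distance on 2D points (the inertia definition used by sklearn's `KMeans`), not the custom tensor or matrix distances used in `kmeans.py`; the names `kmeans_once` and `kmeans_best_of` are hypothetical.

```python
import numpy as np


def kmeans_once(X, k, rng, max_iter=100):
    """One run of Lloyd's algorithm; returns (labels, centroids, inertia)."""
    # Initialize centroids with k distinct samples chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Squared Euclidean distance from every sample to every centroid.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its samples (keep it if empty).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Final assignment and inertia for the converged centroids.
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = dists.argmin(axis=1)
    inertia = dists.min(axis=1).sum()
    return labels, centroids, inertia


def kmeans_best_of(X, k, n_init=10, seed=0):
    """Run n_init random initializations and keep the lowest-inertia run."""
    rng = np.random.default_rng(seed)
    runs = [kmeans_once(X, k, rng) for _ in range(n_init)]
    return min(runs, key=lambda run: run[2])


# Toy data: three well-separated 2D blobs of 50 points each.
data_rng = np.random.default_rng(42)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.concatenate([c + data_rng.normal(0.0, 0.5, size=(50, 2)) for c in centers])
labels, centroids, inertia = kmeans_best_of(X, k=3, n_init=10)
print(f"best inertia over 10 initializations: {inertia:.2f}")
```

A single run can stall in a poor local optimum, for example when all initial centroids fall in the same blob. Keeping the lowest-inertia run across initializations makes that outcome much less likely to be the reported result.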
diff --git a/experiments/moderate-65.npy b/experiments/moderate-65.npy
diff --git a/experiments/normal-699.npy b/experiments/normal-699.npy