Add PP processing (rebased) #469

Merged
merged 27 commits, Feb 2, 2025

Commits
24187ac
update deps
adamjanovsky Jan 16, 2025
3c34bbe
refactor auxiliary dataset handling, heuristics computation
adamjanovsky Jan 22, 2025
a4ba427
fix notebooks
adamjanovsky Jan 22, 2025
2500d95
implement PP processing
adamjanovsky Jan 25, 2025
4dba5a8
don't delete CCSchemeDatasetHandler when skipping schemes processing
adamjanovsky Jan 27, 2025
2e9108c
get rid of duplicate CC URL constant
adamjanovsky Jan 27, 2025
45a4ca6
CCDataset, serialize PP links to dataframe
adamjanovsky Jan 27, 2025
d0619fa
bump deps, foster auto updates
adamjanovsky Jan 27, 2025
bcbb31e
forbid empty PP links in ProtectionProfile objects
adamjanovsky Jan 27, 2025
92f7e3e
process PP maintenances
adamjanovsky Jan 27, 2025
3ec7859
change PP snapshot urls
adamjanovsky Jan 27, 2025
47b7a24
add aux dataset tests
adamjanovsky Jan 28, 2025
9fc3c5a
add dgst testing for CC sample
adamjanovsky Jan 28, 2025
a4fe488
implement PP tests
adamjanovsky Jan 28, 2025
cbbf717
fix bad type in cc old dgst test
adamjanovsky Jan 28, 2025
69be843
replace from_web_latest() with from_web()
adamjanovsky Jan 29, 2025
ebbdadc
CLI: Add support for PP processing
adamjanovsky Jan 29, 2025
f712d2f
docs: Add protection profiles
adamjanovsky Jan 29, 2025
741f403
fix hanging tests
adamjanovsky Jan 29, 2025
d7ccc16
fix iut mip tests
adamjanovsky Jan 29, 2025
3173ff9
Revert the FIPS IUT and MIP from_web methods.
J08nY Feb 1, 2025
8031764
Fix IUT and MIP tests.
J08nY Feb 1, 2025
c3151fe
Remove processed_pp_dataset_root_dir, let PP dataset handler figure o…
J08nY Feb 2, 2025
8a0cc69
Fix aux handlers super() init call.
J08nY Feb 2, 2025
b128c47
Fix overridden method args.
J08nY Feb 2, 2025
63f993b
Add docs about dataset layout.
J08nY Feb 2, 2025
996759d
Update PP dataset URL.
J08nY Feb 2, 2025
2 changes: 1 addition & 1 deletion README.md
@@ -48,7 +48,7 @@ Most probably, you don't want to fully process the certification artifacts by yo
```python
from sec_certs.dataset import CCDataset

dset = CCDataset.from_web_latest() # now you can inspect the object, certificates are held in dset.certs
dset = CCDataset.from_web() # now you can inspect the object, certificates are held in dset.certs
df = dset.to_pandas() # Or you can transform the object into Pandas dataframe
dset.to_json(
'./latest_cc_snapshot.json') # You may want to store the snapshot as json, so that you don't have to download it again
18 changes: 15 additions & 3 deletions docs/api/dataset.md
@@ -5,26 +5,38 @@
:no-members:
```

This documentation doesn't provide full API reference for all members of `dataset` package. Instead, it concentrates on the Dataset that are immediately exposed to the users. Namely, we focus on `CCDataset`, `FIPSDataset` and their abstract base class `Dataset`.
This documentation doesn't provide full API reference for all members of `dataset` package. Instead, it concentrates on the Dataset that are immediately exposed to the users.
Namely, we focus on `CCDataset`, `FIPSDataset`, `ProtectionProfileDataset` and their abstract base class `Dataset`.

```{tip}
The examples related to this package can be found in the [common criteria notebook](./../notebooks/examples/cc.ipynb) and the [fips notebook](./../notebooks/examples/fips.ipynb).
The examples related to this package can be found in the [common criteria notebook](./../notebooks/examples/cc.ipynb),
the [protection profile notebook](./../notebooks/examples/protection_profiles.ipynb), and the [fips notebook](./../notebooks/examples/fips.ipynb).
```

## CCDataset
## Base Dataset

```{eval-rst}
.. currentmodule:: sec_certs.dataset.dataset
.. autoclass:: Dataset
:members:
```

## CCDataset

```{eval-rst}
.. currentmodule:: sec_certs.dataset
.. autoclass:: CCDataset
:members:
```

## ProtectionProfileDataset

```{eval-rst}
.. currentmodule:: sec_certs.dataset
.. autoclass:: ProtectionProfileDataset
:members:
```

## FIPSDataset

```{eval-rst}
11 changes: 10 additions & 1 deletion docs/api/sample.md
@@ -17,10 +17,19 @@ The examples related to this package can be found in the [common criteria notebo
:members:
```

## ProtectionProfile

```{eval-rst}
.. currentmodule:: sec_certs.sample
.. autoclass:: ProtectionProfile
:members:
```

## FIPSCertificate

```{eval-rst}
.. currentmodule:: sec_certs.sample
.. autoclass:: FIPSCertificate
:members:
```
```

1 change: 1 addition & 0 deletions docs/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -41,6 +41,7 @@ search_examples.md
:maxdepth: 1
Demo <notebooks/examples/est_solution.ipynb>
Common Criteria <notebooks/examples/cc.ipynb>
Protection Profiles <notebooks/examples/protection_profiles.ipynb>
FIPS-140 <notebooks/examples/fips.ipynb>
FIPS-140 IUT <notebooks/examples/fips_iut.ipynb>
FIPS-140 MIP <notebooks/examples/fips_mip.ipynb>
2 changes: 1 addition & 1 deletion docs/installation.md
@@ -34,7 +34,7 @@ pip install -e .
python -m spacy download en_core_web_sm
```

Alternatively, our Our [Dockerfile](https://github.com/crocs-muni/sec-certs/blob/main/Dockerfile) represents a reproducible way of setting up the environment.
Alternatively, our [Dockerfile](https://github.com/crocs-muni/sec-certs/blob/main/Dockerfile) represents a reproducible way of setting up the environment.

:::
::::
8 changes: 4 additions & 4 deletions docs/quickstart.md
@@ -8,9 +8,9 @@
```python
from sec_certs.dataset.cc import CCDataset

dset = CCDataset.from_web_latest()
dset = CCDataset.from_web()
```
to obtain to obtain freshly processed dataset from [sec-certs.org](https://sec-certs.org).
to obtain the freshly processed dataset from [sec-certs.org](https://sec-certs.org).

3. Play with the dataset. See [example notebook](./notebooks/examples/cc.ipynb).
:::
@@ -21,9 +21,9 @@ to obtain to obtain freshly processed dataset from [sec-certs.org](https://sec-c
```python
from sec_certs.dataset.fips import FIPSDataset

dset = FIPSDataset.from_web_latest()
dset = FIPSDataset.from_web()
```
to obtain to obtain freshly processed dataset from [sec-certs.org](https://sec-certs.org).
to obtain the freshly processed dataset from [sec-certs.org](https://sec-certs.org).

3. Play with the dataset. See [example notebook](./notebooks/examples/fips.ipynb).
:::
6 changes: 3 additions & 3 deletions docs/user_guide.md
@@ -11,16 +11,16 @@ Our tool matches certificates to their possible CVEs using datasets downloaded f
Our tool can seamlessly download the required NVD datasets when needed. We support two download mechanisms:

1. Fetching datasets with the [NVD API](https://nvd.nist.gov/developers/start-here) (preferred way).
1. Fetching snapshots from seccerts.org.
1. Fetching snapshots from sec-certs.org.

The following two keys control the behaviour:

```yaml
preferred_source_nvd_datasets: "api" # set to "sec-certs" to fetch them from sec-certs.org
preferred_source_remote_datasets: "origin" # set to "sec-certs" to fetch them from sec-certs.org
nvd_api_key: null # or the actual key value
```

If you aim to fetch the sources from NVD, we advise you to get an [NVD API key](https://nvd.nist.gov/developers/request-an-api-key) and set the `nvd_api_key` setting accordingly. The download from NVD will work even without API key, it will just be slow. No API key is needed when `preferred_source_nvd_datasets: "sec-certs"`
If you aim to fetch the sources from NVD, we advise you to get an [NVD API key](https://nvd.nist.gov/developers/request-an-api-key) and set the `nvd_api_key` setting accordingly. The download from NVD will work even without API key, it will just be slow. No API key is needed when `preferred_source_remote_datasets: "sec-certs"`
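The renamed `preferred_source_remote_datasets` key above switches between the two download mechanisms. A schematic sketch of such a preference-driven dispatch — the function name and return values are illustrative, not the sec-certs implementation:

```python
# Hypothetical sketch of the source-selection switch described above.
# "origin" and "sec-certs" are the documented config values; everything
# else here (function name, return strings) is illustrative only.

def choose_remote_source(preferred_source_remote_datasets, nvd_api_key=None):
    if preferred_source_remote_datasets == "origin":
        # Fetch from the original provider (the NVD API); an API key is
        # advisable, but the download also works (slowly) without one.
        return "nvd-api" if nvd_api_key else "nvd-api-unauthenticated"
    if preferred_source_remote_datasets == "sec-certs":
        # Fetch pre-built snapshots from sec-certs.org; no API key needed.
        return "sec-certs-snapshot"
    raise ValueError(f"unknown source: {preferred_source_remote_datasets!r}")

print(choose_remote_source("origin"))     # → nvd-api-unauthenticated
print(choose_remote_source("sec-certs"))  # → sec-certs-snapshot
```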


## Inferring inter-certificate reference context
2 changes: 1 addition & 1 deletion notebooks/cc/cert_id_eval.ipynb
@@ -64,7 +64,7 @@
},
"outputs": [],
"source": [
"dset = CCDataset.from_web_latest()\n"
"dset = CCDataset.from_web()\n"
]
},
{
8 changes: 5 additions & 3 deletions notebooks/cc/chain_of_trust_plots.ipynb
@@ -1,9 +1,11 @@
{
"cells": [
{
"metadata": {},
"cell_type": "markdown",
"source": "# Plots from the \"Chain of Trust\" paper"
"metadata": {},
"source": [
"# Plots from the \"Chain of Trust\" paper"
]
},
{
"cell_type": "code",
@@ -79,7 +81,7 @@
"figure_width = 3.5\n",
"figure_height = 2.5\n",
"\n",
"dset = CCDataset.from_web_latest()\n",
"dset = CCDataset.from_web()\n",
"df = dset.to_pandas()"
]
},
7 changes: 4 additions & 3 deletions notebooks/cc/cpe_eval.ipynb
@@ -18,7 +18,8 @@
"from sec_certs.dataset import CCDataset\n",
"import pandas as pd\n",
"import json\n",
"import tempfile"
"import tempfile\n",
"from sec_certs.utils.label_studio_utils import to_label_studio_json"
]
},
{
@@ -42,7 +43,7 @@
}
],
"source": [
"dset = CCDataset.from_web_latest()\n",
"dset = CCDataset.from_web()\n",
"df = dset.to_pandas()\n",
"\n",
"eval_digests = pd.read_csv(\"./../../data/cpe_eval/random.csv\", sep=\";\").set_index(\"dgst\").index\n",
@@ -58,7 +59,7 @@
"with tempfile.TemporaryDirectory() as tmp_dir:\n",
" dset.root_dir = tmp_dir\n",
" dset.certs = {x.dgst: x for x in dset if x.dgst in eval_certs.index.tolist()}\n",
" dset.to_label_studio_json(\"./label_studio_input_data.json\", update_json=False)"
" to_label_studio_json(dset, \"./label_studio_input_data.json\")"
]
},
{
@@ -29,7 +29,7 @@
"metadata": {},
"outputs": [],
"source": [
"dset = CCDataset.from_web_latest()\n",
"dset = CCDataset.from_web()\n",
"df = dset.to_pandas()\n",
"reference_rich_certs = {x.dgst for x in dset if (x.heuristics.st_references.directly_referencing and x.state.st_txt_path) or (x.heuristics.report_references.directly_referencing and x.state.report_txt_path)}\n",
"df = df.loc[df.index.isin(reference_rich_certs)]\n",
@@ -57,7 +57,7 @@
" json.dump(x_valid.tolist(), handle, indent=4)\n",
"\n",
"with open(\"../../../data/reference_annotations_split/test.json\", \"w\") as handle:\n",
" json.dump(x_test, handle, indent=4) "
" json.dump(x_test, handle, indent=4)"
]
}
],
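The reference-annotation notebook above serializes train/valid/test splits with `json.dump`. A self-contained sketch of that split-and-serialize pattern on hypothetical digests (the data and split sizes here are made up for illustration):

```python
import json
import random
import tempfile
from pathlib import Path

# Illustrative 70/15/15 split over hypothetical certificate digests,
# serialized per split the same way as the annotation notebook above.
digests = [f"dgst{i:03d}" for i in range(100)]
random.Random(42).shuffle(digests)  # fixed seed for a reproducible split

train, valid, test = digests[:70], digests[70:85], digests[85:]

with tempfile.TemporaryDirectory() as tmp:
    for name, split in {"train": train, "valid": valid, "test": test}.items():
        with open(Path(tmp) / f"{name}.json", "w") as handle:
            json.dump(split, handle, indent=4)

    # Round-trip check: the serialized splits cover every digest exactly once.
    loaded = []
    for name in ("train", "valid", "test"):
        with open(Path(tmp) / f"{name}.json") as handle:
            loaded.extend(json.load(handle))

print(len(train), len(valid), len(test))  # → 70 15 15
```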
7 changes: 4 additions & 3 deletions notebooks/cc/scheme_eval.ipynb
@@ -24,7 +24,8 @@
"from sec_certs.model import CCSchemeMatcher\n",
"from sec_certs.sample.cc_certificate_id import canonicalize\n",
"from sec_certs.sample.cc_scheme import CCScheme, EntryType\n",
"from sec_certs.configuration import config"
"from sec_certs.configuration import config\n",
"from sec_certs.dataset.auxiliary_dataset_handling import CCSchemeDatasetHandler"
]
},
{
@@ -56,7 +57,7 @@
"metadata": {},
"outputs": [],
"source": [
"dset.auxiliary_datasets.scheme_dset = schemes\n",
"dset.aux_handlers[CCSchemeDatasetHandler].dset = schemes\n",
"\n",
"count_was = 0\n",
"count_is = 0\n",
@@ -161,7 +162,7 @@
" rate = len(assigned)/len(total) * 100 if len(total) != 0 else 0\n",
" rate_list = rates.setdefault(country, [])\n",
" rate_list.append(rate)\n",
" \n",
"\n",
" print(f\"{country}: {len(assigned)} assigned out of {len(total)} -> {rate:.1f}%\")\n",
" total_active = total[total[\"status\"] == \"active\"]\n",
" assigned_active = assigned[assigned[\"status\"] == \"active\"]\n",
8 changes: 5 additions & 3 deletions notebooks/cc/temporal_trends.ipynb
@@ -1,9 +1,11 @@
{
"cells": [
{
"metadata": {},
"cell_type": "markdown",
"source": "# Temporal trends in the CC ecosystem"
"metadata": {},
"source": [
"# Temporal trends in the CC ecosystem"
]
},
{
"cell_type": "code",
@@ -39,7 +41,7 @@
"metadata": {},
"outputs": [],
"source": [
"dset = CCDataset.from_web_latest()\n",
"dset = CCDataset.from_web()\n",
"df = dset.to_pandas()"
]
},
15 changes: 6 additions & 9 deletions notebooks/cc/vulnerabilities.ipynb
@@ -33,6 +33,7 @@
"import warnings\n",
"from pathlib import Path\n",
"import tempfile\n",
"from sec_certs.dataset.auxiliary_dataset_handling import CVEDatasetHandler, CPEDatasetHandler, CCMaintenanceUpdateDatasetHandler\n",
"from sec_certs.dataset import CCDataset, CCDatasetMaintenanceUpdates, CVEDataset, CPEDataset\n",
"from sec_certs.utils.pandas import (\n",
" compute_cve_correlations,\n",
@@ -81,16 +82,12 @@
"cpe_dset: CPEDataset = CPEDataset.from_json(\"/path/to/cpe_dataset.json\")\n",
"\n",
"# # Remote instantiation (takes approx. 10 minutes to complete)\n",
"# dset: CCDataset = CCDataset.from_web_latest(path=\"dset\", auxiliary_datasets=True)\n",
"# dset: CCDataset = CCDataset.from_web(path=\"dset\", auxiliary_datasets=True)\n",
"# dset.load_auxiliary_datasets()\n",
"\n",
"# print(\"Downloading dataset of maintenance updates\")\n",
"# main_dset: CCDatasetMaintenanceUpdates = CCDatasetMaintenanceUpdates.from_web_latest()\n",
"\n",
"# print(\"Downloading CPE dataset\")\n",
"# cpe_dset: CPEDataset = dset.auxiliary_datasets.cpe_dset\n",
"\n",
"# print(\"Downloading CVE dataset\")\n",
"# cve_dset: CVEDataset = dset.auxiliary_datasets.cve_dset"
"# main_dset: CCDatasetMaintenanceUpdates = dset.aux_handlers[CCMaintenanceUpdateDatasetHandler].dset\n",
"# cpe_dset: CPEDataset = dset.aux_handlers[CPEDatasetHandler].dset\n",
"# cve_dset: CVEDataset = dset.aux_handlers[CVEDatasetHandler].dset"
]
},
{
8 changes: 4 additions & 4 deletions notebooks/examples/cc.ipynb
@@ -44,7 +44,7 @@
"metadata": {},
"outputs": [],
"source": [
"dset = CCDataset.from_web_latest()\n",
"dset = CCDataset.from_web()\n",
"print(len(dset)) # Print number of certificates in the dataset"
]
},
@@ -188,7 +188,7 @@
"source": [
"## Assign dataset with CPE records and compute vulnerabilities\n",
"\n",
"*Note*: The data is already computed on dataset obtained with `from_web_latest()`, this is just for illustration. \n",
"*Note*: The data is already computed on dataset obtained with `from_web()`, this is just for illustration. \n",
"*Note*: This may likely not run in Binder, as the corresponding `CVEDataset` and `CPEDataset` instances take a lot of memory."
]
},
@@ -212,7 +212,7 @@
"The following piece of code roughly corresponds to `$ sec-certs cc all` CLI command -- it fully processes the CC pipeline. This will create a folder in current working directory where the outputs will be stored. \n",
"\n",
"```{warning}\n",
"It's not good idea to run this from notebook. It may take several hours to finish. We recommend using `from_web_latest()` or turning this into a Python script.\n",
"It's not good idea to run this from notebook. It may take several hours to finish. We recommend using `from_web()` or turning this into a Python script.\n",
"```"
]
},
@@ -231,8 +231,8 @@
]
},
{
"metadata": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Advanced usage\n",
"There are more notebooks available showcasing more advanced usage of the tool.\n",
8 changes: 5 additions & 3 deletions notebooks/examples/est_solution.ipynb
@@ -57,7 +57,7 @@
],
"source": [
"# Download the dataset and see how many certificates it contains\n",
"dataset = CCDataset.from_web_latest()\n",
"dataset = CCDataset.from_web()\n",
"print(f\"The downloaded CCDataset contains {len(dataset)} certificates\")"
]
},
@@ -80,7 +80,9 @@
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": "## 2. Turn the dataset into a [pandas](https://pandas.pydata.org/) dataframe -- a data structure suitable for further data analysis."
"source": [
"## 2. Turn the dataset into a [pandas](https://pandas.pydata.org/) dataframe -- a data structure suitable for further data analysis."
]
},
{
"cell_type": "code",
@@ -495,7 +497,7 @@
}
],
"source": [
"# Show arbitrary subset that we've defined earlier \n",
"# Show arbitrary subset that we've defined earlier\n",
"eal6_or_more.head()"
]
},
10 changes: 6 additions & 4 deletions notebooks/examples/fips.ipynb
@@ -38,7 +38,7 @@
"metadata": {},
"outputs": [],
"source": [
"dset: FIPSDataset = FIPSDataset.from_web_latest()\n",
"dset: FIPSDataset = FIPSDataset.from_web()\n",
"print(len(dset))"
]
},
@@ -87,7 +87,9 @@
{
"cell_type": "markdown",
"metadata": {},
"source": "## Dissect a single certificate"
"source": [
"## Dissect a single certificate"
]
},
{
"cell_type": "code",
@@ -128,7 +130,7 @@
"## Create new dataset and fully process it\n",
"\n",
"```{warning}\n",
"It's not good idea to run this from notebook. It may take several hours to finish. We recommend using `from_web_latest()` or turning this into a Python script.\n",
"It's not good idea to run this from notebook. It may take several hours to finish. We recommend using `from_web()` or turning this into a Python script.\n",
"```"
]
},
@@ -147,8 +149,8 @@
]
},
{
"metadata": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Advanced usage\n",
"There are more notebooks available showcasing more advanced usage of the tool.\n",