Add PP processing (rebased) #469

Merged
merged 27 commits, Feb 2, 2025

Commits
24187ac
update deps
adamjanovsky Jan 16, 2025
3c34bbe
refactor auxiliary dataset handling, heuristics computation
adamjanovsky Jan 22, 2025
a4ba427
fix notebooks
adamjanovsky Jan 22, 2025
2500d95
implement PP processing
adamjanovsky Jan 25, 2025
4dba5a8
don't delete CCSchemeDatasetHandler when skipping schemes processing
adamjanovsky Jan 27, 2025
2e9108c
get rid of duplicate CC URL constant
adamjanovsky Jan 27, 2025
45a4ca6
CCDataset, serialize PP links to dataframe
adamjanovsky Jan 27, 2025
d0619fa
bump deps, foster auto updates
adamjanovsky Jan 27, 2025
bcbb31e
forbid empty PP links in ProtectionProfile objects
adamjanovsky Jan 27, 2025
92f7e3e
process PP maintenances
adamjanovsky Jan 27, 2025
3ec7859
change PP snapshot urls
adamjanovsky Jan 27, 2025
47b7a24
add aux dataset tests
adamjanovsky Jan 28, 2025
9fc3c5a
add dgst testing for CC sample
adamjanovsky Jan 28, 2025
a4fe488
implement PP tests
adamjanovsky Jan 28, 2025
cbbf717
fix bad type in cc old dgst test
adamjanovsky Jan 28, 2025
69be843
replace from_web_latest() with from_web()
adamjanovsky Jan 29, 2025
ebbdadc
CLI: Add support for PP processing
adamjanovsky Jan 29, 2025
f712d2f
docs: Add protection profiles
adamjanovsky Jan 29, 2025
741f403
fix hanging tests
adamjanovsky Jan 29, 2025
d7ccc16
fix iut mip tests
adamjanovsky Jan 29, 2025
3173ff9
Revert the FIPS IUT and MIP from_web methods.
J08nY Feb 1, 2025
8031764
Fix IUT and MIP tests.
J08nY Feb 1, 2025
c3151fe
Remove processed_pp_dataset_root_dir, let PP dataset handler figure o…
J08nY Feb 2, 2025
8a0cc69
Fix aux handlers super() init call.
J08nY Feb 2, 2025
b128c47
Fix overridden method args.
J08nY Feb 2, 2025
63f993b
Add docs about dataset layout.
J08nY Feb 2, 2025
996759d
Update PP dataset URL.
J08nY Feb 2, 2025
2 changes: 1 addition & 1 deletion README.md
@@ -48,7 +48,7 @@ Most probably, you don't want to fully process the certification artifacts by yo
```python
from sec_certs.dataset import CCDataset

dset = CCDataset.from_web_latest() # now you can inspect the object, certificates are held in dset.certs
dset = CCDataset.from_web() # now you can inspect the object, certificates are held in dset.certs
df = dset.to_pandas() # Or you can transform the object into Pandas dataframe
dset.to_json(
'./latest_cc_snapshot.json') # You may want to store the snapshot as json, so that you don't have to download it again
18 changes: 15 additions & 3 deletions docs/api/dataset.md
@@ -5,26 +5,38 @@
:no-members:
```

This documentation doesn't provide full API reference for all members of `dataset` package. Instead, it concentrates on the Dataset that are immediately exposed to the users. Namely, we focus on `CCDataset`, `FIPSDataset` and their abstract base class `Dataset`.
This documentation doesn't provide full API reference for all members of `dataset` package. Instead, it concentrates on the Dataset that are immediately exposed to the users.
Namely, we focus on `CCDataset`, `FIPSDataset`, `ProtectionProfileDataset` and their abstract base class `Dataset`.

```{tip}
The examples related to this package can be found in the [common criteria notebook](./../notebooks/examples/cc.ipynb) and the [fips notebook](./../notebooks/examples/fips.ipynb).
The examples related to this package can be found in the [common criteria notebook](./../notebooks/examples/cc.ipynb),
the [protection profile notebook](./../notebooks/examples/protection_profiles.ipynb), and the [fips notebook](./../notebooks/examples/fips.ipynb).
```

## CCDataset
## Base Dataset

```{eval-rst}
.. currentmodule:: sec_certs.dataset.dataset
.. autoclass:: Dataset
:members:
```

## CCDataset

```{eval-rst}
.. currentmodule:: sec_certs.dataset
.. autoclass:: CCDataset
:members:
```

## ProtectionProfileDataset

```{eval-rst}
.. currentmodule:: sec_certs.dataset
.. autoclass:: ProtectionProfileDataset
:members:
```

## FIPSDataset

```{eval-rst}
11 changes: 10 additions & 1 deletion docs/api/sample.md
@@ -17,10 +17,19 @@ The examples related to this package can be found in the [common criteria notebo
:members:
```

## ProtectionProfile

```{eval-rst}
.. currentmodule:: sec_certs.sample
.. autoclass:: ProtectionProfile
:members:
```

## FIPSCertificate

```{eval-rst}
.. currentmodule:: sec_certs.sample
.. autoclass:: FIPSCertificate
:members:
```
```

1 change: 1 addition & 0 deletions docs/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -41,6 +41,7 @@ search_examples.md
:maxdepth: 1
Demo <notebooks/examples/est_solution.ipynb>
Common Criteria <notebooks/examples/cc.ipynb>
Protection Profiles <notebooks/examples/protection_profiles.ipynb>
FIPS-140 <notebooks/examples/fips.ipynb>
FIPS-140 IUT <notebooks/examples/fips_iut.ipynb>
FIPS-140 MIP <notebooks/examples/fips_mip.ipynb>
2 changes: 1 addition & 1 deletion docs/installation.md
@@ -34,7 +34,7 @@ pip install -e .
python -m spacy download en_core_web_sm
```

Alternatively, our Our [Dockerfile](https://github.com/crocs-muni/sec-certs/blob/main/Dockerfile) represents a reproducible way of setting up the environment.
Alternatively, our [Dockerfile](https://github.com/crocs-muni/sec-certs/blob/main/Dockerfile) represents a reproducible way of setting up the environment.

:::
::::
8 changes: 4 additions & 4 deletions docs/quickstart.md
@@ -8,9 +8,9 @@
```python
from sec_certs.dataset.cc import CCDataset

dset = CCDataset.from_web_latest()
dset = CCDataset.from_web()
```
to obtain to obtain freshly processed dataset from [sec-certs.org](https://sec-certs.org).
to obtain the freshly processed dataset from [sec-certs.org](https://sec-certs.org).

3. Play with the dataset. See [example notebook](./notebooks/examples/cc.ipynb).
:::
@@ -21,9 +21,9 @@ to obtain to obtain freshly processed dataset from [sec-certs.org](https://sec-c
```python
from sec_certs.dataset.fips import FIPSDataset

dset = FIPSDataset.from_web_latest()
dset = FIPSDataset.from_web()
```
to obtain to obtain freshly processed dataset from [sec-certs.org](https://sec-certs.org).
to obtain the freshly processed dataset from [sec-certs.org](https://sec-certs.org).

3. Play with the dataset. See [example notebook](./notebooks/examples/fips.ipynb).
:::
6 changes: 3 additions & 3 deletions docs/user_guide.md
@@ -11,16 +11,16 @@ Our tool matches certificates to their possible CVEs using datasets downloaded f
Our tool can seamlessly download the required NVD datasets when needed. We support two download mechanisms:

1. Fetching datasets with the [NVD API](https://nvd.nist.gov/developers/start-here) (preferred way).
1. Fetching snapshots from seccerts.org.
1. Fetching snapshots from sec-certs.org.

The following two keys control the behaviour:

```yaml
preferred_source_nvd_datasets: "api" # set to "sec-certs" to fetch them from sec-certs.org
preferred_source_remote_datasets: "origin" # set to "sec-certs" to fetch them from sec-certs.org
nvd_api_key: null # or the actual key value
```

If you aim to fetch the sources from NVD, we advise you to get an [NVD API key](https://nvd.nist.gov/developers/request-an-api-key) and set the `nvd_api_key` setting accordingly. The download from NVD will work even without API key, it will just be slow. No API key is needed when `preferred_source_nvd_datasets: "sec-certs"`
If you aim to fetch the sources from NVD, we advise you to get an [NVD API key](https://nvd.nist.gov/developers/request-an-api-key) and set the `nvd_api_key` setting accordingly. The download from NVD will work even without API key, it will just be slow. No API key is needed when `preferred_source_remote_datasets: "sec-certs"`
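The renamed `preferred_source_remote_datasets` key above switches between the two download mechanisms. A schematic sketch of such a preference-driven dispatch — the function name and return values are illustrative, not the sec-certs implementation:

```python
# Hypothetical sketch of the source-selection switch described above.
# "origin" and "sec-certs" are the documented config values; everything
# else here (function name, return strings) is illustrative only.

def choose_remote_source(preferred_source_remote_datasets, nvd_api_key=None):
    if preferred_source_remote_datasets == "origin":
        # Fetch from the original provider (the NVD API); an API key is
        # advisable, but the download also works (slowly) without one.
        return "nvd-api" if nvd_api_key else "nvd-api-unauthenticated"
    if preferred_source_remote_datasets == "sec-certs":
        # Fetch pre-built snapshots from sec-certs.org; no API key needed.
        return "sec-certs-snapshot"
    raise ValueError(f"unknown source: {preferred_source_remote_datasets!r}")

print(choose_remote_source("origin"))     # → nvd-api-unauthenticated
print(choose_remote_source("sec-certs"))  # → sec-certs-snapshot
```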


## Inferring inter-certificate reference context
2 changes: 1 addition & 1 deletion notebooks/cc/cert_id_eval.ipynb
@@ -64,7 +64,7 @@
},
"outputs": [],
"source": [
"dset = CCDataset.from_web_latest()\n"
"dset = CCDataset.from_web()\n"
]
},
{
8 changes: 5 additions & 3 deletions notebooks/cc/chain_of_trust_plots.ipynb
@@ -1,9 +1,11 @@
{
"cells": [
{
"metadata": {},
"cell_type": "markdown",
"source": "# Plots from the \"Chain of Trust\" paper"
"metadata": {},
"source": [
"# Plots from the \"Chain of Trust\" paper"
]
},
{
"cell_type": "code",
@@ -79,7 +81,7 @@
"figure_width = 3.5\n",
"figure_height = 2.5\n",
"\n",
"dset = CCDataset.from_web_latest()\n",
"dset = CCDataset.from_web()\n",
"df = dset.to_pandas()"
]
},
7 changes: 4 additions & 3 deletions notebooks/cc/cpe_eval.ipynb
@@ -18,7 +18,8 @@
"from sec_certs.dataset import CCDataset\n",
"import pandas as pd\n",
"import json\n",
"import tempfile"
"import tempfile\n",
"from sec_certs.utils.label_studio_utils import to_label_studio_json"
]
},
{
@@ -42,7 +43,7 @@
}
],
"source": [
"dset = CCDataset.from_web_latest()\n",
"dset = CCDataset.from_web()\n",
"df = dset.to_pandas()\n",
"\n",
"eval_digests = pd.read_csv(\"./../../data/cpe_eval/random.csv\", sep=\";\").set_index(\"dgst\").index\n",
@@ -58,7 +59,7 @@
"with tempfile.TemporaryDirectory() as tmp_dir:\n",
" dset.root_dir = tmp_dir\n",
" dset.certs = {x.dgst: x for x in dset if x.dgst in eval_certs.index.tolist()}\n",
" dset.to_label_studio_json(\"./label_studio_input_data.json\", update_json=False)"
" to_label_studio_json(dset, \"./label_studio_input_data.json\")"
]
},
{
@@ -29,7 +29,7 @@
"metadata": {},
"outputs": [],
"source": [
"dset = CCDataset.from_web_latest()\n",
"dset = CCDataset.from_web()\n",
"df = dset.to_pandas()\n",
"reference_rich_certs = {x.dgst for x in dset if (x.heuristics.st_references.directly_referencing and x.state.st_txt_path) or (x.heuristics.report_references.directly_referencing and x.state.report_txt_path)}\n",
"df = df.loc[df.index.isin(reference_rich_certs)]\n",
@@ -57,7 +57,7 @@
" json.dump(x_valid.tolist(), handle, indent=4)\n",
"\n",
"with open(\"../../../data/reference_annotations_split/test.json\", \"w\") as handle:\n",
" json.dump(x_test, handle, indent=4) "
" json.dump(x_test, handle, indent=4)"
]
}
],
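The reference-annotation notebook above serializes train/valid/test splits with `json.dump`. A self-contained sketch of that split-and-serialize pattern on hypothetical digests (the data and split sizes here are made up for illustration):

```python
import json
import random
import tempfile
from pathlib import Path

# Illustrative 70/15/15 split over hypothetical certificate digests,
# serialized per split the same way as the annotation notebook above.
digests = [f"dgst{i:03d}" for i in range(100)]
random.Random(42).shuffle(digests)  # fixed seed for a reproducible split

train, valid, test = digests[:70], digests[70:85], digests[85:]

with tempfile.TemporaryDirectory() as tmp:
    for name, split in {"train": train, "valid": valid, "test": test}.items():
        with open(Path(tmp) / f"{name}.json", "w") as handle:
            json.dump(split, handle, indent=4)

    # Round-trip check: the serialized splits cover every digest exactly once.
    loaded = []
    for name in ("train", "valid", "test"):
        with open(Path(tmp) / f"{name}.json") as handle:
            loaded.extend(json.load(handle))

print(len(train), len(valid), len(test))  # → 70 15 15
```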
7 changes: 4 additions & 3 deletions notebooks/cc/scheme_eval.ipynb
@@ -24,7 +24,8 @@
"from sec_certs.model import CCSchemeMatcher\n",
"from sec_certs.sample.cc_certificate_id import canonicalize\n",
"from sec_certs.sample.cc_scheme import CCScheme, EntryType\n",
"from sec_certs.configuration import config"
"from sec_certs.configuration import config\n",
"from sec_certs.dataset.auxiliary_dataset_handling import CCSchemeDatasetHandler"
]
},
{
@@ -56,7 +57,7 @@
"metadata": {},
"outputs": [],
"source": [
"dset.auxiliary_datasets.scheme_dset = schemes\n",
"dset.aux_handlers[CCSchemeDatasetHandler].dset = schemes\n",
"\n",
"count_was = 0\n",
"count_is = 0\n",
@@ -161,7 +162,7 @@
" rate = len(assigned)/len(total) * 100 if len(total) != 0 else 0\n",
" rate_list = rates.setdefault(country, [])\n",
" rate_list.append(rate)\n",
" \n",
"\n",
" print(f\"{country}: {len(assigned)} assigned out of {len(total)} -> {rate:.1f}%\")\n",
" total_active = total[total[\"status\"] == \"active\"]\n",
" assigned_active = assigned[assigned[\"status\"] == \"active\"]\n",
8 changes: 5 additions & 3 deletions notebooks/cc/temporal_trends.ipynb
@@ -1,9 +1,11 @@
{
"cells": [
{
"metadata": {},
"cell_type": "markdown",
"source": "# Temporal trends in the CC ecosystem"
"metadata": {},
"source": [
"# Temporal trends in the CC ecosystem"
]
},
{
"cell_type": "code",
@@ -39,7 +41,7 @@
"metadata": {},
"outputs": [],
"source": [
"dset = CCDataset.from_web_latest()\n",
"dset = CCDataset.from_web()\n",
"df = dset.to_pandas()"
]
},
15 changes: 6 additions & 9 deletions notebooks/cc/vulnerabilities.ipynb
@@ -33,6 +33,7 @@
"import warnings\n",
"from pathlib import Path\n",
"import tempfile\n",
"from sec_certs.dataset.auxiliary_dataset_handling import CVEDatasetHandler, CPEDatasetHandler, CCMaintenanceUpdateDatasetHandler\n",
"from sec_certs.dataset import CCDataset, CCDatasetMaintenanceUpdates, CVEDataset, CPEDataset\n",
"from sec_certs.utils.pandas import (\n",
" compute_cve_correlations,\n",
@@ -81,16 +82,12 @@
"cpe_dset: CPEDataset = CPEDataset.from_json(\"/path/to/cpe_dataset.json\")\n",
"\n",
"# # Remote instantiation (takes approx. 10 minutes to complete)\n",
"# dset: CCDataset = CCDataset.from_web_latest(path=\"dset\", auxiliary_datasets=True)\n",
"# dset: CCDataset = CCDataset.from_web(path=\"dset\", auxiliary_datasets=True)\n",
"# dset.load_auxiliary_datasets()\n",
"\n",
"# print(\"Downloading dataset of maintenance updates\")\n",
"# main_dset: CCDatasetMaintenanceUpdates = CCDatasetMaintenanceUpdates.from_web_latest()\n",
"\n",
"# print(\"Downloading CPE dataset\")\n",
"# cpe_dset: CPEDataset = dset.auxiliary_datasets.cpe_dset\n",
"\n",
"# print(\"Downloading CVE dataset\")\n",
"# cve_dset: CVEDataset = dset.auxiliary_datasets.cve_dset"
"# main_dset: CCDatasetMaintenanceUpdates = dset.aux_handlers[CCMaintenanceUpdateDatasetHandler].dset\n",
"# cpe_dset: CPEDataset = dset.aux_handlers[CPEDatasetHandler].dset\n",
"# cve_dset: CVEDataset = dset.aux_handlers[CVEDatasetHandler].dset"
]
},
{
8 changes: 4 additions & 4 deletions notebooks/examples/cc.ipynb
@@ -44,7 +44,7 @@
"metadata": {},
"outputs": [],
"source": [
"dset = CCDataset.from_web_latest()\n",
"dset = CCDataset.from_web()\n",
"print(len(dset)) # Print number of certificates in the dataset"
]
},
@@ -188,7 +188,7 @@
"source": [
"## Assign dataset with CPE records and compute vulnerabilities\n",
"\n",
"*Note*: The data is already computed on dataset obtained with `from_web_latest()`, this is just for illustration. \n",
"*Note*: The data is already computed on dataset obtained with `from_web()`, this is just for illustration. \n",
"*Note*: This may likely not run in Binder, as the corresponding `CVEDataset` and `CPEDataset` instances take a lot of memory."
]
},
@@ -212,7 +212,7 @@
"The following piece of code roughly corresponds to `$ sec-certs cc all` CLI command -- it fully processes the CC pipeline. This will create a folder in current working directory where the outputs will be stored. \n",
"\n",
"```{warning}\n",
"It's not good idea to run this from notebook. It may take several hours to finish. We recommend using `from_web_latest()` or turning this into a Python script.\n",
"It's not good idea to run this from notebook. It may take several hours to finish. We recommend using `from_web()` or turning this into a Python script.\n",
"```"
]
},
@@ -231,8 +231,8 @@
]
},
{
"metadata": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Advanced usage\n",
"There are more notebooks available showcasing more advanced usage of the tool.\n",
8 changes: 5 additions & 3 deletions notebooks/examples/est_solution.ipynb
@@ -57,7 +57,7 @@
],
"source": [
"# Download the dataset and see how many certificates it contains\n",
"dataset = CCDataset.from_web_latest()\n",
"dataset = CCDataset.from_web()\n",
"print(f\"The downloaded CCDataset contains {len(dataset)} certificates\")"
]
},
@@ -80,7 +80,9 @@
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": "## 2. Turn the dataset into a [pandas](https://pandas.pydata.org/) dataframe -- a data structure suitable for further data analysis."
"source": [
"## 2. Turn the dataset into a [pandas](https://pandas.pydata.org/) dataframe -- a data structure suitable for further data analysis."
]
},
{
"cell_type": "code",
@@ -495,7 +497,7 @@
}
],
"source": [
"# Show arbitrary subset that we've defined earlier \n",
"# Show arbitrary subset that we've defined earlier\n",
"eal6_or_more.head()"
]
},
10 changes: 6 additions & 4 deletions notebooks/examples/fips.ipynb
@@ -38,7 +38,7 @@
"metadata": {},
"outputs": [],
"source": [
"dset: FIPSDataset = FIPSDataset.from_web_latest()\n",
"dset: FIPSDataset = FIPSDataset.from_web()\n",
"print(len(dset))"
]
},
@@ -87,7 +87,9 @@
{
"cell_type": "markdown",
"metadata": {},
"source": "## Dissect a single certificate"
"source": [
"## Dissect a single certificate"
]
},
{
"cell_type": "code",
@@ -128,7 +130,7 @@
"## Create new dataset and fully process it\n",
"\n",
"```{warning}\n",
"It's not good idea to run this from notebook. It may take several hours to finish. We recommend using `from_web_latest()` or turning this into a Python script.\n",
"It's not good idea to run this from notebook. It may take several hours to finish. We recommend using `from_web()` or turning this into a Python script.\n",
"```"
]
},
@@ -147,8 +149,8 @@
]
},
{
"metadata": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Advanced usage\n",
"There are more notebooks available showcasing more advanced usage of the tool.\n",