Commit 1329b41

Merge pull request #469 from crocs-muni/feat/pp-processing

Add PP processing (rebased)

2 parents: d640ad1 + 996759d

87 files changed, +6595 -2369 lines


README.md (+1 -1)

@@ -48,7 +48,7 @@ Most probably, you don't want to fully process the certification artifacts by yo
 ```python
 from sec_certs.dataset import CCDataset
 
-dset = CCDataset.from_web_latest() # now you can inspect the object, certificates are held in dset.certs
+dset = CCDataset.from_web() # now you can inspect the object, certificates are held in dset.certs
 df = dset.to_pandas() # Or you can transform the object into Pandas dataframe
 dset.to_json(
 './latest_cc_snapshot.json') # You may want to store the snapshot as json, so that you don't have to download it again
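The README snippet above follows a download-once, serialize-to-JSON, reuse-later pattern. A minimal, library-free sketch of that caching idea — `fetch` is a hypothetical stand-in for the network-heavy `CCDataset.from_web()` call, and the cache path is illustrative:

```python
# Generic sketch of the snapshot-caching idea from the README change:
# fetch once, store as JSON, and reuse the local copy on later runs.
import json
from pathlib import Path

def load_snapshot(cache: Path, fetch):
    """Return cached data if the JSON snapshot exists, else fetch and store it."""
    if cache.exists():
        return json.loads(cache.read_text())
    data = fetch()  # expensive remote call happens only on a cache miss
    cache.write_text(json.dumps(data))
    return data
```

On the second run the snapshot is read from disk, so the remote dataset is not downloaded again.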

docs/api/dataset.md (+15 -3)

@@ -5,26 +5,38 @@
 :no-members:
 ```
 
-This documentation doesn't provide full API reference for all members of `dataset` package. Instead, it concentrates on the Dataset that are immediately exposed to the users. Namely, we focus on `CCDataset`, `FIPSDataset` and their abstract base class `Dataset`.
+This documentation doesn't provide full API reference for all members of `dataset` package. Instead, it concentrates on the Dataset that are immediately exposed to the users.
+Namely, we focus on `CCDataset`, `FIPSDataset`, `ProtectionProfileDataset` and their abstract base class `Dataset`.
 
 ```{tip}
-The examples related to this package can be found in the [common criteria notebook](./../notebooks/examples/cc.ipynb) and the [fips notebook](./../notebooks/examples/fips.ipynb).
+The examples related to this package can be found in the [common criteria notebook](./../notebooks/examples/cc.ipynb),
+the [protection profile notebook](./../notebooks/examples/protection_profiles.ipynb), and the [fips notebook](./../notebooks/examples/fips.ipynb).
 ```
 
-## CCDataset
+## Base Dataset
 
 ```{eval-rst}
 .. currentmodule:: sec_certs.dataset.dataset
 .. autoclass:: Dataset
 :members:
 ```
 
+## CCDataset
+
 ```{eval-rst}
 .. currentmodule:: sec_certs.dataset
 .. autoclass:: CCDataset
 :members:
 ```
 
+## ProtectionProfileDataset
+
+```{eval-rst}
+.. currentmodule:: sec_certs.dataset
+.. autoclass:: ProtectionProfileDataset
+:members:
+```
+
 ## FIPSDataset
 
 ```{eval-rst}

docs/api/sample.md (+10 -1)

@@ -17,10 +17,19 @@ The examples related to this package can be found in the [common criteria notebo
 :members:
 ```
 
+## ProtectionProfile
+
+```{eval-rst}
+.. currentmodule:: sec_certs.sample
+.. autoclass:: ProtectionProfile
+:members:
+```
+
 ## FIPSCertificate
 
 ```{eval-rst}
 .. currentmodule:: sec_certs.sample
 .. autoclass:: FIPSCertificate
 :members:
-```
+```
+

docs/index.md (+1 -0)

@@ -41,6 +41,7 @@ search_examples.md
 :maxdepth: 1
 Demo <notebooks/examples/est_solution.ipynb>
 Common Criteria <notebooks/examples/cc.ipynb>
+Protection Profiles <notebooks/examples/protection_profiles.ipynb>
 FIPS-140 <notebooks/examples/fips.ipynb>
 FIPS-140 IUT <notebooks/examples/fips_iut.ipynb>
 FIPS-140 MIP <notebooks/examples/fips_mip.ipynb>

docs/installation.md (+1 -1)

@@ -34,7 +34,7 @@ pip install -e .
 python -m spacy download en_core_web_sm
 ```
 
-Alternatively, our Our [Dockerfile](https://github.com/crocs-muni/sec-certs/blob/main/Dockerfile) represents a reproducible way of setting up the environment.
+Alternatively, our [Dockerfile](https://github.com/crocs-muni/sec-certs/blob/main/Dockerfile) represents a reproducible way of setting up the environment.
 
 :::
 ::::

docs/quickstart.md (+4 -4)

@@ -8,9 +8,9 @@
 ```python
 from sec_certs.dataset.cc import CCDataset
 
-dset = CCDataset.from_web_latest()
+dset = CCDataset.from_web()
 ```
-to obtain to obtain freshly processed dataset from [sec-certs.org](https://sec-certs.org).
+to obtain the freshly processed dataset from [sec-certs.org](https://sec-certs.org).
 
 3. Play with the dataset. See [example notebook](./notebooks/examples/cc.ipynb).
 :::
@@ -21,9 +21,9 @@ to obtain to obtain freshly processed dataset from [sec-certs.org](https://sec-c
 ```python
 from sec_certs.dataset.fips import FIPSDataset
 
-dset = FIPSDataset.from_web_latest()
+dset = FIPSDataset.from_web()
 ```
-to obtain to obtain freshly processed dataset from [sec-certs.org](https://sec-certs.org).
+to obtain the freshly processed dataset from [sec-certs.org](https://sec-certs.org).
 
 3. Play with the dataset. See [example notebook](./notebooks/examples/fips.ipynb).
 :::

docs/user_guide.md (+3 -3)

@@ -11,16 +11,16 @@ Our tool matches certificates to their possible CVEs using datasets downloaded f
 Our tool can seamlessly download the required NVD datasets when needed. We support two download mechanisms:
 
 1. Fetching datasets with the [NVD API](https://nvd.nist.gov/developers/start-here) (preferred way).
-1. Fetching snapshots from seccerts.org.
+1. Fetching snapshots from sec-certs.org.
 
 The following two keys control the behaviour:
 
 ```yaml
-preferred_source_nvd_datasets: "api" # set to "sec-certs" to fetch them from sec-certs.org
+preferred_source_remote_datasets: "origin" # set to "sec-certs" to fetch them from sec-certs.org
 nvd_api_key: null # or the actual key value
 ```
 
-If you aim to fetch the sources from NVD, we advise you to get an [NVD API key](https://nvd.nist.gov/developers/request-an-api-key) and set the `nvd_api_key` setting accordingly. The download from NVD will work even without API key, it will just be slow. No API key is needed when `preferred_source_nvd_datasets: "sec-certs"`
+If you aim to fetch the sources from NVD, we advise you to get an [NVD API key](https://nvd.nist.gov/developers/request-an-api-key) and set the `nvd_api_key` setting accordingly. The download from NVD will work even without API key, it will just be slow. No API key is needed when `preferred_source_remote_datasets: "sec-certs"`
 
 
 ## Inferring inter-certificate reference context
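For completeness, the alternative configuration mentioned in the comments of the diffed yaml above (fetching snapshots from sec-certs.org instead of the NVD API) would look as follows; the key name follows the renamed setting, and the exact accepted values are assumed from the diff:

```yaml
preferred_source_remote_datasets: "sec-certs"  # snapshots come from sec-certs.org
nvd_api_key: null                              # no API key needed in this mode
```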

notebooks/cc/cert_id_eval.ipynb (+1 -1)

@@ -64,7 +64,7 @@
 },
 "outputs": [],
 "source": [
-"dset = CCDataset.from_web_latest()\n"
+"dset = CCDataset.from_web()\n"
 ]
 },
 {

notebooks/cc/chain_of_trust_plots.ipynb (+5 -3)

@@ -1,9 +1,11 @@
 {
 "cells": [
 {
-"metadata": {},
 "cell_type": "markdown",
+"metadata": {},
+"source": [
-"source": "# Plots from the \"Chain of Trust\" paper"
+"# Plots from the \"Chain of Trust\" paper"
+]
 },
 {
 "cell_type": "code",
@@ -79,7 +81,7 @@
 "figure_width = 3.5\n",
 "figure_height = 2.5\n",
 "\n",
-"dset = CCDataset.from_web_latest()\n",
+"dset = CCDataset.from_web()\n",
 "df = dset.to_pandas()"
 ]
 },

notebooks/cc/cpe_eval.ipynb (+4 -3)

@@ -18,7 +18,8 @@
 "from sec_certs.dataset import CCDataset\n",
 "import pandas as pd\n",
 "import json\n",
-"import tempfile"
+"import tempfile\n",
+"from sec_certs.utils.label_studio_utils import to_label_studio_json"
 ]
 },
 {
@@ -42,7 +43,7 @@
 }
 ],
 "source": [
-"dset = CCDataset.from_web_latest()\n",
+"dset = CCDataset.from_web()\n",
 "df = dset.to_pandas()\n",
 "\n",
 "eval_digests = pd.read_csv(\"./../../data/cpe_eval/random.csv\", sep=\";\").set_index(\"dgst\").index\n",
@@ -58,7 +59,7 @@
 "with tempfile.TemporaryDirectory() as tmp_dir:\n",
 " dset.root_dir = tmp_dir\n",
 " dset.certs = {x.dgst: x for x in dset if x.dgst in eval_certs.index.tolist()}\n",
-" dset.to_label_studio_json(\"./label_studio_input_data.json\", update_json=False)"
+" to_label_studio_json(dset, \"./label_studio_input_data.json\")"
 ]
 },
 {

notebooks/cc/reference_annotations/train_validation_test_split.ipynb (+2 -2)

@@ -29,7 +29,7 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"dset = CCDataset.from_web_latest()\n",
+"dset = CCDataset.from_web()\n",
 "df = dset.to_pandas()\n",
 "reference_rich_certs = {x.dgst for x in dset if (x.heuristics.st_references.directly_referencing and x.state.st_txt_path) or (x.heuristics.report_references.directly_referencing and x.state.report_txt_path)}\n",
 "df = df.loc[df.index.isin(reference_rich_certs)]\n",
@@ -57,7 +57,7 @@
 " json.dump(x_valid.tolist(), handle, indent=4)\n",
 "\n",
 "with open(\"../../../data/reference_annotations_split/test.json\", \"w\") as handle:\n",
-" json.dump(x_test, handle, indent=4) "
+" json.dump(x_test, handle, indent=4)"
 ]
 }
 ],

notebooks/cc/scheme_eval.ipynb (+4 -3)

@@ -24,7 +24,8 @@
 "from sec_certs.model import CCSchemeMatcher\n",
 "from sec_certs.sample.cc_certificate_id import canonicalize\n",
 "from sec_certs.sample.cc_scheme import CCScheme, EntryType\n",
-"from sec_certs.configuration import config"
+"from sec_certs.configuration import config\n",
+"from sec_certs.dataset.auxiliary_dataset_handling import CCSchemeDatasetHandler"
 ]
 },
 {
@@ -56,7 +57,7 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"dset.auxiliary_datasets.scheme_dset = schemes\n",
+"dset.aux_handlers[CCSchemeDatasetHandler].dset = schemes\n",
 "\n",
 "count_was = 0\n",
 "count_is = 0\n",
@@ -161,7 +162,7 @@
 " rate = len(assigned)/len(total) * 100 if len(total) != 0 else 0\n",
 " rate_list = rates.setdefault(country, [])\n",
 " rate_list.append(rate)\n",
-" \n",
+"\n",
 " print(f\"{country}: {len(assigned)} assigned out of {len(total)} -> {rate:.1f}%\")\n",
 " total_active = total[total[\"status\"] == \"active\"]\n",
 " assigned_active = assigned[assigned[\"status\"] == \"active\"]\n",

notebooks/cc/temporal_trends.ipynb (+5 -3)

@@ -1,9 +1,11 @@
 {
 "cells": [
 {
-"metadata": {},
 "cell_type": "markdown",
+"metadata": {},
+"source": [
-"source": "# Temporal trends in the CC ecosystem"
+"# Temporal trends in the CC ecosystem"
+]
 },
 {
 "cell_type": "code",
@@ -39,7 +41,7 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"dset = CCDataset.from_web_latest()\n",
+"dset = CCDataset.from_web()\n",
 "df = dset.to_pandas()"
 ]
 },

notebooks/cc/vulnerabilities.ipynb (+6 -9)

@@ -33,6 +33,7 @@
 "import warnings\n",
 "from pathlib import Path\n",
 "import tempfile\n",
+"from sec_certs.dataset.auxiliary_dataset_handling import CVEDatasetHandler, CPEDatasetHandler, CCMaintenanceUpdateDatasetHandler\n",
 "from sec_certs.dataset import CCDataset, CCDatasetMaintenanceUpdates, CVEDataset, CPEDataset\n",
 "from sec_certs.utils.pandas import (\n",
 " compute_cve_correlations,\n",
@@ -81,16 +82,12 @@
 "cpe_dset: CPEDataset = CPEDataset.from_json(\"/path/to/cpe_dataset.json\")\n",
 "\n",
 "# # Remote instantiation (takes approx. 10 minutes to complete)\n",
-"# dset: CCDataset = CCDataset.from_web_latest(path=\"dset\", auxiliary_datasets=True)\n",
+"# dset: CCDataset = CCDataset.from_web(path=\"dset\", auxiliary_datasets=True)\n",
+"# dset.load_auxiliary_datasets()\n",
 "\n",
-"# print(\"Downloading dataset of maintenance updates\")\n",
-"# main_dset: CCDatasetMaintenanceUpdates = CCDatasetMaintenanceUpdates.from_web_latest()\n",
-"\n",
-"# print(\"Downloading CPE dataset\")\n",
-"# cpe_dset: CPEDataset = dset.auxiliary_datasets.cpe_dset\n",
-"\n",
-"# print(\"Downloading CVE dataset\")\n",
-"# cve_dset: CVEDataset = dset.auxiliary_datasets.cve_dset"
+"# main_dset: CCDatasetMaintenanceUpdates = dset.aux_handlers[CCMaintenanceUpdateDatasetHandler].dset\n",
+"# cpe_dset: CPEDataset = dset.aux_handlers[CPEDatasetHandler].dset\n",
+"# cve_dset: CVEDataset = dset.aux_handlers[CVEDatasetHandler].dset"
 ]
 },
 {

notebooks/examples/cc.ipynb (+4 -4)

@@ -44,7 +44,7 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"dset = CCDataset.from_web_latest()\n",
+"dset = CCDataset.from_web()\n",
 "print(len(dset)) # Print number of certificates in the dataset"
 ]
 },
@@ -188,7 +188,7 @@
 "source": [
 "## Assign dataset with CPE records and compute vulnerabilities\n",
 "\n",
-"*Note*: The data is already computed on dataset obtained with `from_web_latest()`, this is just for illustration. \n",
+"*Note*: The data is already computed on dataset obtained with `from_web()`, this is just for illustration. \n",
 "*Note*: This may likely not run in Binder, as the corresponding `CVEDataset` and `CPEDataset` instances take a lot of memory."
 ]
 },
@@ -212,7 +212,7 @@
 "The following piece of code roughly corresponds to `$ sec-certs cc all` CLI command -- it fully processes the CC pipeline. This will create a folder in current working directory where the outputs will be stored. \n",
 "\n",
 "```{warning}\n",
-"It's not good idea to run this from notebook. It may take several hours to finish. We recommend using `from_web_latest()` or turning this into a Python script.\n",
+"It's not good idea to run this from notebook. It may take several hours to finish. We recommend using `from_web()` or turning this into a Python script.\n",
 "```"
 ]
 },
@@ -231,8 +231,8 @@
 ]
 },
 {
-"metadata": {},
 "cell_type": "markdown",
+"metadata": {},
 "source": [
 "## Advanced usage\n",
 "There are more notebooks available showcasing more advanced usage of the tool.\n",

notebooks/examples/est_solution.ipynb (+5 -3)

@@ -57,7 +57,7 @@
 ],
 "source": [
 "# Download the dataset and see how many certificates it contains\n",
-"dataset = CCDataset.from_web_latest()\n",
+"dataset = CCDataset.from_web()\n",
 "print(f\"The downloaded CCDataset contains {len(dataset)} certificates\")"
 ]
 },
@@ -80,7 +80,9 @@
 "attachments": {},
 "cell_type": "markdown",
 "metadata": {},
-"source": "## 2. Turn the dataset into a [pandas](https://pandas.pydata.org/) dataframe -- a data structure suitable for further data analysis."
+"source": [
+"## 2. Turn the dataset into a [pandas](https://pandas.pydata.org/) dataframe -- a data structure suitable for further data analysis."
+]
 },
 {
 "cell_type": "code",
@@ -495,7 +497,7 @@
 }
 ],
 "source": [
-"# Show arbitrary subset that we've defined earlier \n",
+"# Show arbitrary subset that we've defined earlier\n",
 "eal6_or_more.head()"
 ]
 },

notebooks/examples/fips.ipynb (+6 -4)

@@ -38,7 +38,7 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"dset: FIPSDataset = FIPSDataset.from_web_latest()\n",
+"dset: FIPSDataset = FIPSDataset.from_web()\n",
 "print(len(dset))"
 ]
 },
@@ -87,7 +87,9 @@
 {
 "cell_type": "markdown",
 "metadata": {},
-"source": "## Dissect a single certificate"
+"source": [
+"## Dissect a single certificate"
+]
 },
 {
 "cell_type": "code",
@@ -128,7 +130,7 @@
 "## Create new dataset and fully process it\n",
 "\n",
 "```{warning}\n",
-"It's not good idea to run this from notebook. It may take several hours to finish. We recommend using `from_web_latest()` or turning this into a Python script.\n",
+"It's not good idea to run this from notebook. It may take several hours to finish. We recommend using `from_web()` or turning this into a Python script.\n",
 "```"
 ]
 },
@@ -147,8 +149,8 @@
 ]
 },
 {
-"metadata": {},
 "cell_type": "markdown",
+"metadata": {},
 "source": [
 "## Advanced usage\n",
 "There are more notebooks available showcasing more advanced usage of the tool.\n",
